RESEARCH STATUS OF HUMAN-COMPUTER INTERACTION

Linghan Zheng, Rui Zhang, Chuankai Zhao

An Article Submitted to Prof. Li Zhang
For the Course of Academic Communication in English
2015.1.1

SHANGHAI JIAO TONG UNIVERSITY
Research Status of Human-Computer Interaction

Linghan Zheng (a), Rui Zhang (b), Chuankai Zhao (c)

(a) School of Electronic, Information and Electrical Engineering, Shanghai Jiao Tong University
(b) School of Mechanical Engineering, Shanghai Jiao Tong University
(c) School of Materials Science and Engineering, Shanghai Jiao Tong University

Abstract

With the progress of science and technology, human-computer interaction (HCI) has expanded rapidly and steadily and has greatly reshaped our lifestyles. However, there is a lack of easily understood papers that introduce these technologies to the general public. By summarizing the introductory content of many papers, we show here five stages in the development of HCI: keyboard and character display, the mouse pointing device and graphical display, touch technology, multimedia and multimodal interaction, and virtual reality. Furthermore, the latest two phases, multimodal interaction and virtual reality, are introduced in detail. Multimodal interaction is presented in four different aspects: speech recognition, lip-reading, facial expression recognition and action recognition. Virtual reality covers the virtual reality system, virtual reality hardware and virtual reality software. Finally, we predict that these two technologies will become the main directions of future HCI development. This paper provides a comprehensive introduction to human-computer interaction and can be of great help to the general public and junior researchers.

Keywords: Human-computer interaction; keyboard; mouse; touch technology; multimodal interaction; virtual reality
Table of Contents (1)

1. Introduction
2. Keyboard and Character Display
3. Mouse and Graphical Display
4. Touch Technology
5. Multimedia and Multimodal Interaction
   5.1 Speech Recognition
      5.1.1 The development of speech recognition
      5.1.2 Three technologies of speech recognition
      5.1.3 The application of speech recognition
   5.2 Lip-reading
      5.2.1 A brief introduction to lip-reading
      5.2.2 The methods applied in lip-reading technology
      5.2.3 The application of lip-reading systems
   5.3 Facial Expression Recognition
      5.3.1 The significance of facial expression research
      5.3.2 The recognition of facial expression
      5.3.3 The application of facial expression technology
   5.4 Action Recognition
      5.4.1 The recognition of human action
      5.4.2 The application of action recognition
6. Virtual Reality
   6.1 Virtual Reality System
      6.1.1 Desktop Virtual Reality
      6.1.2 Immersive Virtual Reality
      6.1.3 Enhancement Virtual Reality
      6.1.4 Distributed Virtual Reality
   6.2 Virtual Reality Hardware
      6.2.1 Helmet Mounted Display
      6.2.2 Data Glove
      6.2.3 3D Stereo Display
   6.3 Virtual Reality Software
      6.3.1 OpenGL
      6.3.2 Vega
      6.3.3 Webmax
   6.4 The Application of Virtual Reality
      6.4.1 Military Field
      6.4.2 Industrial Field
      6.4.3 Medicine Field
7. Conclusion

(1) Parts 1, 2, 3, 4 and 7 were written by Chuankai Zhao; Part 5 was written by Linghan Zheng; Part 6 was written by Rui Zhang. They contributed equally to the other parts of the article.
List of Figures

Fig. 1 Five stages in the development of HCI
Fig. 2 Schematic of the development of the keyboard
Fig. 3 Apple's Magic Mouse
Fig. 4 Multi-touch technology
Fig. 5 iTranslate Voice
Fig. 6 Some pictures from JAFFE
Fig. 7 Motion capture via Kinect
Fig. 8 Action recognition applied in somatosensory games
Fig. 9 Immersive Virtual Reality
Fig. 10 A cartoon character made by Enhancement Virtual Reality
Fig. 11 Distributed Virtual Reality used in military training
Fig. 12 Helmet Mounted Display
Fig. 13 Data Glove
Fig. 14 3D Stereo Display
Fig. 15 Virtual Surgery
1. Introduction

Human-computer interaction (HCI) is a research and practice area concerned with the interaction and communication between humans and computers through mutual understanding. In the early 1980s, the Association for Computing Machinery first established a special interest group on HCI (Myers, 1998). Since then, HCI has attracted experts and professionals from many other disciplines and incorporated a wide range of concepts and approaches, expanding rapidly and steadily to solve problems of information management, services and processing so as to best serve human needs (Carroll, 2010). Over the past three decades, revolutions in HCI technologies have greatly reshaped our lives. It is safe to say that human-computer technology is one of the most significant inventions of the 20th century.

Fig. 1 Five stages in the development of HCI: (a) Keyboard (b) Mouse (c) Touch (d) Multimedia (e) Virtual Reality

In general, as shown in Fig. 1, human-computer interaction styles have experienced five phases, each of which can be represented by its milestone technology: keyboard and character display, the mouse pointing device and graphical display, touch technology, multimedia and multimodal interaction, and virtual reality (Karat et al., 2012; Kelner, 2000). On the basis of the relevant literature, we summarize the technologies used in the different stages, laying stress on the latest two phases. First, we briefly introduce the first three stages, including their developmental processes and features. Next we give a detailed description of four kinds of multimodal interaction: the recognition of speech, lip-reading, facial expression and action. Then we discuss virtual reality in more depth in terms of its system, hardware, software and applications. Finally, we discuss the trend of future HCI development. We hope that this paper can provide some guidance for related research on HCI to be carried out in the future.

2. Keyboard and Character Display

The early stage of HCI originated from the typewriter, on the basis of which the keyboard was then invented. The emergence of the keyboard brought us into the era of the character user interface, which can be called the first generation of HCI. The keyboard and the character display were the main interaction tools, and characters, texts and commands were the main interaction contents. Thus, the interaction between human and computer at this stage seemed quite rigid and monotonous.

The keyboard has been used as the input device of the computer and has kept being upgraded since it was invented by Christopher Latham Sholes in 1878 (Myers, 1998). According to the type of switch, keyboards can be classified into several categories: mechanical, membrane, conductive and capacitive keyboards (Hu, 2012).
From the late 19th century to the early 21st century, research on the keyboard was a hotspot. With the fast development of multi-touch technology, the virtual laser projection keyboard and the virtual keyboard appeared (Longe & Van Meurs, 2013). Virtual keyboards are mainly used in handheld devices such as PDAs, mobile phones and the iPad. Nowadays, people have switched their attention from pursuing more advanced keyboard technologies to pursuing a more beautiful and unique appearance, a suitable size, and satisfactory portability and comfort.

Fig. 2 Schematic of the development of the keyboard: (a) Traditional (b) Laser projection keyboard (c) Virtual keyboard

3. Mouse and Graphical Display

As the graphical user interface appeared, the mouse was invented in 1964 by another American inventor, Doug Engelbart (Myers, 1998). The mouse was used to control the position of the cursor on the screen and brought people much convenience. Its working principle was that the ball at the bottom of the mouse drove the internal pivots, which in turn changed variable resistances to generate displacement signals; these signals were then transmitted to the host (a simple sketch of this process is given below). The emergence of the mouse greatly facilitated interaction between human and computer, pushing us into the second generation of HCI. In this stage, the mouse and the graphical display were the main interaction tools, and characters, graphs and images became the main interaction contents. Even today, as windowed operating systems are widely used, the mouse remains an indispensable input device.

Fig. 3 Apple's Magic Mouse

Gradually, as people required mice with higher accuracy, sensitivity and comfort, different mice such as the optical mouse, the laser mouse and the blue mouse appeared. The most noteworthy is Apple's Magic Mouse, shown in Fig. 3. It removed the mouse buttons and scroll wheel, leaving only a single multi-touch surface that performs the common left-click, right-click and 360-degree scrolling functions and lets users perform additional gestures with two fingers (Rodríguez, 2010).
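To illustrate the working principle described above, here is a minimal conceptual sketch of how a host might accumulate the relative displacement signals reported by a mouse into an on-screen cursor position. The screen resolution and the counts-per-pixel scaling are made-up values for the example, not figures taken from any real device or driver.

```python
SCREEN_WIDTH, SCREEN_HEIGHT = 1920, 1080   # assumed display resolution
COUNTS_PER_PIXEL = 2                       # assumed scaling of mouse counts to pixels

def move_cursor(cursor, displacements):
    """Accumulate relative (dx, dy) displacement signals reported by the
    mouse into an absolute cursor position, clamped to the screen."""
    x, y = cursor
    for dx, dy in displacements:
        x = min(max(x + dx // COUNTS_PER_PIXEL, 0), SCREEN_WIDTH - 1)
        y = min(max(y + dy // COUNTS_PER_PIXEL, 0), SCREEN_HEIGHT - 1)
    return x, y

# Example: three displacement reports move the cursor right and down.
print(move_cursor((960, 540), [(10, 0), (6, 4), (0, 8)]))  # (968, 546)
```

A real operating system additionally applies acceleration curves and sensitivity settings, but the basic idea of accumulating relative displacements is the same.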
Additionally, it is worth noting that at the beginning of the 21st century there was a huge increase in patent filings for the keyboard and the mouse. As is known to us, the fast development of products is usually accompanied by increasing market demand. Apparently, the huge market demand for the keyboard and the mouse was closely linked with the rapid development of the Internet.

4. Touch Technology

From the PC age to the Internet age, interaction based on the combination of mouse and keyboard did not change much. It was not until the appearance of smartphones and multi-touch technology that human-computer interaction stepped into the third generation, an era of fingertips, which completely changed the traditional interaction styles.

The development of touch screen technology was the impetus of this stage. Since the first touch screen, the PC-150, was created, the touch screen has evolved from the resistive touch screen and the capacitive touch screen to the infrared touch screen and the surface acoustic wave touch screen (Hotelling, Strickon, & Huppi, 2010). Each technology has its own advantages and disadvantages, so the appearance of a new technology did not replace the old ones. However, optical-touch technology, which can effectively avoid the disadvantages of the technologies above and provide a better user experience, has since been developed. Recently, there has been a surge of research on optical-touch technology (Picciotto, Lutian, Lane, & Fu, 2012).

Fig. 4 Multi-touch technology

With the development of the touch screen, touch technology also progressed from single-touch to multi-touch, which triggered another huge reform of HCI. It has been widely used in large touch screens on occasions such as exhibitions, entertainment and interactive meetings. We can predict that in the foreseeable future, multi-touch screens might replace all the old products, and the keyboard and mouse may even be completely abandoned.

5. Multimedia and Multimodal Interaction

Multimedia technologies that emerged in the late 1980s brought the computer industry unprecedented prosperity (Turk, 2014). With the use of audio cards, graphics cards and other hardware, it became possible for the computer to process sound, image and video information. Thus, interaction technologies started to shift toward using sound, images and video. In this stage, besides the mouse and keyboard, many other input and output devices, such as microphones, cameras and loudspeakers, were gradually used for human-computer interaction. Interaction styles became more vivid and abundant than before.
In particular, as many emerging disciplines such as cognitive psychology, artificial intelligence and image processing developed, multimodal interaction technology based on multimedia became dominant. Multimodal interaction is a kind of HCI in which a variety of channels are used for interaction; in this case, the computer user interface is called a multimodal user interface. Based on the different channels of recognition, there are four methods of multimodal interaction.

5.1 Speech Recognition

5.1.1 The development of speech recognition

Research on speech recognition started in the 1950s, and in the 1960s the application of the computer greatly promoted its development. The most important achievement in this period was the proposal to use dynamic programming, a method following the principle of optimality, to handle the problem of unequal utterance lengths in speech recognition. Then, in the 1970s, a breakthrough was made in speech recognition, as the introduction of Linear Predictive Coding produced a leap in feature extraction. What is more, as Dynamic Time Warping approached maturity, vector quantization and Hidden Markov Model (HMM) theory were developed, and isolated-word speech recognition systems were implemented based on them. In the 1980s the HMM was successfully applied to speech recognition (He, 2002). In the 21st century, this technology has gradually found more extensive application in products on the market.

5.1.2 Three technologies of speech recognition

The selection of speech recognition units is the first step of the research. The word (or sentence), the syllable and the phoneme are the three kinds of speech recognition units, and which one to use depends on the specific situation. For example, words are widely applied in small-vocabulary systems rather than large ones, because otherwise the model base would be too huge. Syllables are suitable for monosyllabic languages such as Chinese, while phoneme units used to be more common in English recognition. Since the 1990s, however, phonemes have also been widely applied to Chinese (Chen & Gao, 1996).

The extraction of characteristic parameters is the second stage. To obtain useful information from speech signals, characteristic parameters are extracted so that essential information is gathered and redundant information is removed. In this process, different kinds of information can be distinguished and, at the same time, the large amount of data can be compressed.

Model training refers to acquiring model parameters from a large amount of known data according to certain criteria, and pattern matching means finding the best-matching pattern for an unknown input in the model base. The most common methods used in model training and pattern matching include dynamic time warping (DTW), the hidden Markov model (HMM) and the artificial neural network (ANN).
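To make the idea of template-based pattern matching more concrete, the following is a minimal sketch of DTW-based isolated-word matching in Python. The random feature sequences, the template labels and the frame-level Euclidean cost are illustrative assumptions standing in for real extracted characteristic parameters, not part of any system cited above.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences.

    seq_a, seq_b: arrays of shape (n_frames, n_features); frame-by-frame
    Euclidean distance is used as the local cost.
    """
    n, m = len(seq_a), len(seq_b)
    # Accumulated-cost matrix, padded with infinity along the boundary.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            # Warping step: diagonal match, or stretch either sequence.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]

def recognize(unknown, templates):
    """Return the label of the template whose DTW distance to the
    unknown utterance is smallest (isolated-word pattern matching)."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical feature sequences standing in for real acoustic features;
    # their lengths may differ, which is exactly what DTW handles.
    templates = {"yes": rng.normal(size=(30, 12)), "no": rng.normal(size=(42, 12))}
    unknown = templates["yes"] + 0.1 * rng.normal(size=(30, 12))
    print(recognize(unknown, templates))   # expected to print "yes"
```

In a practical recognizer, the frames would be characteristic parameters produced in the second stage described above (for example cepstral coefficients), and HMM- or ANN-based matching would usually replace this brute-force template comparison.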
5.1.3 The application of speech recognition

Nowadays, as the technology approaches maturity, speech recognition systems are being applied in a large number of fields, such as commerce, education and entertainment.

Fig. 5 iTranslate Voice

With the help of an electronic translator such as iTranslate Voice (shown in Fig. 5), it becomes possible to communicate with a foreigner even if neither of you understands the other's language. All you need to do is speak to the computer; the electronic translator, a combination of speech recognition, machine translation and speech synthesis technologies, converts what you said into text, translates it, and speaks it out in another language (a minimal sketch of such a pipeline is given at the end of this subsection). What is more, we can also practice oral English by talking to a computer; Reading Assistant, an online tool that uses speech recognition to correct and support students as they read aloud, is a vivid example. Applying speech recognition to recreation is also undoubtedly very interesting. Nuance Communications, the largest enterprise in this field, even introduced a piece of software named Dragon Gaming Speech Pack, whose feature is that players can give instructions such as "squat", "aim" and "reload" to play the game, no longer depending on mouse and keyboard control.

All in all, these applications have a promising future. Although the technology is still at an early stage, it is already enough to make a big difference in our lives, and we look forward to more exciting innovations in speech recognition technology.
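As a rough illustration of the three-stage translator pipeline described above, the sketch below chains speech recognition, machine translation and speech synthesis. All function bodies are toy stand-ins invented for this example; they are not the APIs of iTranslate Voice or of any real engine.

```python
from dataclasses import dataclass

@dataclass
class SpokenUtterance:
    audio: bytes          # raw audio captured from the microphone
    language: str         # e.g. "en", "es"

def recognize_speech(utterance: SpokenUtterance) -> str:
    """Stage 1 (speech recognition): convert audio into text.
    Stand-in only; a real system would run acoustic and language models."""
    return "good morning"          # pretend the recognizer heard this

def translate_text(text: str, source: str, target: str) -> str:
    """Stage 2 (machine translation). A toy dictionary stands in for a
    real translation engine."""
    toy_dictionary = {("en", "es"): {"good morning": "buenos dias"}}
    return toy_dictionary.get((source, target), {}).get(text, text)

def synthesize_speech(text: str, language: str) -> bytes:
    """Stage 3 (speech synthesis): turn text back into audio.
    Here we only pretend, returning the text encoded as bytes."""
    return text.encode("utf-8")

def electronic_translator(utterance: SpokenUtterance, target_language: str) -> bytes:
    """Chain the three stages in the order the text describes:
    speech recognition -> machine translation -> speech synthesis."""
    text = recognize_speech(utterance)
    translated = translate_text(text, utterance.language, target_language)
    return synthesize_speech(translated, target_language)

if __name__ == "__main__":
    spoken = SpokenUtterance(audio=b"...", language="en")
    print(electronic_translator(spoken, "es"))   # b'buenos dias'
```

In a real product each stand-in function would call an actual recognition, translation or synthesis engine; the point here is only the order in which the three technologies cooperate.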
5.2 Lip-reading

5.2.1 A brief introduction to lip-reading

Lip-reading refers to the process of understanding others' words by recognizing the changing shape of the mouth. A computer lip-reading system combines video signals with speech signals to obtain the message, which can greatly improve the recognition rate. From a certain perspective, lip-reading is a branch of speech recognition, but the two can also be treated as independent topics.

Actually, computer lip-reading is one of the great signal processing challenges (Lan, Richard & Barry-John, 2012). Compared with speech recognition, lip-reading has a shorter history: the first lip-reading system was not established until 1984. In the following years, more methods, including HMM and ANN, were gradually applied to improve this technology.

5.2.2 The methods applied in lip-reading technology

Detection and location are the first step, and there are two basic approaches. The first is based on the structural characteristics of the face, analyzing the positions of the eyes and nostrils to locate the mouth. The second is quite different: it is based on the different color distributions of skin and lips (Zhao & Wang, 2007).

Feature extraction is the next step. Collecting features accurately is an essential basis for the follow-up tracking and recognition work. Scientists have worked out a variety of methods, such as Active Appearance Models (AAM), comparisons of visual features for lip-reading (Sarah, Richard & Barry-John, 2009) and other new lip models (Shdaifat & Grigat, 2003).

5.2.3 The application of lip-reading systems

Lip-reading recognition can be used in two ways. First, it can complement speech recognition. In 2012, Microsoft was trying to introduce this system into a new version of Kinect, so that users could give instructions even in a noisy room. Some labs also tried to combine lip-reading and speech recognition, and developed a system that can help schools for the deaf correct students' pronunciation in a vivid way (Xie, 2005).

In addition, research on lip-reading recognition can also be applied to computer-generated mouth shapes. For example, Poser, a 3D animation software package launched by MetaCreations, can synthesize the speaking mouths of animated characters simply from the input sound. Similar to speech recognition, lip-reading still has vast room to develop.

5.3 Facial Expression Recognition

5.3.1 The significance of facial expression research

Besides voice, facial expression recognition is also a broad field of human-computer interaction. As we all know, human faces carry essential information about a person's emotions and mental state, so facial expression is utilized to enable nonverbal communication with computers (Petar & K. Katsaggelos, 2006). Moreover, the recognition of facial expression also relates to psychology, sociology, anthropology, computer science and other subjects. As a consequence, the study of expression has the scientific significance of improving artificial intelligence while promoting the development of related disciplines (Xue, Mao, Guo & Lv, 2009).

5.3.2 The recognition of facial expression

With active research on facial expression, various techniques have been developed all over the world. Above all, establishing a database is a basic step. Currently, the database can be