Shanghai Jiao Tong University
Research Status of Human-Computer Interaction

In particular, as emerging fields such as cognitive psychology, artificial intelligence, and image processing developed, multimedia-based multimodal interaction technology became dominant. Multimodal interaction is a kind of HCI in which a variety of channels are used for interaction. In this case, the computer user interface is called a Multimodal User Interface. Based on the channel of recognition, there are four methods of multimodal interaction.
5.1 Speech Recognition

5.1.1 The development of speech recognition

Research on speech recognition started in the 1950s, and in the 1960s the use of computers greatly promoted its development. The most important achievement of this period was the proposal to use dynamic programming, a method following the principle of optimality, to handle the unequal lengths of speech utterances. Then, in the 1970s, a breakthrough was made when the introduction of Linear Predictive Coding (LPC) produced a leap in feature extraction. Moreover, as Dynamic Time Warping (DTW) neared maturity, vector quantization and Hidden Markov Model (HMM) theory were developed, and isolated-word speech recognition systems were implemented on the basis of them. In the 1980s, the HMM was successfully applied to speech recognition (He, 2002). In the 21st century, the technology has gradually found wider application in products on the market.

5.1.2 Three technologies of speech recognition

The selection of speech recognition units is the first step. Words (or sentences), syllables, and phonemes are the three kinds of speech recognition units, and which one to use depends on the specific situation. For example, word units are widely used in small-vocabulary systems but not in large-vocabulary ones, where the model base would become too large. Syllables suit languages with a monosyllabic structure, such as Chinese, while phoneme units were formerly more common in English recognition. Since the 1990s, however, phonemes have also been widely used in Chinese recognition (Chen & Gao, 1996).

The extraction of characteristic parameters is the second stage. To obtain the useful information in a speech signal, characteristic parameters are extracted so that essential information is kept and redundant information is removed. In this process, different kinds of information can be distinguished and, at the same time, a large amount of data can be compressed.
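As a rough illustration of the feature-extraction stage described above, and of the dynamic-programming alignment mentioned in 5.1.1 for utterances of unequal length, the following Python sketch may help. It is only a minimal example under stated assumptions: the two per-frame features (log energy and zero-crossing rate) are chosen for simplicity and are not the parameters named in the text (which refers to LPC), and the function names `frame_features`, `dtw_distance`, and `recognize` are hypothetical.

```python
import math


def frame_features(signal, frame_len=4, hop=2):
    """Cut a sampled signal into overlapping frames and describe each frame
    by two numbers: (log energy, zero-crossing rate).

    Assumption: these two features are illustrative only; a real system
    would use richer parameters such as LPC coefficients.
    """
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small offset avoids log(0)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((log_energy, crossings / (frame_len - 1)))
    return features


def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two feature sequences that may
    differ in length, using a Euclidean local cost and the three basic steps
    (insertion, deletion, match)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.sqrt(
                sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1]))
            )
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def recognize(unknown, templates):
    """Return the label of the stored template closest to `unknown`."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))


if __name__ == "__main__":
    # Hypothetical one-dimensional templates for two "words".
    templates = {
        "up": [(0.0,), (1.0,), (2.0,)],
        "down": [(2.0,), (1.0,), (0.0,)],
    }
    slow_up = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]  # a stretched "up"
    print(recognize(slow_up, templates))
```

In this sketch each frame of samples is reduced to two numbers, which reflects the data compression that feature extraction provides, and the alignment lets a slowly spoken utterance match a shorter stored template.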
Model training refers to estimating model parameters from a large number of known patterns according to certain criteria, and pattern matching means finding, in the model base, the pattern that best matches an unknown pattern. The most common methods used in model training and pattern matching include dynamic time warping (DTW), the hidden Markov model (HMM), and the artificial neural network (ANN).

5.1.3 The application of speech recognition