Shandong University: Speech Recognition Technology (PPT lecture slides): Automatic Speech Recognition

Introduction Speech recognition based on HMM • Acoustic processing • Acoustic modeling: Hidden Markov Model • Language modeling

Automatic Speech Recognition
Ying Shen
School of Software Engineering, Tongji University

Outline
Introduction
Speech recognition based on HMM
• Acoustic processing
• Acoustic modeling: Hidden Markov Model
• Language modeling
1/28/2021 HUMAN COMPUTER INTERACTION

What is speech recognition?
Automatic speech recognition (ASR) is the process by which a computer maps an acoustic speech signal to text.
Challenges for researchers
• Linguistic factors
• Physiological factors
• Environmental factors

Classification of speech recognition systems
Users
• Speaker-dependent system
• Speaker-independent system
• Speaker-adaptive system
Vocabulary
• small vocabulary: tens of words
• medium vocabulary: hundreds of words
• large vocabulary: thousands of words
• very large vocabulary: tens of thousands of words
Word pattern
• isolated-word system: single words at a time
• continuous speech system: words are connected together

How do humans do it?
Articulation produces sound waves, which the ear conveys to the brain for processing.

How might computers do it?
• Digitization
• Acoustic analysis of the speech signal
• Linguistic interpretation
[Figure: acoustic waveform → acoustic signal → speech recognition]

Outline
Introduction
Speech recognition based on HMM
• Acoustic processing
• Acoustic modeling: Hidden Markov Model
• Language modeling
• Statistical approach

Acoustic processing
A wave for the words “speech lab” looks like:
[Figure: waveform labeled s p ee ch l a b, with a close-up of the “l” to “a” transition]
Graphs from Simon Arnfield’s web tutorial on speech, Sheffield: http://lethe.leeds.ac.uk/research/cogn/speech/tutorial/

Acoustic sampling
• 10 ms frame (ms = millisecond = 1/1000 second)
• ~25 ms window around each frame to smooth signal processing
Result: acoustic feature vectors a1, a2, a3, ...
[Figure: a run of raw waveform sample values (e.g. -986, -792, -692, ...) covered by overlapping 25 ms windows placed every 10 ms, each window yielding one feature vector]
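The framing step on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's own code: the function name `frame_signal`, the synthetic 440 Hz test tone, and the 16 kHz sample rate are assumptions made for the example, and each window here starts at its frame position rather than being centered on it.

```python
import numpy as np

def frame_signal(samples, sample_rate, frame_step_ms=10.0, window_ms=25.0):
    """Cut a 1-D signal into overlapping windows, one window per frame step.

    frame_signal is a hypothetical helper for illustration only.
    """
    samples = np.asarray(samples, dtype=float)
    step = int(round(sample_rate * frame_step_ms / 1000.0))   # samples per 10 ms
    width = int(round(sample_rate * window_ms / 1000.0))      # samples per 25 ms
    if len(samples) < width:
        return np.empty((0, width))
    n_frames = 1 + (len(samples) - width) // step
    # Row i holds samples[i*step : i*step + width].
    idx = step * np.arange(n_frames)[:, None] + np.arange(width)[None, :]
    return samples[idx]

# Assumed input: one second of a synthetic 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 440 * t)

frames = frame_signal(signal, sample_rate)
print(frames.shape)  # (number of frames, samples per window)
```

At 16 kHz a 10 ms step is 160 samples and a 25 ms window is 400 samples, so consecutive windows overlap by 240 samples. A real front end would then turn each row into a feature vector (a1, a2, a3, ...).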

Spectral analysis
Frequency gives pitch; amplitude gives volume
• sampling at ~8 kHz (phone), ~16 kHz (mic) (kHz = 1000 cycles/sec)
Fourier transform of wave yields a spectrogram
• darkness indicates energy at each frequency
• hundreds to thousands of frequency samples
[Figure: spectrogram of “speech lab”, labeled s p ee ch l a b]
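As a rough sketch of how a spectrogram comes out of the Fourier transform, the code below applies a short-time FFT to 25 ms windows taken every 10 ms. The function name `spectrogram`, the Hamming window, and the synthetic 1 kHz test tone are assumptions chosen for illustration; the slides do not specify a particular window function or implementation.

```python
import numpy as np

def spectrogram(samples, sample_rate, frame_step_ms=10.0, window_ms=25.0):
    """Power spectrum of each windowed frame (hypothetical helper).

    Returns (freqs, spec): freqs[k] is the frequency in Hz of column k,
    and spec[i, k] is the energy of frame i at that frequency.
    """
    samples = np.asarray(samples, dtype=float)
    step = int(round(sample_rate * frame_step_ms / 1000.0))
    width = int(round(sample_rate * window_ms / 1000.0))
    if len(samples) < width:
        raise ValueError("signal is shorter than one analysis window")
    n_frames = 1 + (len(samples) - width) // step
    taper = np.hamming(width)  # assumed window; smooths the frame edges
    spec = np.empty((n_frames, width // 2 + 1))
    for i in range(n_frames):
        chunk = samples[i * step : i * step + width] * taper
        spec[i] = np.abs(np.fft.rfft(chunk)) ** 2  # energy per frequency bin
    freqs = np.fft.rfftfreq(width, d=1.0 / sample_rate)
    return freqs, spec

# Assumed input: one second of a synthetic 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

freqs, spec = spectrogram(tone, sample_rate)
peak_hz = freqs[np.argmax(spec.mean(axis=0))]
print(spec.shape, peak_hz)
```

Plotting `spec` with time on one axis and frequency on the other, darker where the values are larger, gives the kind of picture the slide describes. With a 400-sample window there are 201 frequency bins per frame, spaced 40 Hz apart.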
