Shandong University: Speech Recognition Technology (PPT lecture slides), Automatic Speech Recognition

Resource category: library; document format: PPTX; pages: 44; file size: 2.11 MB
Automatic Speech Recognition
Ying Shen
School of Software Engineering, Tongji University

Outline
Introduction
Speech recognition based on HMM
• Acoustic processing
• Acoustic modeling: Hidden Markov Model
• Language modeling
1/28/2021 HUMAN COMPUTER INTERACTION

What is speech recognition?
Automatic speech recognition (ASR) is the process by which a computer maps an acoustic speech signal to text.
Challenges for researchers
• Linguistic factors
• Physiological factors
• Environmental factors

Classification of speech recognition systems
Users
• Speaker-dependent system
• Speaker-independent system
• Speaker-adaptive system
Vocabulary
• Small vocabulary: tens of words
• Medium vocabulary: hundreds of words
• Large vocabulary: thousands of words
• Very large vocabulary: tens of thousands of words
Word pattern
• Isolated-word system: single words at a time
• Continuous-speech system: words are connected together

How do humans do it?
Articulation produces sound waves, which the ear conveys to the brain for processing.
[Figure: anatomy of the ear and the auditory pathway; legible labels include middle ear, Eustachian tube, cochlea, signal from left ear, cochlear nucleus, superior olive]

How might computers do it?
• Digitization
• Acoustic analysis of the speech signal
• Linguistic interpretation
[Figure: acoustic waveform, acoustic signal, speech recognition]

Outline
Introduction
Speech recognition based on HMM
• Acoustic processing
• Acoustic modeling: Hidden Markov Model
• Language modeling
• Statistical approach

Acoustic processing
A wave for the words "speech lab" looks like:
[Figure: waveform with segments labeled s, p, ee, ch, l, a, b, and a close-up of the "l" to "a" transition]
Graphs from Simon Arnfield's web tutorial on speech, Sheffield: http://lethe.leeds.ac.uk/research/cogn/speech/tutorial/

Acoustic sampling
• 10 ms frame (ms = millisecond = 1/1000 second)
• ~25 ms window around each frame to smooth signal processing
[Figure: raw waveform sample values, with 25 ms windows advancing in 10 ms steps to produce a1, a2, a3, ...]
Result: acoustic feature vectors
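The framing step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `frame_signal`, the 16 kHz sample rate, and the all-zero placeholder signal are not from the slides, and a real front end would go on to turn each window into a feature vector.

```python
def frame_signal(samples, sample_rate, frame_shift_ms=10, window_ms=25):
    """Cut a signal into overlapping windows, one window per frame step.

    Hypothetical helper for illustration: a new window starts every
    `frame_shift_ms` and covers `window_ms` of audio.
    """
    shift = int(sample_rate * frame_shift_ms / 1000)  # samples per 10 ms step
    width = int(sample_rate * window_ms / 1000)       # samples per 25 ms window
    frames = []
    start = 0
    while start + width <= len(samples):
        frames.append(samples[start:start + width])
        start += shift
    return frames


# One second of placeholder samples at an assumed 16 kHz rate.
sample_rate = 16000
samples = [0] * sample_rate
frames = frame_signal(samples, sample_rate)
print(len(frames), len(frames[0]))
```

Because the window (25 ms) is longer than the step (10 ms), neighbouring windows overlap, which is what smooths the sequence a1, a2, a3, ... from one frame to the next.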

Spectral analysis
Frequency gives pitch; amplitude gives volume
• Sampling at ~8 kHz (phone) or ~16 kHz (microphone); kHz = 1000 cycles/sec
Fourier transform of the wave yields a spectrogram
• Darkness indicates energy at each frequency
• Hundreds to thousands of frequency samples
[Figure: waveform and spectrogram of "speech lab", with segments labeled s, p, ee, ch, l, a, b]
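To make the spectrogram idea concrete, here is a minimal sketch that computes one magnitude spectrum per 25 ms window. Several things are assumptions rather than content from the slides: the function names, the Hamming taper, the synthetic 1 kHz test tone, and the use of a naive DFT in place of a fast Fourier transform (chosen only to keep the example dependency-free).

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (assumed taper; the slides name none)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive DFT of one tapered frame; magnitudes for bins 0..N/2."""
    n = len(frame)
    tapered = [x * w for x, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(tapered[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total))
    return spectrum


def spectrogram(samples, sample_rate, frame_shift_ms=10, window_ms=25):
    """One magnitude spectrum per 10 ms step over 25 ms windows."""
    shift = int(sample_rate * frame_shift_ms / 1000)
    width = int(sample_rate * window_ms / 1000)
    return [magnitude_spectrum(samples[s:s + width])
            for s in range(0, len(samples) - width + 1, shift)]


# Synthetic input: 0.1 s of a 1 kHz tone sampled at 8 kHz (telephone rate).
sample_rate = 8000
tone_hz = 1000
samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(800)]

spec = spectrogram(samples, sample_rate)
window_len = int(sample_rate * 25 / 1000)
# Strongest bin of the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / window_len
print(len(spec), len(spec[0]), peak_hz)
```

Each row of `spec` corresponds to one column of the spectrogram image: plotting larger magnitudes as darker pixels gives the "darkness indicates energy" picture described on the slide.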
