© 2000 by CRC Press LLC


15 Speech Signal Processing

Stan McClellan, University of Alabama at Birmingham
Jerry D. Gibson, Texas A&M University
Yariv Ephraim, AT&T Bell Laboratories and George Mason University
Jesse W. Fussell, Department of Defense
Lynn D. Wilcox, FX Palo Alto Lab
Marcia A. Bush, Xerox Palo Alto Research Center
Yuqing Gao, IBM T.J. Watson Research Center
Bhuvana Ramabhadran, IBM T.J. Watson Research Center
Michael Picheny, IBM T.J. Watson Research Center

15.1 Coding, Transmission, and Storage
General Approaches • Model Adaptation • Analysis-by-Synthesis • Particular Implementations • Speech Quality and Intelligibility • Standardization • Variable Rate Coding • Summary and Conclusions

15.2 Speech Enhancement and Noise Reduction
Models and Performance Measures • Signal Estimation • Source Coding • Signal Classification • Comments

15.3 Analysis and Synthesis
Analysis of Excitation • Fourier Analysis • Linear Predictive Analysis • Homomorphic (Cepstral) Analysis • Speech Synthesis

15.4 Speech Recognition
Speech Recognition System Architecture • Signal Pre-Processing • Dynamic Time Warping • Hidden Markov Models • State-of-the-Art Recognition Systems

15.5 Large Vocabulary Continuous Speech Recognition
Overview of a Speech Recognition System • Hidden Markov Models As Acoustic Models for Speech Recognition • Speaker Adaptation • Modeling Context in Continuous Speech • Language Modeling • Hypothesis Search • State-of-the-Art Systems • Challenges in Speech Recognition • Applications

15.1 Coding, Transmission, and Storage

Stan McClellan and Jerry D. Gibson

Interest in speech coding is motivated by a wide range of applications, including commercial telephony, digital cellular mobile radio, military communications, voice mail, speech storage, and future personal communications networks. The goal of speech coding is to represent speech in digital form with as few bits as possible while maintaining the intelligibility and quality required for the particular application. At higher bit rates, such as 64 and 32 kbits/s, achieving good quality and intelligibility is not too difficult, but as the desired bit rate is lowered to 16 kbits/s and below, the problem becomes increasingly challenging. Depending on the application, many difficult constraints must be considered, including the issue of complexity.

For example, for the 32-kbits/s speech coding standard, the ITU-T¹ not only required highly intelligible, high-quality speech, but the coder also had to have low delay, withstand independent bit error rates up to 10^-2, have acceptable performance degradation for several synchronous or asynchronous tandem connections, and pass some voiceband modem signals. Other applications may have different criteria. Digital cellular mobile radio in the U.S. has no low-delay or voiceband modem signal requirements, but the speech data rates required are under 8 kbits/s and the transmission medium (or channel) can be very noisy and have relatively long fades. These considerations affect the speech coder chosen for a particular application.

As speech coder data rates drop to 16 kbits/s and below, perceptual criteria taking into account human auditory response begin to play a prominent role. For time-domain coders, the perceptual effects are incorporated using a frequency-weighted error criterion. The frequency-domain coders include perceptual effects by allocating

¹ International Telecommunication Union, Telecommunication Standardization Sector, formerly the CCITT.
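
The bit rates and the error-rate requirement quoted above are easier to judge with a little arithmetic. The following Python sketch is illustrative only and not part of the handbook text. It assumes telephone-band speech sampled at 8 kHz, a figure the passage does not state, and the function names are hypothetical. It shows how many bits per sample each rate leaves the coder, and how many corrupted bits per second a 32-kbits/s coder would have to tolerate on average at an independent bit error rate of 10^-2.

```python
# Illustrative arithmetic only. The 8 kHz sampling rate is an assumption
# (typical for telephone-band speech); the text above does not state it.
SAMPLE_RATE_HZ = 8000


def bits_per_sample(bit_rate_bps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Average number of bits the coder can spend on each speech sample."""
    return bit_rate_bps / sample_rate_hz


def expected_bit_errors_per_second(bit_rate_bps, bit_error_rate):
    """Mean number of corrupted bits per second for independent bit errors."""
    return bit_rate_bps * bit_error_rate


def storage_bytes(bit_rate_bps, duration_s):
    """Bytes needed to store duration_s seconds of coded speech."""
    return bit_rate_bps * duration_s / 8


if __name__ == "__main__":
    # Rates mentioned in the text: 64, 32, 16, and (under) 8 kbits/s.
    for rate_bps in (64_000, 32_000, 16_000, 8_000):
        print(
            f"{rate_bps // 1000:>2} kbits/s: "
            f"{bits_per_sample(rate_bps):.1f} bits/sample, "
            f"{storage_bytes(rate_bps, 60) / 1000:.0f} kB per minute"
        )

    # The 32-kbits/s requirement: independent bit error rates up to 10^-2.
    errors = expected_bit_errors_per_second(32_000, 1e-2)
    print(f"32 kbits/s at BER 1e-2: about {errors:.0f} bit errors per second")
```

Under the 8 kHz assumption, dropping from 64 to 16 kbits/s reduces the budget from eight bits per sample to two, which is one way to see why the problem becomes harder at lower rates.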
