© 2000 by CRC Press LLC

The focus of this article is the contrast among the three most important classes of speech coders that have representative implementations in several international standards: time-domain coders, frequency-domain coders, and hybrid coders. In the following, we define these classifications, look specifically at the important characteristics of representative, general implementations of each class, and briefly discuss the rapidly changing national and international standardization efforts related to speech coding.
General Approaches

Time Domain Coders and Linear Prediction

Linear Predictive Coding (LPC) is a modeling technique that has seen widespread application among time-domain speech coders, largely because it is computationally simple and applicable to the mechanisms involved in speech production. In LPC, general spectral characteristics are described by a parametric model based on estimates of autocorrelations or autocovariances. The model of choice for speech is the all-pole or autoregressive (AR) model. This model is particularly suited for voiced speech because the vocal tract can be well modeled by an all-pole transfer function. In this case, the estimated LPC model parameters correspond to an AR process which can produce waveforms very similar to the original speech segment. Differential Pulse Code Modulation (DPCM) coders (e.g., ITU-T G.721 ADPCM [CCITT, 1984]) and LPC vocoders (e.g., U.S. Federal Standard 1015 [National Communications System, 1984]) are examples of this class of time-domain predictive architecture. Code Excited Coders (e.g., ITU-T G.728 [Chen, 1990] and U.S. Federal Standard 1016 [National Communications System, 1991]) also utilize LPC spectral modeling techniques.¹

Based on the general spectral model, a predictive coder formulates an estimate of a future sample of speech based on a weighted combination of the immediately preceding samples. The error in this estimate (the prediction residual) typically comprises a significant portion of the data stream of the encoded speech. The residual contains information that is important in speech perception and cannot be modeled in a straightforward fashion.

The most familiar form of predictive coder is the classical Differential Pulse Code Modulation (DPCM) system shown in Fig. 15.1. In DPCM, the predicted value at time instant $k$, $\hat{s}(k \mid k-1)$, is subtracted from the input signal at time $k$, $s(k)$, to produce the prediction error signal $e(k)$.
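As a compact restatement of the all-pole model and the prediction error described in this section, the relations can be written as below. The excitation $u(k)$, the synthesis transfer function $H(z)$, the coefficients $a_i$, and the order $N$ are notation assumed here for illustration, so this should be read as a sketch and not as the article's own formulation.

$$
\begin{aligned}
s(k) &= \sum_{i=1}^{N} a_i\, s(k-i) + u(k), \qquad H(z) = \frac{1}{1 - \sum_{i=1}^{N} a_i z^{-i}} \\
e(k) &= s(k) - \hat{s}(k \mid k-1)
\end{aligned}
$$

The first line is the AR (all-pole) model driven by an excitation; the second is the DPCM prediction error formed by subtracting the predicted value from the input sample.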
The prediction error is then approximated (quantized), and the quantized prediction error, $e_q(k)$, is coded (represented as a binary number) for transmission to the receiver. Simultaneously with the coding, $e_q(k)$ is summed with $\hat{s}(k \mid k-1)$ to yield a reconstructed version of the input sample, $\hat{s}(k)$. Assuming no channel errors, an identical reconstruction, distorted only by the effects of quantization, is accomplished at the receiver. At both the transmitter and receiver, the predicted value at time instant $k+1$ is derived using reconstructed values up through time $k$, and the procedure is repeated.

The first DPCM systems had $\hat{B}(z) = 0$ and $\hat{A}(z) = \sum_{i=1}^{N} a_i z^{-i}$, where $\{a_i,\ i = 1, \ldots, N\}$ are the LPC coefficients and $z^{-1}$ represents unit delay, so that the predicted value was a weighted linear combination of previous reconstructed values, or

$$\hat{s}(k \mid k-1) = \sum_{i=1}^{N} a_i\, \hat{s}(k-i).$$

¹ However, codebook excitation is generally described as a hybrid coding technique.

FIGURE 15.1 Differential encoder transmitter with a pole-zero predictor.
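The DPCM loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the coder of any standard: the function names `dpcm_encode` and `dpcm_decode`, the uniform quantizer with step size `step`, the zero initial predictor state, and the first-order coefficient 0.9 in the demo are all hypothetical choices. Only the all-pole case ($\hat{B}(z) = 0$) is covered, and the binary coding of the quantizer indices is omitted.

```python
import math


def dpcm_encode(samples, coeffs, step):
    """Run the transmitter loop; return (quantizer indices, local reconstruction)."""
    history = [0.0] * len(coeffs)  # reconstructed samples, most recent first
    indices, recon_out = [], []
    for s in samples:
        # Predicted value s_hat(k|k-1): weighted sum of past reconstructed samples.
        pred = sum(a * r for a, r in zip(coeffs, history))
        e = s - pred                 # prediction error e(k)
        idx = round(e / step)        # assumed uniform quantizer
        e_q = idx * step             # quantized prediction error e_q(k)
        recon = pred + e_q           # reconstructed sample s_hat(k)
        history = [recon] + history[:-1]
        indices.append(idx)
        recon_out.append(recon)
    return indices, recon_out


def dpcm_decode(indices, coeffs, step):
    """Run the receiver loop, mirroring the transmitter's predictor state."""
    history = [0.0] * len(coeffs)
    out = []
    for idx in indices:
        pred = sum(a * r for a, r in zip(coeffs, history))
        recon = pred + idx * step
        history = [recon] + history[:-1]
        out.append(recon)
    return out


if __name__ == "__main__":
    # Hypothetical test signal and predictor, chosen only for the demo.
    signal = [math.sin(2 * math.pi * k / 20) for k in range(50)]
    idx, local = dpcm_encode(signal, [0.9], 0.1)
    decoded = dpcm_decode(idx, [0.9], 0.1)
    print("receiver matches transmitter:", decoded == local)
    print("max error:", max(abs(s - r) for s, r in zip(signal, decoded)))
```

Because the predictor at both ends runs on reconstructed values and not on the original input, the receiver tracks the transmitter exactly when there are no channel errors, and the reconstruction error equals the quantization error of $e(k)$, which this quantizer keeps within half a step.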