564 Chapter 13.Fourier and Spectral Applications for specific situations,and arm themselves with a variety of other tricks.We suggest that you do likewise,as your projects demand. CITED REFERENCES AND FURTHER READING: Hamming,R.W.1983,Digita/Filters,2nd ed.(Englewood Cliffs,NJ:Prentice-Hall). Antoniou,A.1979,Digital Filters:Analysis and Design (New York:McGraw-Hill). Parks,T.W.,and Burrus,C.S.1987,Digital Filter Design(New York:Wiley). Oppenheim,A.V.,and Schafer,R.W.1989,Discrete-Time Signa/Processing (Englewood Cliffs, NJ:Prentice-Hall). Rice,J.R.1964.The Approximation of Functions (Reading,MA:Addison-Wesley):also 1969. op.cit.,Vol.2. Rabiner,L.R.,and Gold,B.1975,Theory and Application of DigitalSignal Processing(Englewood Cliffs,NJ:Prentice-Hall). 13.6 Linear Prediction and Linear Predictive Coding (Nort server We begin with a very general formulation that will allow us to make connections America Press. to various special cases.Let fy be a set of measured values for some underlying ART set of true values of a quantity y,denoted [y,related to these true values by the addition of random noise, Programs Ya Ya na (13.6.1) (compare equation 13.3.2,with a somewhat different notation).Our use of a Greek 6 家 subscript to index the members of the set is meant to indicate that the data points are not necessarily equally spaced along a line,or even ordered:they might be "random"points in three-dimensional space,for example.Now,suppose we want to construct the "best estimate of the true value of some particular point y,as a linear combination of the known,noisy,values.Writing =∑d.a%+x (13.6.2) Numerical Recipes 10.621 43106 we want to find coefficients d that minimize,in some way,the discrepancy x.. The coefficients d have a"star"subscript to indicate that they depend on the choice (outside of point y..Later,we might want to let y,be one of the existing y's.In that case our problem becomes one of optimal filtering or estimation,closely related to the North Software. discussion in $13.3.On the other hand,we might want y.to be a completely new point.In that case,our problem will be one of linear prediction. A natural way to minimize the discrepancy r.is in the statistical mean square sense.If angle brackets denote statistical averages,then we seek d's that minimize (13.6.3) =(aa)+(nana)d.ad,g-2(:ya)da+(g〉
564 Chapter 13. Fourier and Spectral Applications Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). for specific situations, and arm themselves with a variety of other tricks. We suggest that you do likewise, as your projects demand. CITED REFERENCES AND FURTHER READING: Hamming, R.W. 1983, Digital Filters, 2nd ed. (Englewood Cliffs, NJ: Prentice-Hall). Antoniou, A. 1979, Digital Filters: Analysis and Design (New York: McGraw-Hill). Parks, T.W., and Burrus, C.S. 1987, Digital Filter Design (New York: Wiley). Oppenheim, A.V., and Schafer, R.W. 1989, Discrete-Time Signal Processing (Englewood Cliffs, NJ: Prentice-Hall). Rice, J.R. 1964, The Approximation of Functions (Reading, MA: Addison-Wesley); also 1969, op. cit., Vol. 2. Rabiner, L.R., and Gold, B. 1975, Theory and Application of Digital Signal Processing (Englewood Cliffs, NJ: Prentice-Hall). 13.6 Linear Prediction and Linear Predictive Coding We begin with a very general formulation that will allow us to make connections to various special cases. Let {y α} be a set of measured values for some underlying set of true values of a quantity y, denoted {yα}, related to these true values by the addition of random noise, y α = yα + nα (13.6.1) (compare equation 13.3.2, with a somewhat different notation). Our use of a Greek subscript to index the members of the set is meant to indicate that the data points are not necessarily equally spaced along a line, or even ordered: they might be “random” points in three-dimensional space, for example. Now, suppose we want to construct the “best” estimate of the true value of some particular point y as a linear combination of the known, noisy, values. Writing y = α dαy α + x (13.6.2) we want to find coefficients dα that minimize, in some way, the discrepancy x. The coefficients dα have a “star” subscript to indicate that they depend on the choice of point y. Later, we might want to let y be one of the existing yα’s. In that case, our problem becomes one of optimal filtering or estimation, closely related to the discussion in §13.3. On the other hand, we might want y to be a completely new point. In that case, our problem will be one of linear prediction. A natural way to minimize the discrepancy x is in the statistical mean square sense. If angle brackets denote statistical averages, then we seek dα’s that minimize x2 = α dα(yα + nα) − y 2 = αβ (yαyβ + nαnβ)dαdβ − 2 α yyα dα + y2 (13.6.3)
13.6 Linear Prediction and Linear Predictive Coding 565 Here we have used the fact that noise is uncorrelated with signal,e.g.,(ny)=0. The quantities (yys)and (yy)describe the autocorrelation structure of the underlying data.We have already seen an analogous expression,(13.2.2),for the case of equally spaced data points on a line;we will meet correlation several times again in its statistical sense in Chapters 14 and 15.The quantities(nna)describe the autocorrelation properties of the noise.Often,for point-to-point uncorrelated noise, we have (nan)=(n)68.It is convenient to think of the various correlation quantities as comprising matrices and vectors, paB三〈yayB) pa三(y.ya)7ag三(nang〉or〈na)6ag(13.6.4) Setting the derivative of equation (13.6.3)with respect to the d's equal to zero, one readily obtains the set of linear equations. 恩李P a+aldg=oe (13.6.5) 3 If we write the solution as a matrix inverse,then the estimation equation(13.6.2) RECIPES becomes,omitting the minimized discrepancy x., 9 ≈∑a[w+ua 13.6.6) 9 From equations(13.6.3)and(13.6.5)one can also calculate the expected mean square 9 value of the discrepancy at its minimum,denoted ● (z)o=(g)-∑d,gpa=(g〉-∑a[-w+Tuvlag 3 (13.6.7) 6 A final general result tells how much the mean square discrepancy (2)is increased if we use the estimation equation(13.6.2)not with the best values dB,but with some other values d.B.The above equations then imply (2)=(2)o+>(d.a-da)[oo8+naB](d.a-d.a) (13.6.8) Numerica 10621 aB Since the second term is a pure quadratic form,we see that the increase in the discrepancy is only second order in any error made in estimating the d.a's. (outside Connection to Optimal Filtering North If we change "star"to a Greek index,say y,then the above formulas describe optimal filtering,generalizing the discussion of$13.3.One sees,for example,that if the noise amplitudes na go to zero,so likewise do the noise autocorrelations n8,and,canceling a matrix times its inverse,equation(13.6.6)simply becomes yy Another special case occurs if the matrices and ns are diagonal. In that case,equation (13.6.6)becomes = 13.6.9) y+y
13.6 Linear Prediction and Linear Predictive Coding 565 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Here we have used the fact that noise is uncorrelated with signal, e.g., n αyβ = 0. The quantities yαyβ and yyα describe the autocorrelation structure of the underlying data. We have already seen an analogous expression, (13.2.2), for the case of equally spaced data points on a line; we will meet correlation several times again in its statistical sense in Chapters 14 and 15. The quantitiesnαnβ describe the autocorrelation properties of the noise. Often, for point-to-point uncorrelated noise, we have nαnβ = n2 α δαβ. It is convenient to think of the various correlation quantities as comprising matrices and vectors, φαβ ≡ yαyβ φα ≡ yyα ηαβ ≡ nαnβ or n2 α δαβ (13.6.4) Setting the derivative of equation (13.6.3) with respect to the d α’s equal to zero, one readily obtains the set of linear equations, β [φαβ + ηαβ] dβ = φα (13.6.5) If we write the solution as a matrix inverse, then the estimation equation (13.6.2) becomes, omitting the minimized discrepancy x, y ≈ αβ φα [φµν + ηµν ] −1 αβ y β (13.6.6) From equations (13.6.3) and (13.6.5) one can also calculate the expected mean square value of the discrepancy at its minimum, denoted x2 0, x2 0 = y2 − β dβφβ = y2 − αβ φα [φµν + ηµν ] −1 αβ φβ (13.6.7) A final general result tells how much the mean square discrepancy x2 is increased if we use the estimation equation (13.6.2) not with the best values d β, but with some other values d β. The above equations then imply x2 = x2 0 + αβ (d α − dα) [φαβ + ηαβ] (d β − dβ) (13.6.8) Since the second term is a pure quadratic form, we see that the increase in the discrepancy is only second order in any error made in estimating the d β’s. Connection to Optimal Filtering If we change “star” to a Greek index, say γ, then the above formulas describe optimal filtering, generalizing the discussion of §13.3. One sees, for example, that if the noise amplitudes nα go to zero, so likewise do the noise autocorrelations ηαβ, and, canceling a matrix times its inverse, equation (13.6.6) simply becomes yγ = y γ. Another special case occurs if the matrices φαβ and ηαβ are diagonal. In that case, equation (13.6.6) becomes yγ = φγγ φγγ + ηγγ y γ (13.6.9)
566 Chapter 13.Fourier and Spectral Applications which is readily recognizable as equation(13.3.6)with SN2What is going on is this:For the case of equally spaced data points,and in the Fourier domain,autocorrelations become simply squares of Fourier amplitudes (Wiener- Khinchin theorem,equation 12.0.12),and the optimal filter can be constructed algebraically,as equation(13.6.9),without inverting any matrix. More generally,in the time domain,or any other domain,an optimal filter(one that minimizes the square of the discrepancy from the underlying true value in the presence of measurement noise)can be constructed by estimating the autocorrelation matrices 8 and nas,and applying equation (13.6.6)with *y.(Equation 13.6.8 is in fact the basis for the $13.3's statement that even crude optimal filtering 81 can be quite effective. Linear Prediction ICAL Classical linear prediction specializes to the case where the data points ys are equally spaced along a line,y,i=1,2,...,N,and we want to use M consecutive values of yi to predict an M+1st.Stationarity is assumed.That is,the RECIPES autocorrelation (yjy)is assumed to depend only on the difference j-k|,and not on j or k individually,so that the autocorrelation has only a single index, 9 N-i 1 p三(〈y+》≈ N-j 左y+方 (13.6.10) 9 i=1 星是 9 Here,the approximate equality shows one way to use the actual data set values to estimate the autocorrelation components.(In fact,there is a better way to make these estimates;see below.)In the situation described,the estimation equation(13.6.2)is M Un= djyn-j+In (13.6.11) j=1 (compare equation 13.5.1)and equation(13.6.5)becomes the set of M equations for the M unknown di's,now called the linear prediction (LP)coefficients, Numerica 10621 43126 ∑-= (k=1,,M) (13.6.12) j=1 Notice that while noise is not explicitly included in the equations,it is properly North accounted for,if it is point-to-point uncorrelated:o,as estimated by equation (13.6.10)using measuredvaluesy,actually estimates the diagonal part of+n, above.The mean square discrepancy (is estimated by equation(13.6.7)as (x品〉=0-p1d-p2d2-…-pMdM (13.6.13 To use linear prediction,we first compute the di's,using equations(13.6.10) and (13.6.12).We then calculate equation (13.6.13)or,more concretely,apply (13.6.11)to the known record to get an idea of how large are the discrepancies i. If the discrepancies are small,then we can continue applying(13.6.11)right on into
566 Chapter 13. Fourier and Spectral Applications Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). which is readily recognizable as equation (13.3.6) with S 2 → φγγ, N2 → ηγγ. What is going on is this: For the case of equally spaced data points, and in the Fourier domain, autocorrelations become simply squares of Fourier amplitudes (WienerKhinchin theorem, equation 12.0.12), and the optimal filter can be constructed algebraically, as equation (13.6.9), without inverting any matrix. More generally, in the time domain, or any other domain, an optimal filter (one that minimizes the square of the discrepancy from the underlying true value in the presence of measurement noise) can be constructed by estimating the autocorrelation matrices φαβ and ηαβ, and applying equation (13.6.6) with → γ. (Equation 13.6.8 is in fact the basis for the §13.3’s statement that even crude optimal filtering can be quite effective.) Linear Prediction Classical linear prediction specializes to the case where the data points yβ are equally spaced along a line, yi, i = 1, 2,...,N, and we want to use M consecutive values of yi to predict an M + 1st. Stationarity is assumed. That is, the autocorrelation yjyk is assumed to depend only on the difference |j − k|, and not on j or k individually, so that the autocorrelation φ has only a single index, φj ≡ yiyi+j ≈ 1 N − j N −j i=1 yiyi+j (13.6.10) Here, the approximate equality shows one way to use the actual data set values to estimate the autocorrelation components. (In fact, there is a better way to make these estimates; see below.) In the situation described, the estimation equation (13.6.2) is yn = M j=1 djyn−j + xn (13.6.11) (compare equation 13.5.1) and equation (13.6.5) becomes the set of M equations for the M unknown dj ’s, now called the linear prediction (LP) coefficients, M j=1 φ|j−k| dj = φk (k = 1,...,M) (13.6.12) Notice that while noise is not explicitly included in the equations, it is properly accounted for, if it is point-to-point uncorrelated: φ0, as estimated by equation (13.6.10) using measured values y i, actually estimates the diagonal part of φαα+ηαα, above. The mean square discrepancy x2 n is estimated by equation (13.6.7) as x2 n = φ0 − φ1d1 − φ2d2 −···− φM dM (13.6.13) To use linear prediction, we first compute the dj ’s, using equations (13.6.10) and (13.6.12). We then calculate equation (13.6.13) or, more concretely, apply (13.6.11) to the known record to get an idea of how large are the discrepancies x i. If the discrepancies are small, then we can continue applying (13.6.11) right on into
13.6 Linear Prediction and Linear Predictive Coding 567 the future,imagining the unknown"future"discrepancies x;to be zero.In this application,(13.6.11)is a kind of extrapolation formula.In many situations,this extrapolation turns out to be vastly more powerful than any kind of simple polynomial extrapolation.(By the way,you should not confuse the terms"linear prediction"and "linear extrapolation";the general functional form used by linear prediction is much more complex than a straight line,or even a low-order polynomial!) However,to achieve its full usefulness,linear prediction must be constrained in one additional respect:One must take additional measures to guarantee its stability. Equation(13.6.11)is a special case ofthe general linear filter(13.5.1).The condition that(13.6.11)be stable as a linear predictor is precisely that given in equations (13.5.5)and (13.5.6),namely that the characteristic polynomial 思2君 81 N >dzN-3=0 (13.6.14) ICAL j=1 have all N of its roots inside the unit circle. |z≤1 (13.6.15) 9 There is no guarantee that the coefficients produced by equation(13.6.12)will have 2人 this property.If the data contain many oscillations without any particular trend towards increasing or decreasing amplitude,then the complex roots of(13.6.14) will generally all be rather close to the unit circle.The finite length of the data gO 9 set will cause some of these roots to be inside the unit circle.others outside.In some applications,where the resulting instabilities are slowly growing and the linear prediction is not pushed too far,it is best to use the "unmassaged"LP coefficients that come directly out of(13.6.12).For example,one might be extrapolating to fill a 61 short gap in a data set;then one might extrapolate both forwards across the gap and backwards from the data beyond the gap.If the two extrapolations agree tolerably well,then instability is not a problem. When instability is a problem,you have to"massage"the LP coefficients.You do this by (i)solving(numerically)equation(13.6.14)for its N complex roots;(ii) moving the roots to where you think they ought to be inside or on the unit circle;(iii) ©e Numerica 10-521 reconstituting the now-modified LP coefficients.You may think that step (ii)sounds 431 a little vague.It is.There is no"best"procedure.If you think that your signal Recipes is truly a sum of undamped sine and cosine waves(perhaps with incommensurate periods),then you will want simply to move each root z;onto the unit circle. North 2→4/lz (13.6.16) In other circumstances it may seem appropriate to reflect a bad root across the unit circle 2一1/2,* (13.6.17) This alternative has the property that it preserves the amplitude of the output of (13.6.11)when it is driven by a sinusoidal set of zi's.It assumes that (13.6.12) has correctly identified the spectral width of a resonance,but only slipped up on
13.6 Linear Prediction and Linear Predictive Coding 567 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). the future, imagining the unknown “future” discrepancies xi to be zero. In this application, (13.6.11) is a kind of extrapolation formula. In many situations, this extrapolation turns out to be vastly more powerful than any kind of simple polynomial extrapolation. (By the way, you should not confuse the terms “linear prediction” and “linear extrapolation”; the general functional form used by linear prediction is much more complex than a straight line, or even a low-order polynomial!) However, to achieve its full usefulness, linear prediction must be constrained in one additional respect: One must take additional measures to guarantee its stability. Equation (13.6.11) is a special case of the general linear filter (13.5.1). The condition that (13.6.11) be stable as a linear predictor is precisely that given in equations (13.5.5) and (13.5.6), namely that the characteristic polynomial zN − N j=1 dj zN−j =0 (13.6.14) have all N of its roots inside the unit circle, |z| ≤ 1 (13.6.15) There is no guarantee that the coefficients produced by equation (13.6.12) will have this property. If the data contain many oscillations without any particular trend towards increasing or decreasing amplitude, then the complex roots of (13.6.14) will generally all be rather close to the unit circle. The finite length of the data set will cause some of these roots to be inside the unit circle, others outside. In some applications, where the resulting instabilities are slowly growing and the linear prediction is not pushed too far, it is best to use the “unmassaged” LP coefficients that come directly out of (13.6.12). For example, one might be extrapolating to fill a short gap in a data set; then one might extrapolate both forwards across the gap and backwards from the data beyond the gap. If the two extrapolations agree tolerably well, then instability is not a problem. When instability is a problem, you have to “massage” the LP coefficients. You do this by (i) solving (numerically) equation (13.6.14) for its N complex roots; (ii) moving the roots to where you think they ought to be inside or on the unit circle; (iii) reconstituting the now-modified LP coefficients. You may think that step (ii) sounds a little vague. It is. There is no “best” procedure. If you think that your signal is truly a sum of undamped sine and cosine waves (perhaps with incommensurate periods), then you will want simply to move each root z i onto the unit circle, zi → zi/ |zi| (13.6.16) In other circumstances it may seem appropriate to reflect a bad root across the unit circle zi → 1/zi* (13.6.17) This alternative has the property that it preserves the amplitude of the output of (13.6.11) when it is driven by a sinusoidal set of xi’s. It assumes that (13.6.12) has correctly identified the spectral width of a resonance, but only slipped up on
568 Chapter 13. Fourier and Spectral Applications identifying its time sense so that signals that should be damped as time proceeds end up growing in amplitude.The choice between(13.6.16)and(13.6.17)sometimes might as well be based on voodoo.We prefer (13.6.17). Also magical is the choice of M,the number of LP coefficients to use.You should choose M to be as small as works for you,that is,you should choose it by experimenting with your data.Try M=5.10,20,40.If you need larger M's than this,be aware that the procedure of"massaging"all those complex roots is quite sensitive to roundoff error.Use double precision. Linear prediction is especially successful at extrapolating signals that are smooth and oscillatory,though not necessarily periodic.In such cases,linear prediction often extrapolates accurately through many cycles of the signal.By contrast,polynomial extrapolation in general becomes seriously inaccurate after at most a cycle or two. 鱼 19881992 A prototypical example of a signal that can successfully be linearly predicted is the height of ocean tides,for which the fundamental 12-hour period is modulated in -200 phase and amplitude over the course of the month and year,and for which local hydrodynamic effects may make even one cycle of the curve look rather different from NUMERICAL RECIPESI in shape from a sine wave. We already remarked that equation (13.6.10)is not necessarily the best way (Nort server to estimate the covariances o from the data set.In fact,results obtained from linear prediction are remarkably sensitive to exactly how the o's are estimated. America computer, make one paper UnN电.t THE One particularly good method is due to Burg [1],and involves a recursive procedure ART for increasing the order M by one unit at a time,at each stage re-estimating the coefficients d,j=1,...,M so as to minimize the residual in equation(13.6.13). Programs Although further discussion of the Burg method is beyond our scope here,the method is implemented in the following routine [1.2]for estimating the LP coefficients dj of a data set 会 #include #include "nrutil.h' void memcof(float data[],int n,int m,float *xms,float d[) Given a real vector of data[1..n],and given m,this routine returns m linear prediction coef- OF SCIENTIFIC COMPUTING(ISBN 0-521- ficients as d[1..m],and returns the mean square discrepancy as xms. int k,j,i; float p=0.0,*wk1,*wk2,*wkm; v@cambridge.org er eba 1988-1992 by Numerical Recipes -431085 yki-vector(1,n); wk2-vector(1,n); wkm=vector(1,m); for (j=1;j<=n;j++)p +SQR(data[j]); Software. 来xmg=o/n: wk1[1]=data[1]; wk2[n-1]=data[n]; (outside North America) for(j=2;j<=n-1;j++)( wk1[i]=data[i]; wk2[j-1]=data[j]; for (k=1;k<=m;k++){ float num=0.0,denom=0.0; for(j=1;j<=(n-k);j+)1 num +wk1[j]*wk2[i] denom +SQR(wk1[j])+SQR(wk2[]); d[k]=2.0*num/denom;
568 Chapter 13. Fourier and Spectral Applications Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). identifying its time sense so that signals that should be damped as time proceeds end up growing in amplitude. The choice between (13.6.16) and (13.6.17) sometimes might as well be based on voodoo. We prefer (13.6.17). Also magical is the choice of M, the number of LP coefficients to use. You should choose M to be as small as works for you, that is, you should choose it by experimenting with your data. Try M = 5, 10, 20, 40. If you need larger M’s than this, be aware that the procedure of “massaging” all those complex roots is quite sensitive to roundoff error. Use double precision. Linear prediction is especially successful at extrapolating signals that are smooth and oscillatory, though not necessarily periodic. In such cases, linear prediction often extrapolates accurately through many cycles of the signal. By contrast, polynomial extrapolation in general becomes seriously inaccurate after at most a cycle or two. A prototypical example of a signal that can successfully be linearly predicted is the height of ocean tides, for which the fundamental 12-hour period is modulated in phase and amplitude over the course of the month and year, and for which local hydrodynamic effects may make even one cycle of the curve look rather different in shape from a sine wave. We already remarked that equation (13.6.10) is not necessarily the best way to estimate the covariances φk from the data set. In fact, results obtained from linear prediction are remarkably sensitive to exactly how the φ k’s are estimated. One particularly good method is due to Burg [1], and involves a recursive procedure for increasing the order M by one unit at a time, at each stage re-estimating the coefficients dj , j = 1,...,M so as to minimize the residual in equation (13.6.13). Although further discussion of the Burg method is beyond our scope here, the method is implemented in the following routine [1,2] for estimating the LP coefficients dj of a data set. #include #include "nrutil.h" void memcof(float data[], int n, int m, float *xms, float d[]) Given a real vector of data[1..n], and given m, this routine returns m linear prediction coef- ficients as d[1..m], and returns the mean square discrepancy as xms. { int k,j,i; float p=0.0,*wk1,*wk2,*wkm; wk1=vector(1,n); wk2=vector(1,n); wkm=vector(1,m); for (j=1;j<=n;j++) p += SQR(data[j]); *xms=p/n; wk1[1]=data[1]; wk2[n-1]=data[n]; for (j=2;j<=n-1;j++) { wk1[j]=data[j]; wk2[j-1]=data[j]; } for (k=1;k<=m;k++) { float num=0.0,denom=0.0; for (j=1;j<=(n-k);j++) { num += wk1[j]*wk2[j]; denom += SQR(wk1[j])+SQR(wk2[j]); } d[k]=2.0*num/denom;
13.6 Linear Prediction and Linear Predictive Coding 569 *xms*=(1.0-S0R(d[k]); for(1=1:1 #include "complex.h" #define NMAX 100 Largest expected value of m. OF SCIENTIFIC #define ZERO Complex(0.0,0.0) #define ONE Complex(1.0,0.0) to dir void fixrts(float d],int m) Given the LP coefficients d[1..m],this routine finds all roots of the characteristic polynomial (13.6.14),reflects any roots that are outside the unit circle back inside,and then returns a modified set of coefficients d[1..m] 1920 COMPUTING(ISBN void zroots(fcomplex a],int m,fcomplex roots[],int polish); int i,j,polish; Numerical 1021 fcomplex a[NMAX],roots [NMAX]; a[m]=ONE; 43108 for(j=m-1;j>=0;j--) Set up complex coefficients for polynomial root Recipes a[j]=Complex(-d[m-j],0.0); finder. polish=1; (outside zroots(a,m,roots,polish); Find all the roots Software. for (j=1;j1.0) root outside the unit circle, roots [j]=Cdiv(ONE,Conjg(roots [j]));and reflect it back inside a [O]=Csub(ZERO,roots[1]); Now reconstruct the polynomial coefficients, a[1]=0WE; for (j=2;j=2:i=-) and synthetically multiplying. a[i-1]=Csub(a[i-2],Cmul(roots [j],a[i-1])); a[O]=Csub(ZERO,Cmul(roots[j],a [O])); for(j=0;j<=m-1;j+) The polynomial coefficients are guaranteed to be d[m-j]=-a[j].r; real,so we need only return the real part as new LP coefficients
13.6 Linear Prediction and Linear Predictive Coding 569 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). *xms *= (1.0-SQR(d[k])); for (i=1;i #include "complex.h" #define NMAX 100 Largest expected value of m. #define ZERO Complex(0.0,0.0) #define ONE Complex(1.0,0.0) void fixrts(float d[], int m) Given the LP coefficients d[1..m], this routine finds all roots of the characteristic polynomial (13.6.14), reflects any roots that are outside the unit circle back inside, and then returns a modified set of coefficients d[1..m]. { void zroots(fcomplex a[], int m, fcomplex roots[], int polish); int i,j,polish; fcomplex a[NMAX],roots[NMAX]; a[m]=ONE; for (j=m-1;j>=0;j--) Set up complex coefficients for polynomial root a[j]=Complex(-d[m-j],0.0); finder. polish=1; zroots(a,m,roots,polish); Find all the roots. for (j=1;j 1.0) root outside the unit circle, roots[j]=Cdiv(ONE,Conjg(roots[j])); and reflect it back inside. a[0]=Csub(ZERO,roots[1]); Now reconstruct the polynomial coefficients, a[1]=ONE; for (j=2;j=2;i--) and synthetically multiplying. a[i-1]=Csub(a[i-2],Cmul(roots[j],a[i-1])); a[0]=Csub(ZERO,Cmul(roots[j],a[0])); } for (j=0;j<=m-1;j++) The polynomial coefficients are guaranteed to be real, so we need only return the real part as new LP coefficients. d[m-j] = -a[j].r; }
570 Chapter 13.Fourier and Spectral Applications #include "nrutil.h" void predic(float data[],int ndata,float d[],int m,float future[], int nfut) Given data[1..ndata],and given the data's LP coefficients d[1..m],this routine ap- plies equation (13.6.11)to predict the next nfut data points,which it returns in the array future [1..nfut].Note that the routine references only the last m values of data,as initial values for the prediction. int k,j; float sum,discrp,*regi reg=vector(1,m); for (j=1;j=2;k--)reg[k]=reg[k-1]; [If you want to implement circular future[j]=reg[1]=sum; arrays,you can avoid this shift- ing of coefficients. free_vector(reg,1,m); e西d相 Press. Removing the Bias in Linear Prediction You might expect that the sum of the di's in equation (13.6.11)(or,more generally,in equation 13.6.2)should be 1,so that (e.g.)adding a constant to all the 2州 IENTIFIC data points vi yields a prediction that is increased by the same constant.However, the di's do not sum to I but,in general,to a value slightly less than one.This fact reveals a subtle point,that the estimator of classical linear prediction is not unbiased, even though it does minimize the mean square discrepancy.At any place where the measured autocorrelation does not imply a better estimate,the equations of linear prediction tend to predict a value that tends towards zero. 10621 Sometimes,that is just what you want.If the process that generates the yi's Numerica in fact has zero mean,then zero is the best guess absent other information.At other times,however,this behavior is unwarranted.If you have data that show 4310 only small variations around a positive value,you don't want linear predictions Recipes that droop towards zero. Often it is a workable approximation to subtract the mean off your data set, perform the linear prediction,and then add the mean back.This procedure contains the germ of the correct solution;but the simple arithmetic mean is not quite the correct constant to subtract.In fact,an unbiased estimator is obtained by subtracting from every data point an autocorrelation-weighted mean defined by [3.4] (13.6.18) aB With this subtraction,the sum of the LP coefficients should be unity,up to roundoff and differences in how the o's are estimated
570 Chapter 13. Fourier and Spectral Applications Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). #include "nrutil.h" void predic(float data[], int ndata, float d[], int m, float future[], int nfut) Given data[1..ndata], and given the data’s LP coefficients d[1..m], this routine applies equation (13.6.11) to predict the next nfut data points, which it returns in the array future[1..nfut]. Note that the routine references only the last m values of data, as initial values for the prediction. { int k,j; float sum,discrp,*reg; reg=vector(1,m); for (j=1;j=2;k--) reg[k]=reg[k-1]; [If you want to implement circular arrays, you can avoid this shifting of coefficients.] future[j]=reg[1]=sum; } free_vector(reg,1,m); } Removing the Bias in Linear Prediction You might expect that the sum of the dj ’s in equation (13.6.11) (or, more generally, in equation 13.6.2) should be 1, so that (e.g.) adding a constant to all the data points yi yields a prediction that is increased by the same constant. However, the dj ’s do not sum to 1 but, in general, to a value slightly less than one. This fact reveals a subtle point, that the estimator of classical linear prediction is not unbiased, even though it does minimize the mean square discrepancy. At any place where the measured autocorrelation does not imply a better estimate, the equations of linear prediction tend to predict a value that tends towards zero. Sometimes, that is just what you want. If the process that generates the yi’s in fact has zero mean, then zero is the best guess absent other information. At other times, however, this behavior is unwarranted. If you have data that show only small variations around a positive value, you don’t want linear predictions that droop towards zero. Often it is a workable approximation to subtract the mean off your data set, perform the linear prediction, and then add the mean back. This procedure contains the germ of the correct solution; but the simple arithmetic mean is not quite the correct constant to subtract. In fact, an unbiased estimator is obtained by subtracting from every data point an autocorrelation-weighted mean defined by [3,4] y ≡ β [φµν + ηµν ] −1 αβ yβ αβ [φµν + ηµν] −1 αβ (13.6.18) With this subtraction, the sum of the LP coefficients should be unity, up to roundoff and differences in how the φk’s are estimated
13.6 Linear Prediction and Linear Predictive Coding 571 Linear Predictive Coding (LPC) A different.though related.method to which the formalism above can be applied is the "compression"of a sampled signal so that it can be stored more compactly.The original form should be exactly recoverable from the compressed version.Obviously,compression can be accomplished only if there is redundancy in the signal.Equation (13.6.11)describes one kind of redundancy:It says that the signal,except for a small discrepancy,is predictable from its previous values and from a small number of LP coefficients.Compression of a signal by the use of (13.6.11)is thus called linear predictive coding,or LPC. The basic idea of LPC(in its simplest form)is to record as a compressed file(i) the number of LP coefficients M,(ii)their M values,e.g.,as obtained by memcof, (iii)the first M data points,and then(iv)for each subsequent data point only its residual discrepancy zi(equation 13.6.1).When you are creating the compressed file,you find the residual by applying(13.6.1)to the previous M points,subtracting the sum from the actual value of the current point.When you are reconstructing the 0 original file,you add the residual back in,at the point indicated in the routine predic. It may not be obvious why there is any compression at all in this scheme.After 9 all,we are storing one value of residual per data point!Why not just store the original data point?The answer depends on the relative sizes of the numbers involved.The residual is obtained by subtracting two very nearly equal numbers (the data and the linear prediction).Therefore,the discrepancy typically has only a very small number of nonzero bits.These can be stored in a compressed file.How do you do it in a 9 high-level language?Here is one way:Scale your data to have integer values,say %P between +1000000 and-1000000(supposing that you need six significant figures). OF SCIENTIFIC Modify equation(13.6.1)by enclosing the sum term in an"integer part of operator. The discrepancy will now,by definition,be an integer.Experiment with different 6 values of M,to find LP coefficients that make the range of the discrepancy as small as you can.Ifyou can get to within a range of+127(and in our experience this is not at all difficult)then you can write it to a file as a single byte.This is a compression factor of 4,compared to 4-byte integer or floating formats. Notice that the LP coefficients are computed using the quantized data,and that the discrepancy is also quantized,i.e.,quantization is done both outside and inside Numerica 10621 the LPC loop.If you are careful in following this prescription,then,apart from the 3s 43108 initial quantization of the data,you will not introduce even a single bit of roundoff (outside Recipes error into the compression-reconstruction process:While the evaluation of the sum in (13.6.1 1)may have roundoff errors,the residual that you store is the value which. when added back to the sum,gives exactly the original(quantized)data value.Notice North also that you do not need to massage the LP coefficients for stability;by adding the residual back in to each point,you never depart from the original data,so instabilities cannot grow.There is therefore no need for fixrts,above. Look at 820.4 to learn about Huffman coding,which will further compress the residuals by taking advantage of the fact that smaller values of discrepancy will occur more often than larger values.A very primitive version of Huffman coding would be this:If most of the discrepancies are in the range +127,but an occasional one is outside,then reserve the value 127 to mean"out of range,"and then record on the file (immediately following the 127)a full-word value of the out-of-range discrepancy. $20.4 explains how to do much better
13.6 Linear Prediction and Linear Predictive Coding 571 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Linear Predictive Coding (LPC) A different, though related, method to which the formalism above can be applied is the “compression” of a sampled signal so that it can be stored more compactly. The original form should be exactly recoverable from the compressed version. Obviously, compression can be accomplished only if there is redundancy in the signal. Equation (13.6.11) describes one kind of redundancy: It says that the signal, except for a small discrepancy, is predictable from its previous values and from a small number of LP coefficients. Compression of a signal by the use of (13.6.11) is thus called linear predictive coding, or LPC. The basic idea of LPC (in its simplest form) is to record as a compressed file (i) the number of LP coefficients M, (ii) their M values, e.g., as obtained by memcof, (iii) the first M data points, and then (iv) for each subsequent data point only its residual discrepancy xi (equation 13.6.1). When you are creating the compressed file, you find the residual by applying (13.6.1) to the previous M points, subtracting the sum from the actual value of the current point. When you are reconstructing the original file, you add the residual back in, at the point indicated in the routine predic. It may not be obvious why there is any compression at all in this scheme. After all, we are storing one value of residual per data point! Why not just store the original data point? The answer depends on the relative sizes of the numbers involved. The residual is obtained by subtracting two very nearly equal numbers (the data and the linear prediction). Therefore, the discrepancy typically has only a very small number of nonzero bits. These can be stored in a compressed file. How do you do it in a high-level language? Here is one way: Scale your data to have integer values, say between +1000000 and −1000000 (supposing that you need six significant figures). Modify equation (13.6.1) by enclosing the sum term in an “integer part of” operator. The discrepancy will now, by definition, be an integer. Experiment with different values of M, to find LP coefficients that make the range of the discrepancy as small as you can. If you can get to within a range of ±127 (and in our experience this is not at all difficult) then you can write it to a file as a single byte. This is a compression factor of 4, compared to 4-byte integer or floating formats. Notice that the LP coefficients are computed using the quantized data, and that the discrepancy is also quantized, i.e., quantization is done both outside and inside the LPC loop. If you are careful in following this prescription, then, apart from the initial quantization of the data, you will not introduce even a single bit of roundoff error into the compression-reconstruction process: While the evaluation of the sum in (13.6.11) may have roundoff errors, the residual that you store is the value which, when added back to the sum, gives exactly the original (quantized) data value. Notice also that you do not need to massage the LP coefficients for stability; by adding the residual back in to each point, you never depart from the original data, so instabilities cannot grow. There is therefore no need for fixrts, above. Look at §20.4 to learn about Huffman coding, which will further compress the residuals by taking advantage of the fact that smaller values of discrepancy will occur more often than larger values. A very primitive version of Huffman coding would be this: If most of the discrepancies are in the range ±127, but an occasional one is outside, then reserve the value 127 to mean “out of range,” and then record on the file (immediately following the 127) a full-word value of the out-of-range discrepancy. §20.4 explains how to do much better
572 Chapter 13.Fourier and Spectral Applications There are many variant procedures that all fall under the rubric of LPC If the spectral character of the data is time-variable,then it is best not to use a single set of LP coefficients for the whole data set,but rather to partition the data into segments,computing and storing different LP coefficients for each segment. If the data are really well characterized by their LP coefficients,and you can tolerate some small amount oferror,then don't bother storing all of the residuals.Just do linear prediction until you are outside of tolerances,then reinitialize(using M sequential stored residuals)and continue predicting. In some applications,most notably speech synthesis,one cares only about the spectral content of the reconstructed signal,not the relative phases. In this case,one need not store any starting values at all,but only the LP coefficients for each segment of the data.The output is reconstructed by driving these coefficients with initial conditions consisting of all zeros except for one nonzero spike.A speech synthesizer chip may have of order 10 LP coefficients,which change perhaps 20 to 50 times per second. Some people believe that it is interesting to analyze a signal by LPC,even when the residuals zi are not small.The xi's are then interpreted as the underlying"input signal"which,when filtered through the all-poles filter defined by the LP coefficients(see $13.7),produces the observed "output University Press. signal."LPC reveals simultaneously,it is said,the nature of the filter and the particular input that is driving it.We are skeptical of these applications, the literature,however,is full of extravagant claims. Programs CITED REFERENCES AND FURTHER READING: OF SCIENTIFIC Childers,D.G.(ed.)1978,Modem Spectrum Analysis(New York:IEEE Press),especially the 、学 paper by J.Makhoul (reprinted from Proceedings of the IEEE,vol.63,p.561,1975). Burg,J.P.1968.reprinted in Childers,1978.[1] Anderson,N.1974,reprinted in Childers,1978.[2] Cressie.N.1991,in Spatia/Statistics and Digital Image Analysis(Washington:National Academy Press).[3] Press,W.H.,and Rybicki,G.B.1992,Astrophysica/Journal,vol.398,pp.169-176.[4] Numerical Recipes 10621 43106 13.7 Power Spectrum Estimation by the (outside Maximum Entropy (All Poles)Method Software. The FFT is not the only way to estimate the power spectrum of a process,nor is it necessarily the best way for all purposes.To see how one might devise another method, let us enlarge our view for a moment,so that it includes not only real frequencies in the Nyquist interval-fe<f<fe,but also the entire complex frequency plane.From that vantage point,let us transform the complex f-plane to a new plane,called the z-transform plane or z-plane,by the relation 2三e2mfA (13.7.1) where A is,as usual,the sampling interval in the time domain.Notice that the Nyquist interval on the real axis of the f-plane maps one-to-one onto the unit circle in the complex z-plane
572 Chapter 13. Fourier and Spectral Applications Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). There are many variant procedures that all fall under the rubric of LPC. • If the spectral character of the data is time-variable, then it is best not to use a single set of LP coefficients for the whole data set, but rather to partition the data into segments, computing and storing different LP coefficients for each segment. • If the data are really well characterized by their LP coefficients, and you can tolerate some small amount of error, then don’t bother storing all of the residuals. Just do linear prediction until you are outside of tolerances, then reinitialize (using M sequential stored residuals) and continue predicting. • In some applications, most notably speech synthesis, one cares only about the spectral content of the reconstructed signal, not the relative phases. In this case, one need not store any starting values at all, but only the LP coefficients for each segment of the data. The output is reconstructed by driving these coefficients with initial conditions consisting of all zeros except for one nonzero spike. A speech synthesizer chip may have of order 10 LP coefficients, which change perhaps 20 to 50 times per second. • Some people believe that it is interesting to analyze a signal by LPC, even when the residuals xi are not small. The xi’s are then interpreted as the underlying “input signal” which, when filtered through the all-poles filter defined by the LP coefficients (see §13.7), produces the observed “output signal.” LPC reveals simultaneously, it is said, the nature of the filter and the particular input that is driving it. We are skeptical of these applications; the literature, however, is full of extravagant claims. CITED REFERENCES AND FURTHER READING: Childers, D.G. (ed.) 1978, Modern Spectrum Analysis (New York: IEEE Press), especially the paper by J. Makhoul (reprinted from Proceedings of the IEEE, vol. 63, p. 561, 1975). Burg, J.P. 1968, reprinted in Childers, 1978. [1] Anderson, N. 1974, reprinted in Childers, 1978. [2] Cressie, N. 1991, in Spatial Statistics and Digital Image Analysis (Washington: National Academy Press). [3] Press, W.H., and Rybicki, G.B. 1992, Astrophysical Journal, vol. 398, pp. 169–176. [4] 13.7 Power Spectrum Estimation by the Maximum Entropy (All Poles) Method The FFT is not the only way to estimate the power spectrum of a process, nor is it necessarily the best way for all purposes. To see how one might devise another method, let us enlarge our view for a moment, so that it includes not only real frequencies in the Nyquist interval −fc <f<fc, but also the entire complex frequency plane. From that vantage point, let us transform the complex f-plane to a new plane, called the z-transform plane or z-plane, by the relation z ≡ e2πif∆ (13.7.1) where ∆ is, as usual, the sampling interval in the time domain. Notice that the Nyquist interval on the real axis of the f-plane maps one-to-one onto the unit circle in the complex z-plane