15.7 Robust Estimation 699

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

$$C_{jk} = \sum_{i=1}^{M} \frac{1}{w_i^2}\, V_{ji} V_{ki} \qquad (15.6.10)$$

CITED REFERENCES AND FURTHER READING:
Efron, B. 1982, The Jackknife, the Bootstrap, and Other Resampling Plans (Philadelphia: S.I.A.M.). [1]
Efron, B., and Tibshirani, R. 1986, Statistical Science, vol. 1, pp. 54-77. [2]
Avni, Y. 1976, Astrophysical Journal, vol. 210, pp. 642-646. [3]
Lampton, M., Margon, M., and Bowyer, S. 1976, Astrophysical Journal, vol. 208, pp. 177-190.
Brownlee, K.A. 1965, Statistical Theory and Methodology, 2nd ed. (New York: Wiley).
Martin, B.R. 1971, Statistics for Physicists (New York: Academic Press).

15.7 Robust Estimation

The concept of robustness has been mentioned in passing several times already. In §14.1 we noted that the median was a more robust estimator of central value than the mean; in §14.6 it was mentioned that rank correlation is more robust than linear correlation. The concept of outlier points as exceptions to a Gaussian model for experimental error was discussed in §15.1.

The term "robust" was coined in statistics by G.E.P. Box in 1953.
Various definitions of greater or lesser mathematical rigor are possible for the term, but in general, referring to a statistical estimator, it means "insensitive to small departures from the idealized assumptions for which the estimator is optimized." [1,2] The word "small" can have two different interpretations, both important: either fractionally small departures for all data points, or else fractionally large departures for a small number of data points. It is the latter interpretation, leading to the notion of outlier points, that is generally the most stressful for statistical procedures.

Statisticians have developed various sorts of robust statistical estimators. Many, if not most, can be grouped in one of three categories.

M-estimates follow from maximum-likelihood arguments very much as equations (15.1.5) and (15.1.7) followed from equation (15.1.3). M-estimates are usually the most relevant class for model-fitting, that is, estimation of parameters. We therefore consider these estimates in some detail below.

L-estimates are "linear combinations of order statistics." These are most applicable to estimations of central value and central tendency, though they can occasionally be applied to some problems in estimation of parameters. Two "typical" L-estimates will give you the general idea. They are (i) the median, and (ii) Tukey's trimean, defined as the weighted average of the first, second, and third quartile points in a distribution, with weights 1/4, 1/2, and 1/4, respectively.

R-estimates are estimates based on rank tests. For example, the equality or inequality of two distributions can be estimated by the Wilcoxon test of computing the mean rank of one distribution in a combined sample of both distributions. The Kolmogorov-Smirnov statistic (equation 14.3.6) and the Spearman rank-order
Figure 15.7.1. Examples where robust statistical methods are desirable: (a) A one-dimensional distribution with a tail of outliers; statistical fluctuations in these outliers can prevent accurate determination of the position of the central peak. (b) A distribution in two dimensions fitted to a straight line; non-robust techniques such as least-squares fitting can have undesired sensitivity to outlying points.

correlation coefficient (14.6.1) are R-estimates in essence, if not always by formal definition.

Some other kinds of robust techniques, coming from the fields of optimal control and filtering rather than from the field of mathematical statistics, are mentioned at the end of this section. Some examples where robust statistical methods are desirable are shown in Figure 15.7.1.

Estimation of Parameters by Local M-Estimates

Suppose we know that our measurement errors are not normally distributed. Then, in deriving a maximum-likelihood formula for the estimated parameters a in a model y(x; a), we would write instead of equation (15.1.3)

$$P = \prod_{i=1}^{N} \left\{ \exp\left[ -\rho\big(y_i,\, y\{x_i; \mathbf{a}\}\big) \right] \Delta y \right\} \qquad (15.7.1)$$
where the function ρ is the negative logarithm of the probability density. Taking the logarithm of (15.7.1) analogously with (15.1.4), we find that we want to minimize the expression

$$\sum_{i=1}^{N} \rho\big(y_i,\, y\{x_i; \mathbf{a}\}\big) \qquad (15.7.2)$$

Very often, it is the case that the function ρ depends not independently on its two arguments, measured yi and predicted y(xi), but only on their difference, at least if scaled by some weight factors σi which we are able to assign to each point. In this case the M-estimate is said to be local, and we can replace (15.7.2) by the prescription

$$\text{minimize over } \mathbf{a}: \quad \sum_{i=1}^{N} \rho\!\left( \frac{y_i - y(x_i; \mathbf{a})}{\sigma_i} \right) \qquad (15.7.3)$$

where the function ρ(z) is a function of a single variable z ≡ [yi − y(xi)]/σi.
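In code, the prescription (15.7.3) is just a sum of ρ over standardized residuals. Here is a minimal sketch (the helper names m_objective, lin_model, and rho_sq are ours, not from the Numerical Recipes library), with a pluggable ρ and model:

```c
#include <math.h>

/* Example rho for normally distributed errors, eq. (15.7.6): rho(z) = z^2/2. */
static double rho_sq(double z) { return 0.5*z*z; }

/* Example model y(x;a) = a[0] + a[1]*x. */
static double lin_model(double x, const double a[]) { return a[0] + a[1]*x; }

/* Evaluate (15.7.3): sum over i of rho((y_i - y(x_i;a))/sigma_i). */
double m_objective(const double x[], const double y[], const double sig[], int n,
                   double (*model)(double, const double[]), const double a[],
                   double (*rho)(double))
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += rho((y[i] - model(x[i], a)) / sig[i]);
    return sum;
}
```

Minimizing m_objective over the parameters a, with ρ chosen to match the assumed error distribution, is exactly the M-estimation problem; for rho_sq it reduces to ordinary least squares.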
If we now define the derivative of ρ(z) to be a function ψ(z),

$$\psi(z) \equiv \frac{d\rho(z)}{dz} \qquad (15.7.4)$$

then the generalization of (15.1.7) to the case of a general M-estimate is

$$0 = \sum_{i=1}^{N} \frac{1}{\sigma_i}\, \psi\!\left( \frac{y_i - y(x_i)}{\sigma_i} \right) \frac{\partial y(x_i; \mathbf{a})}{\partial a_k} \qquad k = 1, \ldots, M \qquad (15.7.5)$$

If you compare (15.7.3) to (15.1.3), and (15.7.5) to (15.1.7), you see at once that the specialization for normally distributed errors is

$$\rho(z) = \tfrac{1}{2} z^2 \qquad \psi(z) = z \qquad \text{(normal)} \qquad (15.7.6)$$

If the errors are distributed as a double or two-sided exponential, namely

$$\text{Prob}\,\{y_i - y(x_i)\} \sim \exp\left( -\left| \frac{y_i - y(x_i)}{\sigma_i} \right| \right) \qquad (15.7.7)$$

then, by contrast,

$$\rho(z) = |z| \qquad \psi(z) = \text{sgn}(z) \qquad \text{(double exponential)} \qquad (15.7.8)$$

Comparing to equation (15.7.3), we see that in this case the maximum likelihood estimator is obtained by minimizing the mean absolute deviation, rather than the mean square deviation. Here the tails of the distribution, although exponentially decreasing, are asymptotically much larger than any corresponding Gaussian.

A distribution with even more extensive — therefore sometimes even more realistic — tails is the Cauchy or Lorentzian distribution,

$$\text{Prob}\,\{y_i - y(x_i)\} \sim \frac{1}{1 + \frac{1}{2}\left( \dfrac{y_i - y(x_i)}{\sigma_i} \right)^2} \qquad (15.7.9)$$
This implies

$$\rho(z) = \log\left( 1 + \tfrac{1}{2} z^2 \right) \qquad \psi(z) = \frac{z}{1 + \tfrac{1}{2} z^2} \qquad \text{(Lorentzian)} \qquad (15.7.10)$$

Notice that the ψ function occurs as a weighting function in the generalized normal equations (15.7.5). For normally distributed errors, equation (15.7.6) says that the more deviant the points, the greater the weight. By contrast, when tails are somewhat more prominent, as in (15.7.7), then (15.7.8) says that all deviant points get the same relative weight, with only the sign information used. Finally, when the tails are even larger, (15.7.10) says that ψ increases with deviation, then starts decreasing, so that very deviant points — the true outliers — are not counted at all in the estimation of the parameters.

This general idea, that the weight given individual points should first increase with deviation, then decrease, motivates some additional prescriptions for ψ which do not especially correspond to standard, textbook probability distributions. Two examples are

Andrew's sine

$$\psi(z) = \begin{cases} \sin(z/c) & |z| < c\pi \\ 0 & |z| > c\pi \end{cases} \qquad (15.7.11)$$

If the measurement errors happen to be normal after all, with standard deviations σi, then it can be shown that the optimal value for the constant c is c = 2.1.

Tukey's biweight

$$\psi(z) = \begin{cases} z\,(1 - z^2/c^2)^2 & |z| < c \\ 0 & |z| > c \end{cases} \qquad (15.7.12)$$

where the optimal value of c for normal errors is c = 6.0.
Numerical Calculation of M-Estimates

To fit a model by means of an M-estimate, you first decide which M-estimate you want, that is, which matching pair ρ, ψ you want to use. We rather like (15.7.8) or (15.7.10).

You then have to make an unpleasant choice between two fairly difficult problems. Either find the solution of the nonlinear set of M equations (15.7.5), or else minimize the single function in M variables (15.7.3).

Notice that the function (15.7.8) has a discontinuous ψ, and a discontinuous derivative for ρ. Such discontinuities frequently wreak havoc on both general nonlinear equation solvers and general function minimizing routines. You might now think of rejecting (15.7.8) in favor of (15.7.10), which is smoother. However, you will find that the latter choice is also bad news for many general equation solving or minimization routines: small changes in the fitted parameters can drive ψ(z) off its peak into one or the other of its asymptotically small regimes. Therefore, different terms in the equation spring into or out of action (almost as bad as analytic discontinuities).

Don't despair. If your computer budget (or, for personal computers, patience) is up to it, this is an excellent application for the downhill simplex minimization
algorithm exemplified in amoeba (§10.4) or amebsa (§10.9). Those algorithms make no assumptions about continuity; they just ooze downhill and will work for virtually any sane choice of the function ρ.

It is very much to your (financial) advantage to find good starting values, however. Often this is done by first fitting the model by the standard χ2 (nonrobust) techniques, e.g., as described in §15.4 or §15.5. The fitted parameters thus obtained are then used as starting values in amoeba, now using the robust choice of ρ and minimizing the expression (15.7.3).

Fitting a Line by Minimizing Absolute Deviation

Occasionally there is a special case that happens to be much easier than is suggested by the general strategy outlined above. The case of equations (15.7.7)-(15.7.8), when the model is a simple straight line

$$y(x; a, b) = a + bx \qquad (15.7.13)$$

and where the weights σi are all equal, happens to be such a case. The problem is precisely the robust version of the problem posed in equation (15.2.1) above, namely fit a straight line through a set of data points. The merit function to be minimized is

$$\sum_{i=1}^{N} |y_i - a - b x_i| \qquad (15.7.14)$$

rather than the χ2 given by equation (15.2.2).
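The robustness of (15.7.14) is easy to see in the degenerate case b = 0, where the model is just a constant a: the least-squares center is the mean, which chases an outlier, while the least-absolute-deviation center is the median, which ignores it. A small sketch (helper names are ours, not from the Numerical Recipes library):

```c
#include <stdlib.h>
#include <string.h>

/* Comparison function for qsort. */
static int fcmp(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Minimizer of sum (y_i - a)^2: the sample mean. */
float lsq_center(const float y[], int n)
{
    float s = 0.0f;
    int j;
    for (j = 0; j < n; j++) s += y[j];
    return s / n;
}

/* Minimizer of sum |y_i - a|: the sample median (mean of the two
   middle order statistics when n is even). */
float lad_center(const float y[], int n)
{
    float *tmp = (float *)malloc(n * sizeof(float));
    float m;
    memcpy(tmp, y, n * sizeof(float));
    qsort(tmp, n, sizeof(float), fcmp);
    m = (n & 1) ? tmp[n/2] : 0.5f*(tmp[n/2 - 1] + tmp[n/2]);
    free(tmp);
    return m;
}
```

With data {1, 2, 3, 4, 1000}, lad_center returns 3 while lsq_center returns 202: the single wild point has dragged the least-squares answer far away but left the robust one untouched.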
The key simplification is based on the following fact: The median c_M of a set of numbers c_i is also that value which minimizes the sum of the absolute deviations

$$\sum_i |c_i - c_M|$$

(Proof: Differentiate the above expression with respect to c_M and set it to zero.) It follows that, for fixed b, the value of a that minimizes (15.7.14) is

$$a = \text{median}\,\{y_i - b x_i\} \qquad (15.7.15)$$

Equation (15.7.5) for the parameter b is

$$0 = \sum_{i=1}^{N} x_i\, \text{sgn}(y_i - a - b x_i) \qquad (15.7.16)$$

(where sgn(0) is to be interpreted as zero). If we replace a in this equation by the implied function a(b) of (15.7.15), then we are left with an equation in a single variable which can be solved by bracketing and bisection, as described in §9.1. (In fact, it is dangerous to use any fancier method of root-finding, because of the discontinuities in equation 15.7.16.)

Here is a routine that does all this. It calls select (§8.5) to find the median. The bracketing and bisection are built in to the routine, as is the χ2 solution that generates the initial guesses for a and b. Notice that the evaluation of the right-hand side of (15.7.16) occurs in the function rofunc, with communication via global variables.
#include <math.h>
#include "nrutil.h"

int ndatat;                 /* Global variables communicating with rofunc. */
float *xt,*yt,aa,abdevt;

void medfit(float x[], float y[], int ndata, float *a, float *b, float *abdev)
/* Fits y = a + bx by the criterion of least absolute deviations. The arrays
x[1..ndata] and y[1..ndata] are the input experimental points. The fitted
parameters a and b are output, along with abdev, which is the mean absolute
deviation (in y) of the experimental points from the fitted line. This routine
uses the routine rofunc, with communication via global variables. */
{
	float rofunc(float b);
	int j;
	float bb,b1,b2,del,f,f1,f2,sigb,temp;
	float sx=0.0,sy=0.0,sxy=0.0,sxx=0.0,chisq=0.0;

	ndatat=ndata;
	xt=x;
	yt=y;
	for (j=1;j<=ndata;j++) {           /* As a first guess for a and b, find the */
		sx += x[j];                    /* least-squares fit. */
		sy += y[j];
		sxy += x[j]*y[j];
		sxx += x[j]*x[j];
	}
	del=ndata*sxx-sx*sx;
	aa=(sxx*sy-sx*sxy)/del;            /* Least-squares solutions. */
	bb=(ndata*sxy-sx*sy)/del;
	for (j=1;j<=ndata;j++)
		chisq += (temp=y[j]-(aa+bb*x[j]),temp*temp);
	sigb=sqrt(chisq/del);              /* The standard deviation will give some idea */
	b1=bb;                             /* of how big an iteration step to take. */
	f1=rofunc(b1);
	if (sigb > 0.0) {
		b2=bb+SIGN(3.0*sigb,f1);       /* Guess bracket as 3-sigma away, in the */
		f2=rofunc(b2);                 /* downhill direction known from f1. */
		if (b2 == b1) {
			*a=aa;
			*b=bb;
			*abdev=abdevt/ndata;
			return;
		}
		while (f1*f2 > 0.0) {          /* Bracketing. */
			bb=b2+1.6*(b2-b1);
			b1=b2;
			f1=f2;
			b2=bb;
			f2=rofunc(b2);
		}
		sigb=0.01*sigb;
		while (fabs(b2-b1) > sigb) {   /* Refine until error a negligible number */
			bb=b1+0.5*(b2-b1);         /* of standard deviations. Bisection. */
			if (bb == b1 || bb == b2) break;
			f=rofunc(bb);
			if (f*f1 >= 0.0) {
				f1=f;
				b1=bb;
			} else {
				f2=f;
				b2=bb;
			}
		}
	}
	*a=aa;
	*b=bb;
	*abdev=abdevt/ndata;
}

#include <math.h>
#include "nrutil.h"
#define EPS 1.0e-7

extern int ndatat;                  /* Defined in medfit. */
extern float *xt,*yt,aa,abdevt;

float rofunc(float b)
/* Evaluates the right-hand side of equation (15.7.16) for a given value of b.
Communication with the routine medfit is through global variables. */
{
	float select(unsigned long k, unsigned long n, float arr[]);
	int j;
	float *arr,d,sum=0.0;

	arr=vector(1,ndatat);
	for (j=1;j<=ndatat;j++) arr[j]=yt[j]-b*xt[j];
	if (ndatat & 1) {               /* Odd number of points: aa is the median. */
		j=(ndatat+1) >> 1;
		aa=select(j,ndatat,arr);
	} else {                        /* Even: aa is the mean of the two middle values. */
		j=ndatat >> 1;
		aa=0.5*(select(j,ndatat,arr)+select(j+1,ndatat,arr));
	}
	abdevt=0.0;
	for (j=1;j<=ndatat;j++) {
		d=yt[j]-(b*xt[j]+aa);
		abdevt += fabs(d);
		if (yt[j] != 0.0) d /= fabs(yt[j]);
		if (fabs(d) > EPS) sum += (d >= 0.0 ? xt[j] : -xt[j]);
	}
	free_vector(arr,1,ndatat);
	return sum;
}

Other Robust Techniques

Sometimes you may have a priori knowledge about the probable values and probable uncertainties of some parameters that you are trying to estimate from a data set. In such cases you may want to perform a fit that takes this advance information properly into account, neither completely freezing a parameter at a predetermined value (as in lfit, §15.4) nor completely leaving it to be determined by the data set. The formalism for doing this is called "use of a priori covariances."

A related problem occurs in signal processing and control theory, where it is sometimes desired to "track" (i.e., maintain an estimate of) a time-varying signal in the presence of noise.
If the signal is known to be characterized by some number of parameters that vary only slowly, then the formalism of Kalman filtering tells how the incoming, raw measurements of the signal should be processed to produce best parameter estimates as a function of time. For example, if the signal is a frequency-modulated sine wave, then the slowly varying parameter might be the instantaneous frequency. The Kalman filter for this case is called a phase-locked loop and is implemented in the circuitry of good radio receivers [3,4].
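To give the flavor of Kalman filtering in its very simplest form (a toy sketch only, far simpler than a phase-locked loop; the struct and the variances q and r are our own illustrative assumptions), here is a scalar filter that tracks a slowly drifting level x from noisy measurements z = x + noise:

```c
typedef struct {
    double x;  /* current state estimate */
    double p;  /* current estimate variance */
    double q;  /* assumed process noise variance (how fast x can drift) */
    double r;  /* assumed measurement noise variance */
} ScalarKalman;

/* One predict-update cycle; returns the updated estimate. */
double kalman_step(ScalarKalman *k, double z)
{
    double gain;
    k->p += k->q;                  /* predict: variance grows by process noise */
    gain = k->p / (k->p + k->r);   /* Kalman gain: how much to trust z */
    k->x += gain * (z - k->x);     /* update: blend prediction and measurement */
    k->p *= (1.0 - gain);          /* updated variance shrinks */
    return k->x;
}
```

Each step blends the prediction and the new measurement with gain p/(p + r): a small ratio q/r gives heavy smoothing of the noise, while a large one lets the estimate follow rapid changes in the signal.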
CITED REFERENCES AND FURTHER READING:
Huber, P.J. 1981, Robust Statistics (New York: Wiley). [1]
Launer, R.L., and Wilkinson, G.N. (eds.) 1979, Robustness in Statistics (New York: Academic Press). [2]
Bryson, A.E., and Ho, Y.C. 1969, Applied Optimal Control (Waltham, MA: Ginn). [3]
Jazwinski, A.H. 1970, Stochastic Processes and Filtering Theory (New York: Academic Press). [4]