15.5 Nonlinear Models

We now consider fitting when the model depends nonlinearly on the set of $M$ unknown parameters $a_k$, $k = 1, 2, \ldots, M$. We use the same approach as in previous sections, namely to define a $\chi^2$ merit function and determine best-fit parameters by its minimization. With nonlinear dependences, however, the minimization must proceed iteratively. Given trial values for the parameters, we develop a procedure that improves the trial solution. The procedure is then repeated until $\chi^2$ stops (or effectively stops) decreasing.

How is this problem different from the general nonlinear function minimization problem already dealt with in Chapter 10? Superficially, not at all: Sufficiently close to the minimum, we expect the $\chi^2$ function to be well approximated by a quadratic form, which we can write as

$$\chi^2(\mathbf{a}) \approx \gamma - \mathbf{d}\cdot\mathbf{a} + \tfrac{1}{2}\,\mathbf{a}\cdot\mathbf{D}\cdot\mathbf{a} \qquad (15.5.1)$$

where $\mathbf{d}$ is an $M$-vector and $\mathbf{D}$ is an $M \times M$ matrix. (Compare equation 10.6.1.)

If the approximation is a good one, we know how to jump from the current trial parameters $\mathbf{a}_{\rm cur}$ to the minimizing ones $\mathbf{a}_{\rm min}$ in a single leap, namely

$$\mathbf{a}_{\rm min} = \mathbf{a}_{\rm cur} + \mathbf{D}^{-1}\cdot\left[-\nabla\chi^2(\mathbf{a}_{\rm cur})\right] \qquad (15.5.2)$$

(Compare equation 10.7.4.)

On the other hand, (15.5.1) might be a poor local approximation to the shape of the function that we are trying to minimize at $\mathbf{a}_{\rm cur}$. In that case, about all we can do is take a step down the gradient, as in the steepest descent method (§10.6). In other words,

$$\mathbf{a}_{\rm next} = \mathbf{a}_{\rm cur} - {\rm constant} \times \nabla\chi^2(\mathbf{a}_{\rm cur}) \qquad (15.5.3)$$

where the constant is small enough not to exhaust the downhill direction.

To use (15.5.2) or (15.5.3), we must be able to compute the gradient of the $\chi^2$ function at any set of parameters $\mathbf{a}$. To use (15.5.2) we also need the matrix $\mathbf{D}$, which is the second derivative matrix (Hessian matrix) of the $\chi^2$ merit function, at any $\mathbf{a}$.

Now, this is the crucial difference from Chapter 10: There, we had no way of directly evaluating the Hessian matrix. We were given only the ability to evaluate the function to be minimized and (in some cases) its gradient. Therefore, we had to resort to iterative methods not just because our function was nonlinear, but also in order to build up information about the Hessian matrix. Sections 10.7 and 10.6 concerned themselves with two different techniques for building up this information.
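Before moving on, the one-step algebra behind (15.5.2) is worth spelling out (a short derivation added here for clarity; it follows directly from the quadratic form above). Differentiating (15.5.1) gives

$$\nabla\chi^2(\mathbf{a}) \approx \mathbf{D}\cdot\mathbf{a} - \mathbf{d},$$

which vanishes at $\mathbf{a}_{\rm min} = \mathbf{D}^{-1}\cdot\mathbf{d}$. Evaluating the gradient at $\mathbf{a}_{\rm cur}$ and rearranging,

$$\mathbf{a}_{\rm min} - \mathbf{a}_{\rm cur} = \mathbf{D}^{-1}\cdot\left[\mathbf{d} - \mathbf{D}\cdot\mathbf{a}_{\rm cur}\right] = \mathbf{D}^{-1}\cdot\left[-\nabla\chi^2(\mathbf{a}_{\rm cur})\right],$$

which is exactly the single leap of equation (15.5.2).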
Here, life is much simpler. We know exactly the form of $\chi^2$, since it is based on a model function that we ourselves have specified. Therefore the Hessian matrix is known to us. Thus we are free to use (15.5.2) whenever we care to do so. The only reason to use (15.5.3) will be failure of (15.5.2) to improve the fit, signaling failure of (15.5.1) as a good local approximation.

Calculation of the Gradient and Hessian

The model to be fitted is

$$y = y(x;\mathbf{a}) \qquad (15.5.4)$$

and the $\chi^2$ merit function is

$$\chi^2(\mathbf{a}) = \sum_{i=1}^{N}\left[\frac{y_i - y(x_i;\mathbf{a})}{\sigma_i}\right]^2 \qquad (15.5.5)$$

The gradient of $\chi^2$ with respect to the parameters $\mathbf{a}$, which will be zero at the $\chi^2$ minimum, has components

$$\frac{\partial\chi^2}{\partial a_k} = -2\sum_{i=1}^{N}\frac{[y_i - y(x_i;\mathbf{a})]}{\sigma_i^2}\,\frac{\partial y(x_i;\mathbf{a})}{\partial a_k} \qquad k = 1, 2, \ldots, M \qquad (15.5.6)$$

Taking an additional partial derivative gives

$$\frac{\partial^2\chi^2}{\partial a_k\,\partial a_l} = 2\sum_{i=1}^{N}\frac{1}{\sigma_i^2}\left[\frac{\partial y(x_i;\mathbf{a})}{\partial a_k}\,\frac{\partial y(x_i;\mathbf{a})}{\partial a_l} - [y_i - y(x_i;\mathbf{a})]\,\frac{\partial^2 y(x_i;\mathbf{a})}{\partial a_l\,\partial a_k}\right] \qquad (15.5.7)$$

It is conventional to remove the factors of 2 by defining

$$\beta_k \equiv -\frac{1}{2}\frac{\partial\chi^2}{\partial a_k} \qquad\qquad \alpha_{kl} \equiv \frac{1}{2}\frac{\partial^2\chi^2}{\partial a_k\,\partial a_l} \qquad (15.5.8)$$

making $[\alpha] = \tfrac{1}{2}\mathbf{D}$ in equation (15.5.2), in terms of which that equation can be rewritten as the set of linear equations

$$\sum_{l=1}^{M}\alpha_{kl}\,\delta a_l = \beta_k \qquad (15.5.9)$$

This set is solved for the increments $\delta a_l$ that, added to the current approximation, give the next approximation. In the context of least-squares, the matrix $[\alpha]$, equal to one-half times the Hessian matrix, is usually called the curvature matrix.

Equation (15.5.3), the steepest descent formula, translates to

$$\delta a_l = {\rm constant} \times \beta_l \qquad (15.5.10)$$
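As a concrete illustration (a hypothetical example added here, not part of the original text), consider the two-parameter model $y(x;\mathbf{a}) = a_1 e^{-a_2 x}$. Its derivatives are

$$\frac{\partial y}{\partial a_1} = e^{-a_2 x}, \qquad \frac{\partial y}{\partial a_2} = -a_1 x\, e^{-a_2 x},$$

so that, from (15.5.6) and (15.5.8),

$$\beta_1 = \sum_{i=1}^{N}\frac{y_i - a_1 e^{-a_2 x_i}}{\sigma_i^2}\, e^{-a_2 x_i}, \qquad \beta_2 = -\sum_{i=1}^{N}\frac{y_i - a_1 e^{-a_2 x_i}}{\sigma_i^2}\, a_1 x_i\, e^{-a_2 x_i},$$

while the first-derivative part of (15.5.7) gives, for example, $\alpha_{12} = -\sum_i a_1 x_i e^{-2a_2 x_i}/\sigma_i^2$. These are exactly the quantities that the routine mrqcof, given later in this section, accumulates numerically for an arbitrary user-supplied model.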
Note that the components $\alpha_{kl}$ of the Hessian matrix (15.5.7) depend both on the first derivatives and on the second derivatives of the basis functions with respect to their parameters. Some treatments proceed to ignore the second derivative without comment. We will ignore it also, but only after a few comments.

Second derivatives occur because the gradient (15.5.6) already has a dependence on $\partial y/\partial a_k$, so the next derivative simply must contain terms involving $\partial^2 y/\partial a_l\,\partial a_k$. The second derivative term can be dismissed when it is zero (as in the linear case of equation 15.4.8), or small enough to be negligible when compared to the term involving the first derivative. It also has an additional possibility of being ignorably small in practice: The term multiplying the second derivative in equation (15.5.7) is $[y_i - y(x_i;\mathbf{a})]$. For a successful model, this term should just be the random measurement error of each point. This error can have either sign, and should in general be uncorrelated with the model. Therefore, the second derivative terms tend to cancel out when summed over $i$.

Inclusion of the second-derivative term can in fact be destabilizing if the model fits badly or is contaminated by outlier points that are unlikely to be offset by compensating points of opposite sign. From this point on, we will always use as the definition of $\alpha_{kl}$ the formula

$$\alpha_{kl} = \sum_{i=1}^{N}\frac{1}{\sigma_i^2}\left[\frac{\partial y(x_i;\mathbf{a})}{\partial a_k}\,\frac{\partial y(x_i;\mathbf{a})}{\partial a_l}\right] \qquad (15.5.11)$$

This expression more closely resembles its linear cousin (15.4.8). You should understand that minor (or even major) fiddling with $[\alpha]$ has no effect at all on what final set of parameters $\mathbf{a}$ is reached, but affects only the iterative route that is taken in getting there. The condition at the $\chi^2$ minimum, that $\beta_k = 0$ for all $k$, is independent of how $[\alpha]$ is defined.

Levenberg-Marquardt Method

Marquardt [1] has put forth an elegant method, related to an earlier suggestion of Levenberg, for varying smoothly between the extremes of the inverse-Hessian method (15.5.9) and the steepest descent method (15.5.10). The latter method is used far from the minimum, switching continuously to the former as the minimum is approached. This Levenberg-Marquardt method (also called Marquardt method) works very well in practice and has become the standard of nonlinear least-squares routines.

The method is based on two elementary, but important, insights. Consider the "constant" in equation (15.5.10). What should it be, even in order of magnitude? What sets its scale? There is no information about the answer in the gradient. That tells only the slope, not how far that slope extends. Marquardt's first insight is that the components of the Hessian matrix, even if they are not usable in any precise fashion, give some information about the order-of-magnitude scale of the problem.
The quantity $\chi^2$ is nondimensional, i.e., is a pure number; this is evident from its definition (15.5.5). On the other hand, $\beta_k$ has the dimensions of $1/a_k$, which may well be dimensional, i.e., have units like cm$^{-1}$, or kilowatt-hours, or whatever. (In fact, each component of $\beta_k$ can have different dimensions!) The constant of proportionality between $\beta_k$ and $\delta a_k$ must therefore have the dimensions of $a_k^2$. Scan
the components of $[\alpha]$ and you see that there is only one obvious quantity with these dimensions, and that is $1/\alpha_{kk}$, the reciprocal of the diagonal element. So that must set the scale of the constant. But that scale might itself be too big. So let's divide the constant by some (nondimensional) fudge factor $\lambda$, with the possibility of setting $\lambda \gg 1$ to cut down the step. In other words, replace equation (15.5.10) by

$$\delta a_l = \frac{1}{\lambda\,\alpha_{ll}}\,\beta_l \qquad {\rm or} \qquad \lambda\,\alpha_{ll}\,\delta a_l = \beta_l \qquad (15.5.12)$$

It is necessary that $\alpha_{ll}$ be positive, but this is guaranteed by definition (15.5.11) — another reason for adopting that equation.

Marquardt's second insight is that equations (15.5.12) and (15.5.9) can be combined if we define a new matrix $\alpha'$ by the following prescription

$$\alpha'_{jj} \equiv \alpha_{jj}\,(1 + \lambda) \qquad\qquad \alpha'_{jk} \equiv \alpha_{jk} \quad (j \neq k) \qquad (15.5.13)$$

and then replace both (15.5.12) and (15.5.9) by

$$\sum_{l=1}^{M}\alpha'_{kl}\,\delta a_l = \beta_k \qquad (15.5.14)$$

When $\lambda$ is very large, the matrix $\alpha'$ is forced into being diagonally dominant, so equation (15.5.14) goes over to be identical to (15.5.12). On the other hand, as $\lambda$ approaches zero, equation (15.5.14) goes over to (15.5.9).

Given an initial guess for the set of fitted parameters $\mathbf{a}$, the recommended Marquardt recipe is as follows:

• Compute $\chi^2(\mathbf{a})$.
• Pick a modest value for $\lambda$, say $\lambda = 0.001$.
• (†) Solve the linear equations (15.5.14) for $\delta\mathbf{a}$ and evaluate $\chi^2(\mathbf{a} + \delta\mathbf{a})$.
• If $\chi^2(\mathbf{a} + \delta\mathbf{a}) \geq \chi^2(\mathbf{a})$, increase $\lambda$ by a factor of 10 (or any other substantial factor) and go back to (†).
• If $\chi^2(\mathbf{a} + \delta\mathbf{a}) < \chi^2(\mathbf{a})$, decrease $\lambda$ by a factor of 10, update the trial solution $\mathbf{a} \leftarrow \mathbf{a} + \delta\mathbf{a}$, and go back to (†).

Also necessary is a condition for stopping. Iterating to convergence (to machine accuracy or to the roundoff limit) is generally wasteful and unnecessary since the minimum is at best only a statistical estimate of the parameters $\mathbf{a}$. As we will see in §15.6, a change in the parameters that changes $\chi^2$ by an amount $\ll 1$ is never statistically meaningful.

Furthermore, it is not uncommon to find the parameters wandering around near the minimum in a flat valley of complicated topography. The reason is that Marquardt's method generalizes the method of normal equations (§15.4), hence has the same problem as that method with regard to near-degeneracy of the minimum. Outright failure by a zero pivot is possible, but unlikely. More often, a small pivot will generate a large correction which is then rejected, the value of $\lambda$ being then increased. For sufficiently large $\lambda$ the matrix $[\alpha']$ is positive definite and can have no small pivots. Thus the method does tend to stay away from zero
pivots, but at the cost of a tendency to wander around doing steepest descent in very un-steep degenerate valleys.

These considerations suggest that, in practice, one might as well stop iterating on the first or second occasion that $\chi^2$ decreases by a negligible amount, say either less than 0.01 absolutely or (in case roundoff prevents that being reached) some fractional amount like $10^{-3}$. Don't stop after a step where $\chi^2$ increases: That only shows that $\lambda$ has not yet adjusted itself optimally.

Once the acceptable minimum has been found, one wants to set $\lambda = 0$ and compute the matrix

$$[C] \equiv [\alpha]^{-1} \qquad (15.5.15)$$

which, as before, is the estimated covariance matrix of the standard errors in the fitted parameters $\mathbf{a}$ (see next section).

The following pair of functions encodes Marquardt's method for nonlinear parameter estimation. Much of the organization matches that used in lfit of §15.4. In particular the array ia[1..ma] must be input with components one or zero corresponding to whether the respective parameter values a[1..ma] are to be fitted for or held fixed at their input values, respectively.

The routine mrqmin performs one iteration of Marquardt's method. It is first called (once) with alamda < 0, which signals the routine to initialize. alamda is set on the first and all subsequent calls to the suggested value of $\lambda$ for the next iteration; a and chisq are always given back as the best parameters found so far and their $\chi^2$. When convergence is deemed satisfactory, set alamda to zero before a final call. The matrices alpha and covar (which were used as workspace in all previous calls) will then be set to the curvature and covariance matrices for the converged parameter values. The arguments alpha, a, and chisq must not be modified between calls, nor should alamda be, except to set it to zero for the final call. When an uphill step is taken, chisq and a are given back with their input (best) values, but alamda is set to an increased value. (A sketch of a typical calling sequence appears at the end of this section, after the references.)

The routine mrqmin calls the routine mrqcof for the computation of the matrix $[\alpha]$ (equation 15.5.11) and vector $\beta$ (equations 15.5.6 and 15.5.8). In turn mrqcof calls the user-supplied routine funcs(x,a,y,dyda), which for input values x ≡ $x_i$ and a ≡ $\mathbf{a}$ calculates the model function y ≡ $y(x_i;\mathbf{a})$ and the vector of derivatives dyda ≡ $\partial y/\partial a_k$.

#include "nrutil.h"

void mrqmin(float x[], float y[], float sig[], int ndata, float a[], int ia[],
    int ma, float **covar, float **alpha, float *chisq,
    void (*funcs)(float, float [], float *, float [], int), float *alamda)
/* Levenberg-Marquardt method, attempting to reduce the value χ2 of a fit between a set of
   data points x[1..ndata], y[1..ndata] with individual standard deviations sig[1..ndata],
   and a nonlinear function dependent on ma coefficients a[1..ma].
The input array ia[1..ma] indicates by nonzero entries those components of a that should be fitted for, and by zero entries those components that should be held fixed at their input values. The program returns current best-fit values for the parameters a[1..ma], and χ2 = chisq. The arrays covar[1..ma][1..ma], alpha[1..ma][1..ma] are used as working space during most iterations. Supply a routine funcs(x,a,yfit,dyda,ma) that evaluates the fitting function yfit, and its derivatives dyda[1..ma] with respect to the fitting parameters a at x. On the first call provide an initial guess for the parameters a, and set alamda<0 for initialization (which then sets alamda=.001). If a step succeeds chisq becomes smaller and alamda decreases by a factor of 10. If a step fails alamda grows by a factor of 10. You must call this
   routine repeatedly until convergence is achieved. Then, make one final call with alamda=0,
   so that covar[1..ma][1..ma] returns the covariance matrix, and alpha the curvature matrix.
   (Parameters held fixed will return zero covariances.) */
{
    void covsrt(float **covar, int ma, int ia[], int mfit);
    void gaussj(float **a, int n, float **b, int m);
    void mrqcof(float x[], float y[], float sig[], int ndata, float a[],
        int ia[], int ma, float **alpha, float beta[], float *chisq,
        void (*funcs)(float, float [], float *, float [], int));
    int j,k,l;
    static int mfit;
    static float ochisq,*atry,*beta,*da,**oneda;

    if (*alamda < 0.0) {                        /* Initialization. */
        atry=vector(1,ma);
        beta=vector(1,ma);
        da=vector(1,ma);
        for (mfit=0,j=1;j<=ma;j++)
            if (ia[j]) mfit++;
        oneda=matrix(1,mfit,1,1);
        *alamda=0.001;
        mrqcof(x,y,sig,ndata,a,ia,ma,alpha,beta,chisq,funcs);
        ochisq=(*chisq);
        for (j=1;j<=ma;j++) atry[j]=a[j];
    }
    for (j=1;j<=mfit;j++) {                     /* Alter linearized fitting matrix, by */
        for (k=1;k<=mfit;k++) covar[j][k]=alpha[j][k];  /* augmenting diagonal elements. */
        covar[j][j]=alpha[j][j]*(1.0+(*alamda));
        oneda[j][1]=beta[j];
    }
    gaussj(covar,mfit,oneda,1);                 /* Matrix solution. */
    for (j=1;j<=mfit;j++) da[j]=oneda[j][1];
    if (*alamda == 0.0) {                       /* Once converged, evaluate covariance matrix. */
        covsrt(covar,ma,ia,mfit);
        covsrt(alpha,ma,ia,mfit);               /* Spread out alpha to its full size too. */
        free_matrix(oneda,1,mfit,1,1);
        free_vector(da,1,ma);
        free_vector(beta,1,ma);
        free_vector(atry,1,ma);
        return;
    }
    for (j=0,l=1;l<=ma;l++)                     /* Did the trial succeed? */
        if (ia[l]) atry[l]=a[l]+da[++j];
    mrqcof(x,y,sig,ndata,atry,ia,ma,covar,da,chisq,funcs);
    if (*chisq < ochisq) {                      /* Success, accept the new solution. */
        *alamda *= 0.1;
        ochisq=(*chisq);
        for (j=1;j<=mfit;j++) {
            for (k=1;k<=mfit;k++) alpha[j][k]=covar[j][k];
            beta[j]=da[j];
        }
        for (l=1;l<=ma;l++) a[l]=atry[l];
    } else {                                    /* Failure, increase alamda and return. */
        *alamda *= 10.0;
        *chisq=ochisq;
    }
}

Notice the use of the routine covsrt from §15.4. This is merely for rearranging the covariance matrix covar into the order of all ma parameters. The above routine also makes use of
#include "nrutil.h"

void mrqcof(float x[], float y[], float sig[], int ndata, float a[], int ia[],
    int ma, float **alpha, float beta[], float *chisq,
    void (*funcs)(float, float [], float *, float [], int))
/* Used by mrqmin to evaluate the linearized fitting matrix alpha, and vector beta as in
   (15.5.8), and calculate χ2. */
{
    int i,j,k,l,m,mfit=0;
    float ymod,wt,sig2i,dy,*dyda;

    dyda=vector(1,ma);
    for (j=1;j<=ma;j++)
        if (ia[j]) mfit++;
    for (j=1;j<=mfit;j++) {                     /* Initialize (symmetric) alpha, beta. */
        for (k=1;k<=j;k++) alpha[j][k]=0.0;
        beta[j]=0.0;
    }
    *chisq=0.0;
    for (i=1;i<=ndata;i++) {                    /* Summation loop over all data. */
        (*funcs)(x[i],a,&ymod,dyda,ma);
        sig2i=1.0/(sig[i]*sig[i]);
        dy=y[i]-ymod;
        for (j=0,l=1;l<=ma;l++) {
            if (ia[l]) {
                wt=dyda[l]*sig2i;
                for (j++,k=0,m=1;m<=l;m++)
                    if (ia[m]) alpha[j][++k] += wt*dyda[m];
                beta[j] += dy*wt;
            }
        }
        *chisq += dy*dy*sig2i;                  /* And find chi-square. */
    }
    for (j=2;j<=mfit;j++)                       /* Fill in the symmetric side. */
        for (k=1;k<j;k++) alpha[k][j]=alpha[j][k];
    free_vector(dyda,1,ma);
}

Example

The following function fgauss is an example of a user-supplied function funcs. Used with the above routine mrqmin (in turn using mrqcof, covsrt, and gaussj), it fits for the model

$$y(x) = \sum_{k=1}^{K} B_k \exp\left[-\left(\frac{x - E_k}{G_k}\right)^2\right] \qquad (15.5.16)$$

which is a sum of $K$ Gaussians, each having a variable position, amplitude, and width. We store the parameters in the order $B_1, E_1, G_1, B_2, E_2, G_2, \ldots, B_K, E_K, G_K$.
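To see where the derivative array dyda computed by the routine below comes from, it may help to write out the partial derivatives of (15.5.16) explicitly (this small derivation is added here for clarity; it is just calculus applied to the model). With $u_k \equiv (x - E_k)/G_k$,

$$\frac{\partial y}{\partial B_k} = e^{-u_k^2}, \qquad \frac{\partial y}{\partial E_k} = \frac{2 B_k u_k}{G_k}\, e^{-u_k^2}, \qquad \frac{\partial y}{\partial G_k} = \frac{2 B_k u_k^2}{G_k}\, e^{-u_k^2}.$$

These correspond, respectively, to the quantities ex, fac/a[i+2], and fac*arg/a[i+2] in fgauss.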
#include <math.h>

void fgauss(float x, float a[], float *y, float dyda[], int na)
/* y(x; a) is the sum of na/3 Gaussians (15.5.16). The amplitude, center, and width of the
   Gaussians are stored in consecutive locations of a: a[i] = Bk, a[i+1] = Ek, a[i+2] = Gk,
   k = 1, ..., na/3. The dimensions of the arrays are a[1..na], dyda[1..na]. */
{
    int i;
    float fac,ex,arg;

    *y=0.0;
    for (i=1;i<=na-1;i+=3) {
        arg=(x-a[i+1])/a[i+2];
        ex=exp(-arg*arg);
        fac=a[i]*ex*2.0*arg;
        *y += a[i]*ex;
        dyda[i]=ex;
        dyda[i+1]=fac/a[i+2];
        dyda[i+2]=fac*arg/a[i+2];
    }
}

More Advanced Methods for Nonlinear Least Squares

The Levenberg-Marquardt algorithm can be implemented as a model-trust region method for minimization (see §9.7 and ref. [2]) applied to the special case of a least squares function. A code of this kind due to Moré [3] can be found in MINPACK [4]. Another algorithm for nonlinear least-squares keeps the second-derivative term we dropped in the Levenberg-Marquardt method whenever it would be better to do so. These methods are called "full Newton-type" methods and are reputed to be more robust than Levenberg-Marquardt, but more complex. One implementation is the code NL2SOL [5].

CITED REFERENCES AND FURTHER READING:
Bevington, P.R. 1969, Data Reduction and Error Analysis for the Physical Sciences (New York: McGraw-Hill), Chapter 11.
Marquardt, D.W. 1963, Journal of the Society for Industrial and Applied Mathematics, vol. 11, pp. 431–441. [1]
Jacobs, D.A.H. (ed.) 1977, The State of the Art in Numerical Analysis (London: Academic Press), Chapter III.2 (by J.E. Dennis).
Dennis, J.E., and Schnabel, R.B. 1983, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Englewood Cliffs, NJ: Prentice-Hall). [2]
Moré, J.J. 1977, in Numerical Analysis, Lecture Notes in Mathematics, vol. 630, G.A. Watson, ed. (Berlin: Springer-Verlag), pp. 105–116. [3]
Moré, J.J., Garbow, B.S., and Hillstrom, K.E. 1980, User Guide for MINPACK-1, Argonne National Laboratory Report ANL-80-74. [4]
Dennis, J.E., Gay, D.M., and Welsch, R.E. 1981, ACM Transactions on Mathematical Software, vol. 7, pp. 348–368; op. cit., pp. 369–383. [5]
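Finally, the calling sequence described above can be made concrete with a short driver (this sketch is added for illustration and is not one of the original routines; the number of data points, the initial guesses, and the stopping thresholds are all hypothetical, and a real application would choose its own values along the lines discussed in the text):

#include "nrutil.h"

/* Prototypes for the routines listed earlier in this section. */
void mrqmin(float x[], float y[], float sig[], int ndata, float a[], int ia[],
    int ma, float **covar, float **alpha, float *chisq,
    void (*funcs)(float, float [], float *, float [], int), float *alamda);
void fgauss(float x, float a[], float *y, float dyda[], int na);

#define NPT 100     /* hypothetical number of data points */
#define MA 6        /* two Gaussians: B1,E1,G1,B2,E2,G2 */

void fit_two_gaussians(float x[], float y[], float sig[], float a[])
/* x,y,sig[1..NPT] hold the data; a[1..MA] holds initial guesses on input and the
   fitted parameters on output. */
{
    int j, iter, done = 0;
    int ia[MA+1];
    float chisq, ochisq, alamda, oalamda;
    float **covar, **alpha;

    covar = matrix(1, MA, 1, MA);
    alpha = matrix(1, MA, 1, MA);
    for (j = 1; j <= MA; j++) ia[j] = 1;        /* fit all parameters */

    alamda = -1.0;                              /* negative value signals initialization */
    mrqmin(x, y, sig, NPT, a, ia, MA, covar, alpha, &chisq, fgauss, &alamda);

    for (iter = 1; iter <= 100 && done < 2; iter++) {
        ochisq = chisq;
        oalamda = alamda;
        mrqmin(x, y, sig, NPT, a, ia, MA, covar, alpha, &chisq, fgauss, &alamda);
        if (alamda > oalamda) continue;         /* uphill step was rejected; don't count it */
        if (ochisq - chisq < 0.01) done++;      /* accepted step with negligible decrease */
        else done = 0;
    }
    alamda = 0.0;                               /* final call: covar holds covariance matrix */
    mrqmin(x, y, sig, NPT, a, ia, MA, covar, alpha, &chisq, fgauss, &alamda);

    free_matrix(alpha, 1, MA, 1, MA);
    free_matrix(covar, 1, MA, 1, MA);
}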
15.6 Confidence Limits on Estimated Model Parameters

Several times already in this chapter we have made statements about the standard errors, or uncertainties, in a set of $M$ estimated parameters $\mathbf{a}$. We have given some formulas for computing standard deviations or variances of individual parameters (equations 15.2.9, 15.4.15, 15.4.19), as well as some formulas for covariances between pairs of parameters (equation 15.2.10; remark following equation 15.4.15; equation 15.4.20; equation 15.5.15).

In this section, we want to be more explicit regarding the precise meaning of these quantitative uncertainties, and to give further information about how quantitative confidence limits on fitted parameters can be estimated. The subject can get somewhat technical, and even somewhat confusing, so we will try to make precise statements, even when they must be offered without proof.

Figure 15.6.1 shows the conceptual scheme of an experiment that "measures" a set of parameters. There is some underlying true set of parameters $\mathbf{a}_{\rm true}$ that are known to Mother Nature but hidden from the experimenter. These true parameters are statistically realized, along with random measurement errors, as a measured data set, which we will symbolize as $\mathcal{D}_{(0)}$. The data set $\mathcal{D}_{(0)}$ is known to the experimenter. He or she fits the data to a model by $\chi^2$ minimization or some other technique, and obtains measured, i.e., fitted, values for the parameters, which we here denote $\mathbf{a}_{(0)}$.

Because measurement errors have a random component, $\mathcal{D}_{(0)}$ is not a unique realization of the true parameters $\mathbf{a}_{\rm true}$. Rather, there are infinitely many other realizations of the true parameters as "hypothetical data sets" each of which could have been the one measured, but happened not to be. Let us symbolize these by $\mathcal{D}_{(1)}, \mathcal{D}_{(2)}, \ldots$ Each one, had it been realized, would have given a slightly different set of fitted parameters, $\mathbf{a}_{(1)}, \mathbf{a}_{(2)}, \ldots,$ respectively. These parameter sets $\mathbf{a}_{(i)}$ therefore occur with some probability distribution in the $M$-dimensional space of all possible parameter sets $\mathbf{a}$. The actual measured set $\mathbf{a}_{(0)}$ is one member drawn from this distribution.

Even more interesting than the probability distribution of $\mathbf{a}_{(i)}$ would be the distribution of the difference $\mathbf{a}_{(i)} - \mathbf{a}_{\rm true}$. This distribution differs from the former one by a translation that puts Mother Nature's true value at the origin. If we knew this distribution, we would know everything that there is to know about the quantitative uncertainties in our experimental measurement $\mathbf{a}_{(0)}$.

So the name of the game is to find some way of estimating or approximating the probability distribution of $\mathbf{a}_{(i)} - \mathbf{a}_{\rm true}$ without knowing $\mathbf{a}_{\rm true}$ and without having available to us an infinite universe of hypothetical data sets.
Monte Carlo Simulation of Synthetic Data Sets

Although the measured parameter set $\mathbf{a}_{(0)}$ is not the true one, let us consider a fictitious world in which it was the true one. Since we hope that our measured parameters are not too wrong, we hope that that fictitious world is not too different from the actual world with parameters $\mathbf{a}_{\rm true}$. In particular, let us hope — no, let us assume — that the shape of the probability distribution $\mathbf{a}_{(i)} - \mathbf{a}_{(0)}$ in the fictitious world is the same, or very nearly the same, as the shape of the probability distribution