10.7 Variable Metric Methods in Multidimensions 425 *fret=dbrent(ax,xx,bx,fidim,df1dim,TOL,&xmin); for(j=1;j<=n;j++){ Construct the vector results to return. xi[j]*xmin; p[j]+=x1[j]; 2 free_vector(xicom,1,n); free_vector(pcom,1,n); #include "nrutil.h" http://www.r. Permission is read able files extern int ncom; Defined in dlinmin. extern float *pcom,*xicom,(*nrfunc)(float [) extern void (*nrdfun)(float []float []) .com or call float dfidim(float x) 11-800-872 (including this one) granted for 19881992 int j; float df1=0.0; float *xt,*df; xt=vector(1,ncom): dfsvector(1,ncom); for (j=1;j<=ncom;j++)xt [j]=pcom[j]+x*xicom[j]; -7423(North America (*nrdfun)(xt,df); 电r:1s t tusers to make one paper by Cambridge University Press. from NUMERICAL RECIPES IN C: THE for (j=1;j<=ncom;j++)df1 +df[j]*xicom[j]; 是 ART free_vector(df,1,ncom); free_vector(xt,1,ncom); return df1; strictly proh Programs to dir Copyright (C) CITED REFERENCES AND FURTHER READING: Polak,E.1971,Computational Methods in Optimization(New York:Academic Press),82.3.[1] OF SCIENTIFIC COMPUTING(ISBN Jacobs,D.A.H.(ed.)1977,The State of the Art in Numerical Analysis (London:Academic Press) ectcustser Chapter Ill.1.7 (by K.W.Brodlie).[2] Stoer,J.,and Bulirsch,R.1980,Introduction to Numerical Analysis(New York:Springer-Verlag). v@cam 88.7. 1988-1992 by Numerical Recipes 10-:6211 43108 10.7 Variable Metric Methods in (outside Multidimensions Software. Amer The goal of variable metric methods,which are sometimes called quasi-Newton visit website methods,is not different from the goal of conjugate gradient methods:to accumulate machine information from successive line minimizations so that N such line minimizations lead to the exact minimum of a quadratic form in N dimensions.In that case,the method will also be quadratically convergent for more general smooth functions. Both variable metric and conjugate gradient methods require that you are able to compute your function's gradient,or first partial derivatives,at arbitrary points.The variable metric approach differs from the conjugate gradient in the way that it stores
10.7 Variable Metric Methods in Multidimensions 425 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). *fret=dbrent(ax,xx,bx,f1dim,df1dim,TOL,&xmin); for (j=1;j<=n;j++) { Construct the vector results to return. xi[j] *= xmin; p[j] += xi[j]; } free_vector(xicom,1,n); free_vector(pcom,1,n); } #include "nrutil.h" extern int ncom; Defined in dlinmin. extern float *pcom,*xicom,(*nrfunc)(float []); extern void (*nrdfun)(float [], float []); float df1dim(float x) { int j; float df1=0.0; float *xt,*df; xt=vector(1,ncom); df=vector(1,ncom); for (j=1;j<=ncom;j++) xt[j]=pcom[j]+x*xicom[j]; (*nrdfun)(xt,df); for (j=1;j<=ncom;j++) df1 += df[j]*xicom[j]; free_vector(df,1,ncom); free_vector(xt,1,ncom); return df1; } CITED REFERENCES AND FURTHER READING: Polak, E. 1971, Computational Methods in Optimization (New York: Academic Press), §2.3. [1] Jacobs, D.A.H. (ed.) 1977, The State of the Art in Numerical Analysis (London: Academic Press), Chapter III.1.7 (by K.W. Brodlie). [2] Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §8.7. 10.7 Variable Metric Methods in Multidimensions The goal of variable metric methods, which are sometimes called quasi-Newton methods, is not different from the goal of conjugate gradient methods: to accumulate information from successive line minimizations so that N such line minimizations lead to the exact minimum of a quadratic form in N dimensions. In that case, the method will also be quadratically convergent for more general smooth functions. Both variable metric and conjugate gradient methods require that you are able to compute your function’s gradient, or first partial derivatives, at arbitrary points. The variable metric approach differs from the conjugate gradient in the way that it stores
426 Chapter 10.Minimization or Maximization of Functions and updates the information that is accumulated.Instead of requiring intermediate storage on the order of N,the number of dimensions,it requires a matrix of size Nx N.Generally,for any moderate N,this is an entirely trivial disadvantage. On the other hand,there is not,as far as we know,any overwhelming advantage that the variable metric methods hold over the conjugate gradient techniques,except perhaps a historical one.Developed somewhat earlier,and more widely propagated, the variable metric methods have by now developed a wider constituency of satisfied users.Likewise,some fancier implementations of variable metric methods(going beyond the scope of this book,see below)have been developed to a greater level of sophistication on issues like the minimization of roundoff error,handling of special conditions,and so on.We tend to use variable metric rather than conjugate gradient. but we have no reason to urge this habit on you. Variable metric methods come in two main flavors.One is the Davidon-Fletcher- Powell (DFP)algorithm (sometimes referred to as simply Fletcher-Powell).The 、是主恩P 8 other goes by the name Broyden-Fletcher-Goldfarb-Shanno(BFGS).The BFGS and DFP schemes differ only in details of their roundoff error,convergence tolerances, and similar "dirty"issues which are outside of our scope [1.21.However,it has RECIPES become generally recognized that,empirically,the BFGS scheme is superior in these details.We will implement BFGS in this section. 9 As before,we imagine that our arbitrary function f(x)can be locally approx- imated by the quadratic form of equation (10.6.1).We don't,however,have any information about the values of the quadratic form's parameters A and b,except insofar as we can glean such information from our function evaluations and line minimizations. 孕是g合N The basic idea of the variable metric method is to build up,iteratively,a good approximation to the inverse Hessian matrix A,that is,to construct a sequence of matrices Hi with the property, lim H:=A-1 (10.7.1) Even better if the limit is achieved after N iterations instead of oo. The reason that variable metric methods are sometimes called quasi-Newton 10621 methods can now be explained.Consider finding a minimum by using Newton's Numerica method to search for a zero of the gradient of the function.Near the current point 43106 x;,we have to second order f(x)=f(x)+(x-x).7f(x)+(x-x)·A·(x-x) (10.7.2) So Vf(x)=7f(x)+A·(x-x) (10.7.3) In Newton's method we set Vf(x)=0 to determine the next iteration point: x-x=-A-1.7f(x) (10.7.4) The left-hand side is the finite step we need take to get to the exact minimum;the right-hand side is known once we have accumulated an accurate HA-1. The"quasi"in quasi-Newton is because we don't use the actual Hessian matrix of f,but instead use our current approximation of it.This is often better than
426 Chapter 10. Minimization or Maximization of Functions Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). and updates the information that is accumulated. Instead of requiring intermediate storage on the order of N, the number of dimensions, it requires a matrix of size N × N. Generally, for any moderate N, this is an entirely trivial disadvantage. On the other hand, there is not, as far as we know, any overwhelming advantage that the variable metric methods hold over the conjugate gradient techniques, except perhaps a historical one. Developed somewhat earlier, and more widely propagated, the variable metric methods have by now developed a wider constituency of satisfied users. Likewise, some fancier implementations of variable metric methods (going beyond the scope of this book, see below) have been developed to a greater level of sophistication on issues like the minimization of roundoff error, handling of special conditions, and so on. We tend to use variable metric rather than conjugate gradient, but we have no reason to urge this habit on you. Variable metric methods come in two main flavors. One is the Davidon-FletcherPowell (DFP) algorithm (sometimes referred to as simply Fletcher-Powell). The other goes by the name Broyden-Fletcher-Goldfarb-Shanno (BFGS). The BFGS and DFP schemes differ only in details of their roundoff error, convergence tolerances, and similar “dirty” issues which are outside of our scope [1,2]. However, it has become generally recognized that, empirically, the BFGS scheme is superior in these details. We will implement BFGS in this section. As before, we imagine that our arbitrary function f(x) can be locally approximated by the quadratic form of equation (10.6.1). We don’t, however, have any information about the values of the quadratic form’s parameters A and b, except insofar as we can glean such information from our function evaluations and line minimizations. The basic idea of the variable metric method is to build up, iteratively, a good approximation to the inverse Hessian matrix A−1, that is, to construct a sequence of matrices Hi with the property, lim i→∞ Hi = A−1 (10.7.1) Even better if the limit is achieved after N iterations instead of ∞. The reason that variable metric methods are sometimes called quasi-Newton methods can now be explained. Consider finding a minimum by using Newton’s method to search for a zero of the gradient of the function. Near the current point xi, we have to second order f(x) = f(xi)+(x − xi) · ∇f(xi) + 1 2 (x − xi) · A · (x − xi) (10.7.2) so ∇f(x) = ∇f(xi) + A · (x − xi) (10.7.3) In Newton’s method we set ∇f(x)=0 to determine the next iteration point: x − xi = −A−1 · ∇f(xi) (10.7.4) The left-hand side is the finite step we need take to get to the exact minimum; the right-hand side is known once we have accumulated an accurate H ≈ A −1. The “quasi” in quasi-Newton is because we don’t use the actual Hessian matrix of f, but instead use our current approximation of it. This is often better than
10.7 Variable Metric Methods in Multidimensions 427 using the true Hessian.We can understand this paradoxical result by considering the descent directions of f at xi.These are the directions p along which f decreases: Vf.p<0.For the Newton direction(10.7.4)to be a descent direction,we must have 7f(x:)·(x-x)=-(x-x)·A·(x-x:)<0 (10.7.5) which is true if A is positive definite.In general,far from a minimum,we have no guarantee that the Hessian is positive definite.Taking the actual Newton step with the real Hessian can move us to points where the function is increasing in value. The idea behind quasi-Newton methods is to start with a positive definite,symmetric 81 approximation to A (usually the unit matrix)and build up the approximating Hi's in such a way that the matrix H;remains positive definite and symmetric.Far from the minimum,this guarantees that we always move in a downhill direction.Close to the minimum,the updating formula approaches the true Hessian and we enjoy the quadratic convergence of Newton's method. When we are not close enough to the minimum,taking the full Newton step p even with a positive definite A need not decrease the function;we may move RECIPES too far for the quadratic approximation to be valid.All we are guaranteed is that initially f decreases as we move in the Newton direction.Once again we can use 9 the backtracking strategy described in 89.7 to choose a step along the direction of the Newton step p,but not necessarily all the way. We won't rigorously derive the DFP algorithm for taking Hi into Hi+1;you can consult [3]for clear derivations.Following Brodlie (in [2)),we will give the following heuristic motivation of the procedure. Subtracting equation(10.7.4)at x from that same equation at x:gives x+1-x=A-1.(7f+1-7f) (10.7.6) where Vf;=Vf(xj).Having made the step from xi to x:+1,we might reasonably want to require that the new approximation Hi+satisfy (10.7.6)as if it were actually A,that is, Numerica 10621 xi+1-xi=H+1·(7f+1-7f) (10.7.7) 43126 We might also imagine that the updating formula should be of the form H= H;+correction. What"objects"are around out of which to construct a correction term?Most notable are the two vectors x:+1-xi and Vfi+1-Vfi;and there is also Hi. There are not infinitely many natural ways of making a matrix out of these objects, especially if(10.7.7)must hold!One such way,the DFP updating formula,is H+1=H+ (K+1-x)图(x+1-x) (K+1-x)·(Vf+1-7f) -4·(f+1-f]®画,·(Vf+1-Vf】 (10.7.8) (7fi+1-7fi):H·(Tf+1-7fi) where denotes the "outer"or "direct"product of two vectors,a matrix:The ij component ofuv is uivj.(You might want to verify that 10.7.8 does satisfy 10.7.7.)
10.7 Variable Metric Methods in Multidimensions 427 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). using the true Hessian. We can understand this paradoxical result by considering the descent directions of f at xi. These are the directions p along which f decreases: ∇f ·p < 0. For the Newton direction (10.7.4) to be a descent direction, we must have ∇f(xi) · (x − xi) = −(x − xi) · A · (x − xi) < 0 (10.7.5) which is true if A is positive definite. In general, far from a minimum, we have no guarantee that the Hessian is positive definite. Taking the actual Newton step with the real Hessian can move us to points where the function is increasing in value. The idea behind quasi-Newton methods is to start with a positive definite, symmetric approximation to A (usually the unit matrix) and build up the approximating H i’s in such a way that the matrix Hi remains positive definite and symmetric. Far from the minimum, this guarantees that we always move in a downhill direction. Close to the minimum, the updating formula approaches the true Hessian and we enjoy the quadratic convergence of Newton’s method. When we are not close enough to the minimum, taking the full Newton step p even with a positive definite A need not decrease the function; we may move too far for the quadratic approximation to be valid. All we are guaranteed is that initially f decreases as we move in the Newton direction. Once again we can use the backtracking strategy described in §9.7 to choose a step along the direction of the Newton step p, but not necessarily all the way. We won’t rigorously derive the DFP algorithm for taking Hi into Hi+1; you can consult [3] for clear derivations. Following Brodlie (in [2]), we will give the following heuristic motivation of the procedure. Subtracting equation (10.7.4) at xi+1 from that same equation at xi gives xi+1 − xi = A−1 · (∇fi+1 − ∇fi) (10.7.6) where ∇fj ≡ ∇f(xj ). Having made the step from xi to xi+1, we might reasonably want to require that the new approximation Hi+1 satisfy (10.7.6) as if it were actually A−1, that is, xi+1 − xi = Hi+1 · (∇fi+1 − ∇fi) (10.7.7) We might also imagine that the updating formula should be of the form H i+1 = Hi + correction. What “objects” are around out of which to construct a correction term? Most notable are the two vectors xi+1 − xi and ∇fi+1 − ∇fi; and there is also Hi. There are not infinitely many natural ways of making a matrix out of these objects, especially if (10.7.7) must hold! One such way, the DFP updating formula, is Hi+1 = Hi + (xi+1 − xi) ⊗ (xi+1 − xi) (xi+1 − xi) · (∇fi+1 − ∇fi) − [Hi · (∇fi+1 − ∇fi)] ⊗ [Hi · (∇fi+1 − ∇fi)] (∇fi+1 − ∇fi) · Hi · (∇fi+1 − ∇fi) (10.7.8) where ⊗ denotes the “outer” or “direct” product of two vectors, a matrix: The ij component of u⊗v is uivj . (You might want to verify that 10.7.8 does satisfy 10.7.7.)
428 Chapter 10.Minimization or Maximization of Functions The BFGS updating formula is exactly the same,but with one additional term, …+[(7f+1-7f)H…(7f+1-7f)】u⑧u (10.7.9) where u is defined as the vector (xi+1-x2) u三 (ki+1-x)·(7fi+1-7f) H:·(Vf+1-Vf) (10.7.10) (f+1-7f)H:(7f+1-Vf) (You might also verify that this satisfies 10.7.7.) You will have to take on faith一or else consult3]for details of一the“deep” result that equation(10.7.8),with or without(1079),does in fact converge toA in N steps,if f is a quadratic form. Here now is the routine dfpmin that implements the quasi-Newton method,and uses Insrch from $9.7.As mentioned at the end of newt in $9.7,this algorithm ERIC can fail if your variables are badly scaled. #include (North 令 #include "nrutil.h" #define ITMAX 200 Maximum allowed number of iterations. #define EPS 3.0e-8 Machine precision. 。、 Press. #define TOLX (4*EPS) Convergence criterion on values. #define STPMX 100.0 Scaled maximum step length allowed in line searches. Programs #define FREEALL free_vector(xi,1,n);free_vector(pnew,1,n);\ free_matrix(hessin,1,n,1,n);free_vector(hdg,1,n);free_vector(g,1,n); free_vector(dg,1,n); SCIENTIFIC void dfpmin(float p[],int n,float gtol,int *iter,float *fret, float(*func)(float [])void (*dfunc)(float ]float []) Given a starting point p[1..n]that is a vector of length n,the Broyden-Fletcher-Goldfarb- Shanno variant of Davidon-Fletcher-Powell minimization is performed on a function func,using its gradient as calculated by a routine dfunc.The convergence requirement on zeroing the gradient is input as gtol.Returned quantities are p[1..n](the location of the minimum). 1920 COMPUTING (ISBN iter (the number of iterations that were performed),and fret (the minimum value of the function).The routine Insrch is called to perform approximate line minimizations. f void1 nsrch(intn,f1 oat xold[▣,f1 oat fold,f1oatg0,f1oatp[▣,f1oatx[], float *f,float stpmax,int *check,float (*func)(float [])) uction Numerical Recipes 10621 43106 int check,i,its,j; float den,fac,fad,fae,fp,stpmax,sum=0.0,sumdg,sumxi,temp,test; float *dg,*g,*hdg,**hessin,*pnew,*xi; (outside dg=vector(1,n); Software. g=vector(1,n); hdg=vector(1,n); hessin=matrix(1,n,1,n); pnew=vector(1,n); xisvector(1,n); fp=(*func)(p); Calculate starting function value and gra- (*dfunc)(p,g); dient, for(i=1:1<=n;1++){ and initialize the inverse Hessian to the for (j=1;j<=n;j++)hessin[i][j]=0.0; unit matrix. hessin[i][i]=1.0; x1[i]=-g[1]; Initial line direction sum +p[i]*p[i]; stpmax=STPMX*FMAX(sgrt(sum),(float)n);
428 Chapter 10. Minimization or Maximization of Functions Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). The BFGS updating formula is exactly the same, but with one additional term, ··· + [(∇fi+1 − ∇fi) · Hi · (∇fi+1 − ∇fi)] u ⊗ u (10.7.9) where u is defined as the vector u ≡ (xi+1 − xi) (xi+1 − xi) · (∇fi+1 − ∇fi) − Hi · (∇fi+1 − ∇fi) (∇fi+1 − ∇fi) · Hi · (∇fi+1 − ∇fi) (10.7.10) (You might also verify that this satisfies 10.7.7.) You will have to take on faith — or else consult [3] for details of — the “deep” result that equation (10.7.8), with or without (10.7.9), does in fact converge to A −1 in N steps, if f is a quadratic form. Here now is the routine dfpmin that implements the quasi-Newton method, and uses lnsrch from §9.7. As mentioned at the end of newt in §9.7, this algorithm can fail if your variables are badly scaled. #include #include "nrutil.h" #define ITMAX 200 Maximum allowed number of iterations. #define EPS 3.0e-8 Machine precision. #define TOLX (4*EPS) Convergence criterion on x values. #define STPMX 100.0 Scaled maximum step length allowed in line searches. #define FREEALL free_vector(xi,1,n);free_vector(pnew,1,n); \ free_matrix(hessin,1,n,1,n);free_vector(hdg,1,n);free_vector(g,1,n); \ free_vector(dg,1,n); void dfpmin(float p[], int n, float gtol, int *iter, float *fret, float(*func)(float []), void (*dfunc)(float [], float [])) Given a starting point p[1..n] that is a vector of length n, the Broyden-Fletcher-GoldfarbShanno variant of Davidon-Fletcher-Powell minimization is performed on a function func, using its gradient as calculated by a routine dfunc. The convergence requirement on zeroing the gradient is input as gtol. Returned quantities are p[1..n] (the location of the minimum), iter (the number of iterations that were performed), and fret (the minimum value of the function). The routine lnsrch is called to perform approximate line minimizations. { void lnsrch(int n, float xold[], float fold, float g[], float p[], float x[], float *f, float stpmax, int *check, float (*func)(float [])); int check,i,its,j; float den,fac,fad,fae,fp,stpmax,sum=0.0,sumdg,sumxi,temp,test; float *dg,*g,*hdg,**hessin,*pnew,*xi; dg=vector(1,n); g=vector(1,n); hdg=vector(1,n); hessin=matrix(1,n,1,n); pnew=vector(1,n); xi=vector(1,n); fp=(*func)(p); Calculate starting function value and gra- (*dfunc)(p,g); dient, for (i=1;i<=n;i++) { and initialize the inverse Hessian to the for (j=1;j<=n;j++) hessin[i][j]=0.0; unit matrix. hessin[i][i]=1.0; xi[i] = -g[i]; Initial line direction. sum += p[i]*p[i]; } stpmax=STPMX*FMAX(sqrt(sum),(float)n);
10.7 Variable Metric Methods in Multidimensions 429 for (its-1;its<=ITMAX;its++){ Main loop over the iterations. *itersits; Insrch(n,p,fp,g,xi,pnew,fret,stpmax,&check,func); The new function evaluation occurs in Insrch;save the function value in fp for the next line search.It is usually safe to ignore the value of check. fp *fret; for (i=1;i<=n;i++){ xi[i]=pnew[i]-p[i]; Update the line direction p[i]=pnew[i]; and the current point. test=0.0; Test for convergence on Ax for(1=1:i<=n:i++)[ http://www.nr Permission is read able files temp=fabs(xi [i])/FMAX(fabs(p[i]),1.0); if (temp test)test=temp; if (test TOLX){ (including this one) 18881992 FREEALL return; 11-600 for (i=1;i<=n;i++)dg[i]=g[i]; Save the old gradient, (*dfunc)(p,g); and get the new gradient. to any Cambridge test=0.0; Test for convergence on zero gradient from NUMERICAL RECIPES IN den=FMAX(*fret,1.0); for (i=1;i<=n;i++){ to make temp=fabs(g[i])*FMAX(fabs(p[i]),1.0)/den; if (temp test)test=temp; America server computer, (Nor one paper UnN电.t THE if (test gtol){ ART FREEALL return: Programs for (i=1;i<=n;i++)dg[i]-g[i]-dg[i]; Compute difference of gradients, for (i=1;i<=n;i++){ and difference times current matrix. hdg[i]=0.0; strictly prohibited for (j=1;j<=n;j++)hdg[i]+hessin[i][j]*dg[j]; to dir fac-fae=sumdg=sumxi=0.0; Calculate dot products for the denomi- for(i=1;i<=n;i++)[ nators. fac +dg[i]*xi[i]; 188819920 OF SCIENTIFIC COMPUTING(ISBN fae +dg[i]*hdg[i]; sumdg +SQR(dg[i]); sumxi +SQR(xi[i]); 2 v@cam 10-621 if (fac sqrt(EPS*sumdg*sumxi)){ Skip update if fac not sufficiently posi- fac=1.0/fac; tive. fad=1.0/fae; Further reproduction. Numerical Recipes -43108 The vector that makes BFGS different from DFP: for (i=1;i<=n;i++)dg[i]=fac*xi[i]-fad*hdg[i]; 10r (i=1;i<=n;i++){ The BFGS updating formula: (outside for(j=1;j<=n;j++){ 膜 hessin[i][i]fac*xi[i]*xi[i] Software. -fad*hdg[i]*hdg[j]+fae*dg[i]*dg[j]; hessin[j][i]-hessin[i][j]; for (i=1;i<=n;i++){ Now calculate the next direction to go. x1[i]=0.0; for (j=1;j<=n;j++)xi[i]-hessin[i][j]*g[j]; and go back for another iteration nrerror("too many iterations in dfpmin"); FREEALL
10.7 Variable Metric Methods in Multidimensions 429 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). for (its=1;its test) test=temp; } if (test test) test=temp; } if (test sqrt(EPS*sumdg*sumxi)) { Skip update if fac not sufficiently posifac=1.0/fac; tive. fad=1.0/fae; The vector that makes BFGS different from DFP: for (i=1;i<=n;i++) dg[i]=fac*xi[i]-fad*hdg[i]; for (i=1;i<=n;i++) { The BFGS updating formula: for (j=i;j<=n;j++) { hessin[i][j] += fac*xi[i]*xi[j] -fad*hdg[i]*hdg[j]+fae*dg[i]*dg[j]; hessin[j][i]=hessin[i][j]; } } } for (i=1;i<=n;i++) { Now calculate the next direction to go, xi[i]=0.0; for (j=1;j<=n;j++) xi[i] -= hessin[i][j]*g[j]; } } and go back for another iteration. nrerror("too many iterations in dfpmin"); FREEALL }
430 Chapter 10.Minimization or Maximization of Functions Quasi-Newton methods like dfpmin work well with the approximate line minimization done by Insrch.The routines powell(810.5)and frprmn($10.6), however,need more accurate line minimization,which is carried out by the routine linmin Advanced Implementations of Variable Metric Methods Although rare,it can conceivably happen that roundoff errors cause the matrix Hi to become nearly singular or non-positive-definite.This can be serious,because the supposed search directions might then not lead downhill,and because nearly singular Hi's tend to give subsequent Hi's that are also nearly singular. There is a simple fix for this rare problem,the same as was mentioned in 810.4:In case of any doubt,you should restart the algorithm at the claimed minimum point,and see if it goes anywhere.Simple,but not very elegant.Modern implementations of variable metric methods deal with the problem in a more sophisticated way. Instead of building up an approximation to A,it is possible to build up an approximation of A itself.Then,instead of calculating the left-hand side of (10.7.4)directly,one solves the set of linear equations 华 A·(x-x)=-7fx) (10.7.11) 2 At first glance this seems like a bad idea,since solving (10.7.11)is a process of order N3-and anyway,how does this help the roundoff problem?The trick is not to store A but rather a triangular decomposition of A,its Cholesky decomposition (cf.$2.9).The updating Press. formula used for the Cholesky decomposition of A is of order N2 and can be arranged to guarantee that the matrix remains positive definite and nonsingular,even in the presence of finite roundoff.This method is due to Gill and Murray [1,2]. CITED REFERENCES AND FURTHER READING: OF SCIENTIFIC( Dennis,J.E.,and Schnabel,R.B.1983.Numerica/Methods for Unconstrained Optimization and Nonlinear Equations (Englewood Cliffs,NJ:Prentice-Hall).[1] 6 Jacobs,D.A.H.(ed.)1977,The State of the Art in Numerical Analysis(London:Academic Press), Chapter Ill.1,883-6 (by K.W.Brodlie).[2] Polak,E.1971,Computational Methods in Optimization (New York:Academic Press),pp.56ff.[3] 最 Acton,F.S.1970,Numerica/Methods That Work,1990,corrected edition (Washington:Mathe- matical Association of America),pp.467-468. Fuunrgroirion Numerical Recipes 10-621 43106 10.8 Linear Programming and the Simplex (outside Method North Software. The subject of linear programming,sometimes called linear optimization, Ame concerns itself with the following problem:For N independent variables1,...,N, maximize the function 2=a01E1+a02r2+···+a0NxN (10.8.1) subject to the primary constraints x1≥0,2≥0,.xN≥0 (10.8.2)
430 Chapter 10. Minimization or Maximization of Functions Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Quasi-Newton methods like dfpmin work well with the approximate line minimization done by lnsrch. The routines powell (§10.5) and frprmn (§10.6), however, need more accurate line minimization, which is carried out by the routine linmin. Advanced Implementations of Variable Metric Methods Although rare, it can conceivably happen that roundoff errors cause the matrix Hi to become nearly singular or non-positive-definite. This can be serious, because the supposed search directions might then not lead downhill, and because nearly singular Hi’s tend to give subsequent Hi’s that are also nearly singular. There is a simple fix for this rare problem, the same as was mentioned in §10.4: In case of any doubt, you should restart the algorithm at the claimed minimum point, and see if it goes anywhere. Simple, but not very elegant. Modern implementations of variable metric methods deal with the problem in a more sophisticated way. Instead of building up an approximation to A−1, it is possible to build up an approximation of A itself. Then, instead of calculating the left-hand side of (10.7.4) directly, one solves the set of linear equations A · (x − xi) = −∇f(xi) (10.7.11) At first glance this seems like a bad idea, since solving (10.7.11) is a process of order N3 — and anyway, how does this help the roundoff problem? The trick is not to store A but rather a triangular decomposition of A, its Cholesky decomposition (cf. §2.9). The updating formula used for the Cholesky decomposition of A is of order N2 and can be arranged to guarantee that the matrix remains positive definite and nonsingular, even in the presence of finite roundoff. This method is due to Gill and Murray [1,2]. CITED REFERENCES AND FURTHER READING: Dennis, J.E., and Schnabel, R.B. 1983, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Englewood Cliffs, NJ: Prentice-Hall). [1] Jacobs, D.A.H. (ed.) 1977, The State of the Art in Numerical Analysis (London: Academic Press), Chapter III.1, §§3–6 (by K. W. Brodlie). [2] Polak, E. 1971, Computational Methods in Optimization (New York: Academic Press), pp. 56ff. [3] Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), pp. 467–468. 10.8 Linear Programming and the Simplex Method The subject of linear programming, sometimes called linear optimization, concerns itself with the following problem: For N independent variables x1,...,xN , maximize the function z = a01x1 + a02x2 + ··· + a0N xN (10.8.1) subject to the primary constraints x1 ≥ 0, x2 ≥ 0, ... xN ≥ 0 (10.8.2)