412 Chapter 10. Minimization or Maximization of Functions if(11=11o)[ for (j=1;j<=ndim;j++) p[i][j]=psum[j]=0.5*(p[i][j]+p[i1o][j]); y[i]=(*funk)(psum); *nfunk +ndim; Keep track of function evaluations GET_PSUM Recompute psum else --(*nfunk); Correct the evaluation count. Go back for the test of doneness and the next free_vector(psum,1,ndim); iteration granted for #include "nrutil.h" -200 float amotry(float *p,float y[],float psum[],int ndim, float (*funk)(float [])int ihi,float fac) Extrapolates by a factor fac through the face of the simplex across from the high point,tries from NUMERICAL RECIPES I 19881992 it,and replaces the high point if the new point is better. int ji (North University Press. 令 float faci,fac2,ytry,*ptryi ptry=vector(1,ndim); America computer, fac1=(1.0-fac)/ndim; one paper THE ART fac2=faci-fac; for (j=1;j<=ndim;j++)ptry[j]=psum[j]*fac1-p[ihi][j]*fac2; ytry=(*funk)(ptry); Evaluate the function at the trial point. Programs if (ytry y[ihi]){ If it's better than the highest,then replace the highest. ictly proh y[ihi]=ytry; for (j=1;j<=ndim;j++){ psum[j]ptry[j]-p[ihi][j]; p[ihi][i]=ptry[j]; to dir free_vector(ptry,1,ndim); rectcustsen OF SCIENTIFIC COMPUTING(ISBN return ytry; @cambridge.org 1988-1992 by Numerical Recipes 10-:6211 CITED REFERENCES AND FURTHER READING: 43108-5 Nelder,J.A.,and Mead,R.1965,Computer Journal,vol.7,pp.308-313.[1] Yarbro.L.A.,and Deming,S.N.1974,Ana/ytica Chimica Acta,vol.73,pp.391-398. (outside Jacoby,S.L.S,Kowalik,J.S.,and Pizzo,J.T.1972,Iterative Methods for Nonlinear Optimization Software. Problems (Englewood Cliffs,NJ:Prentice-Hall) ying of 10.5 Direction Set (Powell's)Methods in Multidimensions We know (810.1-810.3)how to minimize a function of one variable.If we start at a point P in N-dimensional space,and proceed from there in some vector
412 Chapter 10. Minimization or Maximization of Functions Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). if (i != ilo) { for (j=1;j<=ndim;j++) p[i][j]=psum[j]=0.5*(p[i][j]+p[ilo][j]); y[i]=(*funk)(psum); } } *nfunk += ndim; Keep track of function evaluations. GET_PSUM Recompute psum. } } else --(*nfunk); Correct the evaluation count. } Go back for the test of doneness and the next free_vector(psum,1,ndim); iteration. } #include "nrutil.h" float amotry(float **p, float y[], float psum[], int ndim, float (*funk)(float []), int ihi, float fac) Extrapolates by a factor fac through the face of the simplex across from the high point, tries it, and replaces the high point if the new point is better. { int j; float fac1,fac2,ytry,*ptry; ptry=vector(1,ndim); fac1=(1.0-fac)/ndim; fac2=fac1-fac; for (j=1;j<=ndim;j++) ptry[j]=psum[j]*fac1-p[ihi][j]*fac2; ytry=(*funk)(ptry); Evaluate the function at the trial point. if (ytry < y[ihi]) { If it’s better than the highest, then replace the highest. y[ihi]=ytry; for (j=1;j<=ndim;j++) { psum[j] += ptry[j]-p[ihi][j]; p[ihi][j]=ptry[j]; } } free_vector(ptry,1,ndim); return ytry; } CITED REFERENCES AND FURTHER READING: Nelder, J.A., and Mead, R. 1965, Computer Journal, vol. 7, pp. 308–313. [1] Yarbro, L.A., and Deming, S.N. 1974, Analytica Chimica Acta, vol. 73, pp. 391–398. Jacoby, S.L.S, Kowalik, J.S., and Pizzo, J.T. 1972, Iterative Methods for Nonlinear Optimization Problems (Englewood Cliffs, NJ: Prentice-Hall). 10.5 Direction Set (Powell’s) Methods in Multidimensions We know (§10.1–§10.3) how to minimize a function of one variable. If we start at a point P in N-dimensional space, and proceed from there in some vector
10.5 Direction Set(Powell's)Methods in Multidimensions 413 direction n,then any function of N variables f(P)can be minimized along the line n by our one-dimensional methods.One can dream up various multidimensional minimization methods that consist of sequences of such line minimizations.Different methods will differ only by how,at each stage,they choose the next direction n to try.All such methods presume the existence of a"black-box"sub-algorithm,which we might call linmin (given as an explicit routine at the end of this section).whose definition can be taken for now as linmin:Given as input the vectors P and n,and the function f,find the scalar A that minimizes f(P+An). 三 Replace P by P+An.Replace n by An.Done. g All the minimization methods in this section and in the two sections following g fall under this general schema of successive line minimizations.(The algorithm in $10.7 does not need very accurate line minimizations.Accordingly,it has its own approximate line minimization routine,Insrch.)In this section we consider a class of methods whose choice of successive directions does not involve explicit computation of the function's gradient;the next two sections do require such gradient 9 calculations.You will note that we need not specify whether linmin uses gradient information or not.That choice is up to you,and its optimization depends on your particular function.You would be crazy,however,to use gradients in linmin and not use them in the choice of directions,since in this latter role they can drastically reduce the total computational burden. But what if,in your application,calculation of the gradient is out of the question. 、a之 You might first think of this simple method:Take the unit vectors e1,e2,...en as a set of directions.Using linmin,move along the first direction to its minimum,then from there along the second direction to its minimum,and so on,cycling through the 6 whole set of directions as many times as necessary,until the function stops decreasing. This simple method is actually not too bad for many functions.Even more interesting is why it is bad,i.e.very inefficient,for some other functions.Consider a function of two dimensions whose contour map(level lines)happens to define a long,narrow valley at some angle to the coordinate basis vectors(see Figure 10.5.1). 10621 Then the only way "down the length of the valley"going along the basis vectors at each stage is by a series of many tiny steps.More generally,in N dimensions,if Numerical Recipes 43106 the function's second derivatives are much larger in magnitude in some directions than in others,then many cycles through all N basis vectors will be required in order to get anywhere.This condition is not all that unusual;according to Murphy's (outside Law,you should count on it. North Obviously what we need is a better set of directions than the ei's.All direction set methods consist of prescriptions for updating the set of directions as the method proceeds,attempting to come up with a set which either(i)includes some very good directions that will take us far along narrow valleys,or else (more subtly) (ii)includes some number of"non-interfering"directions with the special property that minimization along one is not "spoiled"by subsequent minimization along another,so that interminable cycling through the set of directions can be avoided
10.5 Direction Set (Powell’s) Methods in Multidimensions 413 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). direction n, then any function of N variables f(P) can be minimized along the line n by our one-dimensional methods. One can dream up various multidimensional minimization methods that consist of sequences of such line minimizations. Different methods will differ only by how, at each stage, they choose the next direction n to try. All such methods presume the existence of a “black-box” sub-algorithm, which we might call linmin (given as an explicit routine at the end of this section), whose definition can be taken for now as linmin: Given as input the vectors P and n, and the function f, find the scalar λ that minimizes f(P+λn). Replace P by P + λn. Replace n by λn. Done. All the minimization methods in this section and in the two sections following fall under this general schema of successive line minimizations. (The algorithm in §10.7 does not need very accurate line minimizations. Accordingly, it has its own approximate line minimization routine, lnsrch.) In this section we consider a class of methods whose choice of successive directions does not involve explicit computation of the function’s gradient; the next two sections do require such gradient calculations. You will note that we need not specify whether linmin uses gradient information or not. That choice is up to you, and its optimization depends on your particular function. You would be crazy, however, to use gradients in linmin and not use them in the choice of directions, since in this latter role they can drastically reduce the total computational burden. But what if, in your application, calculation of the gradient is out of the question. You might first think of this simple method: Take the unit vectors e 1, e2,... eN as a set of directions. Using linmin, move along the first direction to its minimum, then from there along the second direction to its minimum, and so on, cycling through the whole set of directions as many times as necessary, until the function stops decreasing. This simple method is actually not too bad for many functions. Even more interesting is why it is bad, i.e. very inefficient, for some other functions. Consider a function of two dimensions whose contour map (level lines) happens to define a long, narrow valley at some angle to the coordinate basis vectors (see Figure 10.5.1). Then the only way “down the length of the valley” going along the basis vectors at each stage is by a series of many tiny steps. More generally, in N dimensions, if the function’s second derivatives are much larger in magnitude in some directions than in others, then many cycles through all N basis vectors will be required in order to get anywhere. This condition is not all that unusual; according to Murphy’s Law, you should count on it. Obviously what we need is a better set of directions than the ei’s. All direction set methods consist of prescriptions for updating the set of directions as the method proceeds, attempting to come up with a set which either (i) includes some very good directions that will take us far along narrow valleys, or else (more subtly) (ii) includes some number of “non-interfering” directions with the special property that minimization along one is not “spoiled” by subsequent minimization along another, so that interminable cycling through the set of directions can be avoided
414 Chapter 10. Minimization or Maximization of Functions Start http://www.nr. read able files .com or call granted for 18881992 11-800-872 (including this one) /Cambridge -7423(North America to any server computer,is users to make one paper from NUMERICAL RECIPES IN C: SUNN电.et THE 是 send copy for their strictly prohibited Programs Figure 10.5.1.Successive minimizations along coordinate directions in a long,narrow "valley"(shown as contour lines).Unless the valley is optimally oriented,this method is extremely inefficient,taking many tiny steps to get to the minimum,crossing and re-crossing the principal axis. to dir Copyright (C) Conjugate Directions ectcustser 18881920 ART OF SCIENTIFIC COMPUTING(ISBN This concept of"non-interfering"directions,more conventionally called con- jugate directions,is worth making mathematically explicit. v@cam First,note that if we minimize a function along some direction u,then the gradient of the function must be perpendicular to u at the line minimum;if not,then Further reproduction there would still be a nonzero directional derivative along u. Numerical Recipes 10-621 43108-5 Next take some particular point P as the origin of the coordinate system with coordinates x.Then any function f can be approximated by its Taylor series (outside of 41口2f North Software. f)=fP)+∑ 2 8x10xj j十·· 2 ij (10.5.1) 1 ≈c-b:+2XA visit website where c三f(P) b=-Vflp [A三 82f Ox:OzjP (10.5.2) The matrix A whose components are the second partial derivative matrix of the function is called the Hessian matrix of the function at P
414 Chapter 10. Minimization or Maximization of Functions Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). start y x Figure 10.5.1. Successive minimizations along coordinate directions in a long, narrow “valley” (shown as contour lines). Unless the valley is optimally oriented, this method is extremely inefficient, taking many tiny steps to get to the minimum, crossing and re-crossing the principal axis. Conjugate Directions This concept of “non-interfering” directions, more conventionally called conjugate directions, is worth making mathematically explicit. First, note that if we minimize a function along some direction u, then the gradient of the function must be perpendicular to u at the line minimum; if not, then there would still be a nonzero directional derivative along u. Next take some particular point P as the origin of the coordinate system with coordinates x. Then any function f can be approximated by its Taylor series f(x) = f(P) + i ∂f ∂xi xi + 1 2 i,j ∂2f ∂xi∂xj xixj + ··· ≈ c − b · x + 1 2 x · A · x (10.5.1) where c ≡ f(P) b ≡ −∇f| P [A]ij ≡ ∂2f ∂xi∂xj P (10.5.2) The matrix A whose components are the second partial derivative matrix of the function is called the Hessian matrix of the function at P.
10.5 Direction Set(Powell's)Methods in Multidimensions 415 In the approximation of(10.5.1),the gradient of f is easily calculated as Vf=A·x-b (10.53) (This implies that the gradient will vanish-the function will be at an extremum- at a value of x obtained by solving A.x=b.This idea we will return to in 810.7!) How does the gradient Vf change as we move along some direction?Evidently 6(7f)=A·(6x) (10.5.4) Suppose that we have moved along some direction u to a minimum and now propose to move along some new direction v.The condition that motion along v not spoil our minimization along u is just that the gradient stay perpendicular to u,i.e., that the change in the gradient be perpendicular to u.By equation(10.5.4)this is just 0=u·6(7f)=u·A·v (10.5.5) When (10.5.5)holds for two vectors u and v,they are said to be conjugate. 2 When the relation holds pairwise for all members of a set of vectors,they are said to be a conjugate set.If you do successive line minimization of a function along a conjugate set of directions,then you don't need to redo any of those directions (unless,of course,you spoil things by minimizing along a direction that they are not conjugate to). 县气%∽ 9 A triumph for a direction set method is to come up with a set of N linearly independent,mutually conjugate directions.Then,one pass of N line minimizations OF SCIENTIFIC will put it exactly at the minimum of a quadratic form like (10.5.1).For functions f that are not exactly quadratic forms,it won't be exactly at the minimum;but 61 repeated cycles of N line minimizations will in due course converge quadratically to the minimum Powell's Quadratically Convergent Method Powell first discovered a direction set method that does produce N mutually Numerica 10621 conjugate directions.Here is how it goes:Initialize the set of directions u;to the basis vectors, 431 Recipes ui=ei i=1,...,N (10.5.6) (outside Now repeat the following sequence of steps("basic procedure")until your function North stops decreasing: Save your starting position as Po. For i=1,...,N,move Pi-1 to the minimum along direction ui and call this point Pi. ·Fori=1,,N-1,set ui←ui+1 ·Set un-Pw-Po. Move Pn to the minimum along direction un and call this point Po
10.5 Direction Set (Powell’s) Methods in Multidimensions 415 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). In the approximation of (10.5.1), the gradient of f is easily calculated as ∇f = A · x − b (10.5.3) (This implies that the gradient will vanish — the function will be at an extremum — at a value of x obtained by solving A · x = b. This idea we will return to in §10.7!) How does the gradient ∇f change as we move along some direction? Evidently δ(∇f) = A · (δx) (10.5.4) Suppose that we have moved along some direction u to a minimum and now propose to move along some new direction v. The condition that motion along v not spoil our minimization along u is just that the gradient stay perpendicular to u, i.e., that the change in the gradient be perpendicular to u. By equation (10.5.4) this is just 0 = u · δ(∇f) = u · A · v (10.5.5) When (10.5.5) holds for two vectors u and v, they are said to be conjugate. When the relation holds pairwise for all members of a set of vectors, they are said to be a conjugate set. If you do successive line minimization of a function along a conjugate set of directions, then you don’t need to redo any of those directions (unless, of course, you spoil things by minimizing along a direction that they are not conjugate to). A triumph for a direction set method is to come up with a set of N linearly independent, mutually conjugate directions. Then, one pass of N line minimizations will put it exactly at the minimum of a quadratic form like (10.5.1). For functions f that are not exactly quadratic forms, it won’t be exactly at the minimum; but repeated cycles of N line minimizations will in due course converge quadratically to the minimum. Powell’s Quadratically Convergent Method Powell first discovered a direction set method that does produce N mutually conjugate directions. Here is how it goes: Initialize the set of directions ui to the basis vectors, ui = ei i = 1,...,N (10.5.6) Now repeat the following sequence of steps (“basic procedure”) until your function stops decreasing: • Save your starting position as P0. • For i = 1,...,N, move Pi−1 to the minimum along direction ui and call this point Pi. • For i = 1,...,N − 1, set ui ← ui+1. • Set uN ← PN − P0. • Move PN to the minimum along direction uN and call this point P0
416 Chapter 10.Minimization or Maximization of Functions Powell,in 1964,showed that,for a quadratic form like (10.5.1).iterations of the above basic procedure produce a set of directions ui whose last k members are mutually conjugate.Therefore,N iterations of the basic procedure,amounting to N(N+1)line minimizations in all,will exactly minimize a quadratic form. Brent [1]gives proofs of these statements in accessible form. Unfortunately,there is a problem with Powell's quadratically convergent al- gorithm.The procedure of throwing away,at each stage,u in favor of P-Po tends to produce sets of directions that"fold up on each other"and become linearly dependent.Once this happens,then the procedure finds the minimum of the function f only over a subspace of the full N-dimensional case;in other words,it gives the 81 wrong answer.Therefore,the algorithm must not be used in the form given above. There are a number of ways to fix up the problem of linear dependence in Powell's algorithm,among them: 1.You can reinitialize the set of directions u;to the basis vectors e;after every N or N+1 iterations of the basic procedure.This produces a serviceable method, which we commend to you if quadratic convergence is important for your application (i.e.,if your functions are close to quadratic forms and if you desire high accuracy). 2.Brent points out that the set of directions can equally well be reset to the columns of any orthogonal matrix.Rather than throw away the information 9 on conjugate directions already built up,he resets the direction set to calculated principal directions of the matrix A (which he gives a procedure for determining) The calculation is essentially a singular value decomposition algorithm (see 82.6) Brent has a number of other cute tricks up his sleeve,and his modification of Powell's method is probably the best presently known.Consult [1]for a detailed 苔2ps description and listing of the program.Unfortunately it is rather too elaborate for OF SCIENTIFIC us to include here. 3.You can give up the property of quadratic convergence in favor of a more 6 heuristic scheme (due to Powell)which tries to find a few good directions along narrow valleys instead of N necessarily conjugate directions.This is the method that we now implement.(It is also the version of Powell's method given in Acton [21, from which parts of the following discussion are drawn.) Numerica 10621 Discarding the Direction of Largest Decrease 431 The fox and the grapes:Now that we are going to give up the property of Recipes quadratic convergence,was it so important after all?That depends on the function (outside that you are minimizing.Some applications produce functions with long,twisty valleys.Quadratic convergence is of no particular advantage to a program which North must slalom down the length of a valley floor that twists one way and another (and another,and another,...-there are N dimensions!).Along the long direction, a quadratically convergent method is trying to extrapolate to the minimum of a parabola which just isn't(yet)there;while the conjugacy of the N-1 transverse directions keeps getting spoiled by the twists. Sooner or later,however,we do arrive at an approximately ellipsoidal minimum (cf.equation 10.5.1 when b,the gradient,is zero).Then,depending on how much accuracy we require,a method with quadratic convergence can save us several times N2 extra line minimizations,since quadratic convergence doubles the number of significant figures at each iteration
416 Chapter 10. Minimization or Maximization of Functions Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Powell, in 1964, showed that, for a quadratic form like (10.5.1), k iterations of the above basic procedure produce a set of directions ui whose last k members are mutually conjugate. Therefore, N iterations of the basic procedure, amounting to N(N + 1) line minimizations in all, will exactly minimize a quadratic form. Brent [1] gives proofs of these statements in accessible form. Unfortunately, there is a problem with Powell’s quadratically convergent algorithm. The procedure of throwing away, at each stage, u 1 in favor of PN − P0 tends to produce sets of directions that “fold up on each other” and become linearly dependent. Once this happens, then the procedure finds the minimum of the function f only over a subspace of the full N-dimensional case; in other words, it gives the wrong answer. Therefore, the algorithm must not be used in the form given above. There are a number of ways to fix up the problem of linear dependence in Powell’s algorithm, among them: 1. You can reinitialize the set of directions ui to the basis vectors ei after every N or N + 1 iterations of the basic procedure. This produces a serviceable method, which we commend to you if quadratic convergence is important for your application (i.e., if your functions are close to quadratic forms and if you desire high accuracy). 2. Brent points out that the set of directions can equally well be reset to the columns of any orthogonal matrix. Rather than throw away the information on conjugate directions already built up, he resets the direction set to calculated principal directions of the matrix A (which he gives a procedure for determining). The calculation is essentially a singular value decomposition algorithm (see §2.6). Brent has a number of other cute tricks up his sleeve, and his modification of Powell’s method is probably the best presently known. Consult [1] for a detailed description and listing of the program. Unfortunately it is rather too elaborate for us to include here. 3. You can give up the property of quadratic convergence in favor of a more heuristic scheme (due to Powell) which tries to find a few good directions along narrow valleys instead of N necessarily conjugate directions. This is the method that we now implement. (It is also the version of Powell’s method given in Acton [2], from which parts of the following discussion are drawn.) Discarding the Direction of Largest Decrease The fox and the grapes: Now that we are going to give up the property of quadratic convergence, was it so important after all? That depends on the function that you are minimizing. Some applications produce functions with long, twisty valleys. Quadratic convergence is of no particular advantage to a program which must slalom down the length of a valley floor that twists one way and another (and another, and another, ... – there are N dimensions!). Along the long direction, a quadratically convergent method is trying to extrapolate to the minimum of a parabola which just isn’t (yet) there; while the conjugacy of the N − 1 transverse directions keeps getting spoiled by the twists. Sooner or later, however, we do arrive at an approximately ellipsoidal minimum (cf. equation 10.5.1 when b, the gradient, is zero). Then, depending on how much accuracy we require, a method with quadratic convergence can save us several times N2 extra line minimizations, since quadratic convergence doubles the number of significant figures at each iteration
10.5 Direction Set(Powell's)Methods in Multidimensions 417 The basic idea of our now-modified Powell's method is still to take P N-Po as a new direction;it is,after all,the average direction moved after trying all N possible directions.For a valley whose long direction is twisting slowly,this direction is likely to give us a good run along the new long direction.The change is to discard the old direction along which the function f made its largest decrease.This seems paradoxical,since that direction was the best of the previous iteration.However,it is also likely to be a major component of the new direction that we are adding,so dropping it gives us the best chance of avoiding a buildup of linear dependence. There are a couple of exceptions to this basic idea.Sometimes it is better not to add a new direction at all.Define fo三f(Po) fN≡f(PN) fE≡f(2PN-Po) (10.5.7) Here fe is the function value at an "extrapolated"point somewhat further along the proposed new direction.Also define Af to be the magnitude of the largest decrease along one particular direction of the present basic procedure iteration.(Af 、2 is a positive number.Then: 1.If fe fo,then keep the old set of directions for the next basic procedure, 令 because the average direction PN-Po is all played out. 2.If2(fo-2fv+fE)[(fn-fw)-△f2≥(fo-fE)2△f,then keep the old set of directions for the next basic procedure,because either(i)the decrease along 。 Press. the average direction was not primarily due to any single direction's decrease,or (ii) there is a substantial second derivative along the average direction and we seem to 9 be near to the bottom of its minimum. The following routine implements Powell's method in the version just described. OF SCIENTIFIC In the routine,xi is the matrix whose columns are the set of directions n::otherwise the correspondence of notation should be self-evident. 6 #include #include "nrutil.h" #define TINY 1.0e-25 A small number. COMPUTING (ISBN 19891892 #define ITMAX 200 Maximum allowed iterations. void powell(float p[],float **xi,int n,float ftol,int iter,float *fret, uurggoglrion Numerical Recipes 10-621 float (*func)(float [) Minimization of a function func of n variables.Input consists of an initial starting point p[1..n];an initial matrix xi[1..n][1..n],whose columns contain the initial set of di- 43108 rections (usually the n unit vectors);and ftol,the fractional tolerance in the function value such that failure to decrease by more than this amount on one iteration signals doneness.On output,p is set to the best point found,xi is the then-current direction set,fret is the returned (outside function value at p,and iter is the number of iterations taken.The routine linmin is used. North Software. void linmin(float p],float xi,int n,float *fret, float (*func)(float ])) int i,ibig,j; float del,fp,fptt,t,*pt,*ptt,*xit; pt-vector(1,n); ptt=vector(1,n): xit-vector(1,n); *fret=(*func)(p); for (j=1;j<=n;j++)pt[j]=p[j]; Save the initial point. for (iter=1;;++(*iter)){ fp=(*fret); 1b1g=0;
10.5 Direction Set (Powell’s) Methods in Multidimensions 417 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). The basic idea of our now-modified Powell’s method is still to take P N − P0 as a new direction; it is, after all, the average direction moved after trying all N possible directions. For a valley whose long direction is twisting slowly, this direction is likely to give us a good run along the new long direction. The change is to discard the old direction along which the function f made its largest decrease. This seems paradoxical, since that direction was the best of the previous iteration. However, it is also likely to be a major component of the new direction that we are adding, so dropping it gives us the best chance of avoiding a buildup of linear dependence. There are a couple of exceptions to this basic idea. Sometimes it is better not to add a new direction at all. Define f0 ≡ f(P0) fN ≡ f(PN ) fE ≡ f(2PN − P0) (10.5.7) Here fE is the function value at an “extrapolated” point somewhat further along the proposed new direction. Also define ∆f to be the magnitude of the largest decrease along one particular direction of the present basic procedure iteration. (∆f is a positive number.) Then: 1. If fE ≥ f0, then keep the old set of directions for the next basic procedure, because the average direction PN − P0 is all played out. 2. If 2 (f0 − 2fN + fE) [(f0 − fN )− ∆f] 2 ≥ (f0 − fE)2∆f, then keep the old set of directions for the next basic procedure, because either (i) the decrease along the average direction was not primarily due to any single direction’s decrease, or (ii) there is a substantial second derivative along the average direction and we seem to be near to the bottom of its minimum. The following routine implements Powell’s method in the version just described. In the routine, xi is the matrix whose columns are the set of directions ni; otherwise the correspondence of notation should be self-evident. #include #include "nrutil.h" #define TINY 1.0e-25 A small number. #define ITMAX 200 Maximum allowed iterations. void powell(float p[], float **xi, int n, float ftol, int *iter, float *fret, float (*func)(float [])) Minimization of a function func of n variables. Input consists of an initial starting point p[1..n]; an initial matrix xi[1..n][1..n], whose columns contain the initial set of directions (usually the n unit vectors); and ftol, the fractional tolerance in the function value such that failure to decrease by more than this amount on one iteration signals doneness. On output, p is set to the best point found, xi is the then-current direction set, fret is the returned function value at p, and iter is the number of iterations taken. The routine linmin is used. { void linmin(float p[], float xi[], int n, float *fret, float (*func)(float [])); int i,ibig,j; float del,fp,fptt,t,*pt,*ptt,*xit; pt=vector(1,n); ptt=vector(1,n); xit=vector(1,n); *fret=(*func)(p); for (j=1;j<=n;j++) pt[j]=p[j]; Save the initial point. for (*iter=1;;++(*iter)) { fp=(*fret); ibig=0;
418 Chapter 10. Minimization or Maximization of Functions de1=0.0; Will be the biggest function decrease for(1=1;idel){ and record it if it is the largest decrease del=fptt-(*fret); so far. ibig=i; 2 if (2.0*(fp-(*fret))<=ftol*(fabs(fp)+fabs(*fret))+TINY) free_vector(xit,1,n); lermination criterion free_vector(ptt,1,n); free_vector(pt,1,n); return; d 18881892 if (*iter =ITMAX)nrerror("powell exceeding maximum iterations."); for (j=1;j<=n;j++){ Construct the extrapolated point and the Cam NUMERICAL ptt[j]=2.0*p[j]-pt[j]; average direction moved.Save the xit[j]=p[j]-pt[j]; old starting point. pt[j]=p[j]; RECIPES fptt=(*func)(ptt); Function value at extrapolated point. it(fptt fp){ 入高 令 t=2.0*(fp-2.0*(*fret)+fptt)*SQR(fp-(*fret)-del)-del*SQR(fp-fptt); 1f(t<0.0)[ linmin(p,xit,n,fret,func); Move to the minimum of the new direc- Press. for (j=1;j<=n;j++){ tion,and save the new direction. xi[j][ibig]=xi[j][n]; xi[j][n]=xit[]; 22222 Back for another iteration. IENTIFIC MPUTING Implementation of Line Minimization 1920 (ISBN Make no mistake,there is a right way to implement linmin:It is to use 10.621 the methods of one-dimensional minimization described in $10.1-810.3,but to Numerica rewrite the programs of those sections so that their bookkeeping is done on vector- uction Recipes 43108 valued points P (all lying along a given direction n)rather than scalar-valued abscissas z.That straightforward task produces long routines densely populated with“for(k=1;k<=n;k++)”loops. We do not have space to include such routines in this book.Our linmin,which North Software. works just fine,is instead a kind of bookkeeping swindle.It constructs an"artificial" function of one variable called fidim,which is the value of your function,say, func,along the line going through the point p in the direction xi.linmin calls our familiar one-dimensional routines mnbrak($10.1)and brent ($10.3)and instructs them to minimize fidim.linmin communicates with f1dim"over the head"of mnbrak and brent,through global (external)variables.That is also how it passes to fidim a pointer to your user-supplied function. The only thing inefficient about linmin is this:Its use as an interface between a multidimensional minimization strategy and a one-dimensional minimization routine results in some unnecessary copying of vectors hither and yon.That should not
418 Chapter 10. Minimization or Maximization of Functions Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). del=0.0; Will be the biggest function decrease. for (i=1;i del) { and record it if it is the largest decrease del=fptt-(*fret); so far. ibig=i; } } if (2.0*(fp-(*fret)) <= ftol*(fabs(fp)+fabs(*fret))+TINY) { free_vector(xit,1,n); Termination criterion. free_vector(ptt,1,n); free_vector(pt,1,n); return; } if (*iter == ITMAX) nrerror("powell exceeding maximum iterations."); for (j=1;j<=n;j++) { Construct the extrapolated point and the average direction moved. Save the old starting point. ptt[j]=2.0*p[j]-pt[j]; xit[j]=p[j]-pt[j]; pt[j]=p[j]; } fptt=(*func)(ptt); Function value at extrapolated point. if (fptt < fp) { t=2.0*(fp-2.0*(*fret)+fptt)*SQR(fp-(*fret)-del)-del*SQR(fp-fptt); if (t < 0.0) { linmin(p,xit,n,fret,func); Move to the minimum of the new direcfor (j=1;j<=n;j++) { tion, and save the new direction. xi[j][ibig]=xi[j][n]; xi[j][n]=xit[j]; } } } } Back for another iteration. } Implementation of Line Minimization Make no mistake, there is a right way to implement linmin: It is to use the methods of one-dimensional minimization described in §10.1–§10.3, but to rewrite the programs of those sections so that their bookkeeping is done on vectorvalued points P (all lying along a given direction n) rather than scalar-valued abscissas x. That straightforward task produces long routines densely populated with “for(k=1;k<=n;k++)” loops. We do not have space to include such routines in this book. Our linmin, which works just fine, is instead a kind of bookkeeping swindle. It constructs an “artificial” function of one variable called f1dim, which is the value of your function, say, func, along the line going through the point p in the direction xi. linmin calls our familiar one-dimensional routines mnbrak (§10.1) and brent (§10.3) and instructs them to minimize f1dim. linmin communicates with f1dim “over the head” of mnbrak and brent, through global (external) variables. That is also how it passes to f1dim a pointer to your user-supplied function. The only thing inefficient about linmin is this: Its use as an interface between a multidimensional minimization strategy and a one-dimensional minimization routine results in some unnecessary copying of vectors hither and yon. That should not
10.5 Direction Set(Powell's)Methods in Multidimensions 419 normally be a significant addition to the overall computational burden,but we cannot disguise its inelegance. #include "nrutil.h" #define TOL 2.0e-4 Tolerance passed to brent int ncom; Global variables communicate with f1dim float *pcom,*xicom,(*nrfunc)(float []) void linmin(float p[],float xi[],int n,float *fret,float (*func)(float []) Given an n-dimensional point p[1..n]and an n-dimensional direction xi[1..n],moves and http://www.nr. resets p to where the function func(p)takes on a minimum along the direction xi from p, and replaces xi by the actual vector displacement that p was moved.Also returns as fret the value of func at the returned location p.This is actually all accomplished by calling the 83g routines mnbrak and brent. 19881992 float brent(float ax,float bx,float cx, 1600 (including this one) float (*f)(float),float tol,float *xmin); float fidim(float x); 872 void mnbrak(float *ax,float *bx,float *cx,float *fa,float *fb, float *fc,float (*func)(float)); tusers to make one paper from NUMERICAL RECIPES IN int j; float xx,xmin,fx,fb,fa,bx,ax; ncom=n; Define the global variables 7423 (North America to any server computer,is strictly prohibited. by Cambridge University Press.Programs Copyright(C)1988-1992 by Numerical Recipes THE pcom=vector(1,n); xicom=vector(1,n); nrfunc=func; for (i=1;i<=n;i++){ pcom[i]=p[j]; xicom[i]-xi[i]; ax=0.0; Initial guess for brackets. xx=1.0; mnbrak(&ax,&xx,&bx,&fa,&fx,&fb,fidim); 车fretsbrent(a置.xX.bx.f1d5m.T0L,2m1n】 for(j=1;j<=n;j++){ Construct the vector results to return. xi[j]*xmin; p[j]+xi[j]; 2 free_vector(xicom,1,n); free_vector(pcom,1,n); only),orsend email to directcustserv@cambridge.org personal use.Further reproduction,or ART OF SCIENTIFIC COMPUTING(ISBN 0-521-43108-5) #include "nrutil.h" extern int ncom; Defined in linmin Software. extern float *pcom,*xicom,(*nrfunc)(float []) float fidim(float x) (outside North America) Must accompany linmin int j; float f,*xt; xt-vector(1,ncom); for (j=1;j<=ncom;j++)xt[j]=pcom[j]+x*xicom[j]; f=(*nrfunc)(xt); free_vector(xt,1,ncom); return f;
10.5 Direction Set (Powell’s) Methods in Multidimensions 419 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). normally be a significant addition to the overall computational burden, but we cannot disguise its inelegance. #include "nrutil.h" #define TOL 2.0e-4 Tolerance passed to brent. int ncom; Global variables communicate with f1dim. float *pcom,*xicom,(*nrfunc)(float []); void linmin(float p[], float xi[], int n, float *fret, float (*func)(float [])) Given an n-dimensional point p[1..n] and an n-dimensional direction xi[1..n], moves and resets p to where the function func(p) takes on a minimum along the direction xi from p, and replaces xi by the actual vector displacement that p was moved. Also returns as fret the value of func at the returned location p. This is actually all accomplished by calling the routines mnbrak and brent. { float brent(float ax, float bx, float cx, float (*f)(float), float tol, float *xmin); float f1dim(float x); void mnbrak(float *ax, float *bx, float *cx, float *fa, float *fb, float *fc, float (*func)(float)); int j; float xx,xmin,fx,fb,fa,bx,ax; ncom=n; Define the global variables. pcom=vector(1,n); xicom=vector(1,n); nrfunc=func; for (j=1;j<=n;j++) { pcom[j]=p[j]; xicom[j]=xi[j]; } ax=0.0; Initial guess for brackets. xx=1.0; mnbrak(&ax,&xx,&bx,&fa,&fx,&fb,f1dim); *fret=brent(ax,xx,bx,f1dim,TOL,&xmin); for (j=1;j<=n;j++) { Construct the vector results to return. xi[j] *= xmin; p[j] += xi[j]; } free_vector(xicom,1,n); free_vector(pcom,1,n); } #include "nrutil.h" extern int ncom; Defined in linmin. extern float *pcom,*xicom,(*nrfunc)(float []); float f1dim(float x) Must accompany linmin. { int j; float f,*xt; xt=vector(1,ncom); for (j=1;j<=ncom;j++) xt[j]=pcom[j]+x*xicom[j]; f=(*nrfunc)(xt); free_vector(xt,1,ncom); return f; }
420 Chapter 10.Minimization or Maximization of Functions CITED REFERENCES AND FURTHER READING: Brent,R.P.1973,Algorithms for Minimization without Derivatives(Englewood Cliffs,NJ:Prentice- Hall),Chapter 7.[1] Acton,F.S.1970,Numerica/Methods That Work;1990,corrected edition (Washington:Mathe- matical Association of America),pp.464-467.[2] Jacobs,D.A.H.(ed.)1977,The State of the Art in Numerical Analysis (London:Academic Press)),Pp.259-262. 10.6 Conjugate Gradient Methods in Cam ICAL Multidimensions We consider now the case where you are able to calculate,at a given N- 、 dimensional point P,not just the value of a function f(P)but also the gradient 9 (vector of first partial derivatives)Vf(P). A rough counting argument will show how advantageous it is to use the gradient information:Suppose that the function f is roughly approximated as a quadratic form,as above in equation (10.5.1), 、星是分 9 f国≈c-bI+2Ax (10.6.1) Then the number of unknown parameters in f is equal to the number of free parameters in A and b,which is N(N+1),which we see to be of order N2 Changing any one of these parameters can move the location of the minimum. Therefore.we should not expect to be able to find the minimum until we have 学三级 collected an equivalent information content,of order N2 numbers. In the direction set methods of 810.5,we collected the necessary information by making on the order of N2 separate line minimizations,each requiring"a few"(but 号名,a Numerica 10621 sometimes a big few!)function evaluations.Now,each evaluation of the gradient will bring us N new components of information.If we use them wisely,we should need to make only of order N separate line minimizations.That is in fact the case for the algorithms in this section and the next. A factor of N improvement in computational speed is not necessarily implied. As a rough estimate,we might imagine that the calculation of each component of the gradient takes about as long as evaluating the function itself.In that case there will be of order N2 equivalent function evaluations both with and without gradient information.Even if the advantage is not of order N,however,it is nevertheless quite substantial:(i)Each calculated component of the gradient will typically save not just one function evaluation,but a number of them,equivalent to,say,a whole line minimization.(ii)There is often a high degree of redundancy in the formulas for the various components of a function's gradient;when this is so,especially when there is also redundancy with the calculation of the function,then the calculation of the gradient may cost significantly less than N function evaluations
420 Chapter 10. Minimization or Maximization of Functions Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). CITED REFERENCES AND FURTHER READING: Brent, R.P. 1973, Algorithms for Minimization without Derivatives (Englewood Cliffs, NJ: PrenticeHall), Chapter 7. [1] Acton, F.S. 1970, Numerical Methods That Work; 1990, corrected edition (Washington: Mathematical Association of America), pp. 464–467. [2] Jacobs, D.A.H. (ed.) 1977, The State of the Art in Numerical Analysis (London: Academic Press), pp. 259–262. 10.6 Conjugate Gradient Methods in Multidimensions We consider now the case where you are able to calculate, at a given Ndimensional point P, not just the value of a function f(P) but also the gradient (vector of first partial derivatives) ∇f(P). A rough counting argument will show how advantageous it is to use the gradient information: Suppose that the function f is roughly approximated as a quadratic form, as above in equation (10.5.1), f(x) ≈ c − b · x + 1 2 x · A · x (10.6.1) Then the number of unknown parameters in f is equal to the number of free parameters in A and b, which is 1 2N(N + 1), which we see to be of order N 2. Changing any one of these parameters can move the location of the minimum. Therefore, we should not expect to be able to find the minimum until we have collected an equivalent information content, of order N 2 numbers. In the direction set methods of §10.5, we collected the necessary information by making on the order of N 2 separate line minimizations, each requiring “a few” (but sometimes a big few!) function evaluations. Now, each evaluation of the gradient will bring us N new components of information. If we use them wisely, we should need to make only of order N separate line minimizations. That is in fact the case for the algorithms in this section and the next. A factor of N improvement in computational speed is not necessarily implied. As a rough estimate, we might imagine that the calculation of each component of the gradient takes about as long as evaluating the function itself. In that case there will be of order N 2 equivalent function evaluations both with and without gradient information. Even if the advantage is not of order N, however, it is nevertheless quite substantial: (i) Each calculated component of the gradient will typically save not just one function evaluation, but a number of them, equivalent to, say, a whole line minimization. (ii) There is often a high degree of redundancy in the formulas for the various components of a function’s gradient; when this is so, especially when there is also redundancy with the calculation of the function, then the calculation of the gradient may cost significantly less than N function evaluations