10.6 Conjugate Gradient Methods in Multidimensions

We consider now the case where you are able to calculate, at a given N-dimensional point P, not just the value of a function f(P) but also the gradient (vector of first partial derivatives) ∇f(P).

A rough counting argument will show how advantageous it is to use the gradient information: Suppose that the function f is roughly approximated as a quadratic form, as above in equation (10.5.1),

    f(\mathbf{x}) \approx c - \mathbf{b} \cdot \mathbf{x} + \tfrac{1}{2}\,\mathbf{x} \cdot \mathbf{A} \cdot \mathbf{x}        (10.6.1)

Then the number of unknown parameters in f is equal to the number of free parameters in A and b, which is ½N(N + 1), which we see to be of order N². Changing any one of these parameters can move the location of the minimum. Therefore, we should not expect to be able to find the minimum until we have collected an equivalent information content, of order N² numbers.

In the direction set methods of §10.5, we collected the necessary information by making on the order of N² separate line minimizations, each requiring "a few" (but sometimes a big few!) function evaluations. Now, each evaluation of the gradient will bring us N new components of information. If we use them wisely, we should need to make only of order N separate line minimizations. That is in fact the case for the algorithms in this section and the next.

A factor of N improvement in computational speed is not necessarily implied. As a rough estimate, we might imagine that the calculation of each component of the gradient takes about as long as evaluating the function itself. In that case there will be of order N² equivalent function evaluations both with and without gradient information. Even if the advantage is not of order N, however, it is nevertheless quite substantial: (i) Each calculated component of the gradient will typically save not just one function evaluation, but a number of them, equivalent to, say, a whole line minimization. (ii) There is often a high degree of redundancy in the formulas for the various components of a function's gradient; when this is so, especially when there is also redundancy with the calculation of the function, then the calculation of the gradient may cost significantly less than N function evaluations.
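To make the counting argument above concrete, here is a minimal C sketch (not code from the book; the names quadratic_f and quadratic_grad and the 3-by-3 example data are illustrative). It evaluates the quadratic model of equation (10.6.1) and its gradient, which for symmetric A is ∇f(x) = A·x − b, so a single gradient call delivers N numbers where a function call delivers only one.

/* Sketch only: quadratic model of eq. (10.6.1) and its gradient.
 * f(x) = c - b.x + (1/2) x.A.x,   grad f(x) = A.x - b  (A symmetric). */
#include <stdio.h>

#define N 3

/* Evaluate f(x) = c - b.x + 0.5 * x.A.x (one number). */
double quadratic_f(double c, const double b[N], const double A[N][N],
                   const double x[N])
{
    double fx = c;
    for (int i = 0; i < N; i++) {
        fx -= b[i] * x[i];
        for (int j = 0; j < N; j++)
            fx += 0.5 * x[i] * A[i][j] * x[j];
    }
    return fx;
}

/* Evaluate grad f(x) = A.x - b (N numbers), valid when A is symmetric. */
void quadratic_grad(const double b[N], const double A[N][N],
                    const double x[N], double g[N])
{
    for (int i = 0; i < N; i++) {
        g[i] = -b[i];
        for (int j = 0; j < N; j++)
            g[i] += A[i][j] * x[j];
    }
}

int main(void)
{
    double A[N][N] = {{4.0, 1.0, 0.0},   /* symmetric example matrix */
                      {1.0, 3.0, 1.0},
                      {0.0, 1.0, 2.0}};
    double b[N] = {1.0, 2.0, 3.0};
    double x[N] = {0.5, -0.5, 1.0};
    double g[N];

    printf("f(x) = %g\n", quadratic_f(0.0, b, A, x));
    quadratic_grad(b, A, x, g);
    for (int i = 0; i < N; i++)          /* N components from one gradient call */
        printf("g[%d] = %g\n", i, g[i]);
    return 0;
}

As the text cautions, whether this translates into an order-N saving in practice depends on how expensive the gradient components are relative to the function itself.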