Journal of Research of the National Bureau of Standarde Vol.49.No.6.December 1952 Research Paper 2379 Methods of Conjugate Gradients for Solving Linear Systems Magnus R.Hestenes 2 and Eduard Stiefel 3 An iterative algorithm is given for solving a system r=k of n linear equations in n The solution is given in n steps. It is shown that this method is a special case of a very general method which also includes Gaussian elimination These general algorithms are essentially algorithms for finding an n dimensional ellipsoid. onnect1oǖs are made with the theory of orthogonal polynomials and continued fractions. 1.Introduction method;(b)the conjugate gradient method presented in the present monograph One of the major problems in machine computa- Tbere are many variations of the elimination tions is to find an effective method of solving a method,just as there are many variations of the svstem of n simultaneous equations in n unknowns, conjugate gradient method here presented. In the particularly if n is large. There is,of course,no present paper it will be shown that both methods best method for all problems because the goodness are special cases of a method that we call the method of a method depends to some extent upon the of conjugate directions.This enables one to com- particular system to be solved.In judging the pare the two methods from a theoretical point of goodness of a method for machine computations,one vlew. should bear in mind that criteria for a good machine In our opinion,the conjugate gradient method is method may be different from those for a hand superior to the elimination method as a machine method.By a hand method,we shall mean one method.Our reasons can be stated as follows: in which a desk calculator may be used.By a (a)Like the Gauss elimination method,the method machine method,we shall mean one in which of conjugate gradients gives the solution in n steps if sequence-controlled machines are used. no rounding-off error occurs. A machine method should have the following (b)The conjugate gradient method is simpler to properties: code and requires less storage space (1)The method should be simple,composed of a (c)The given matrix is unaltered during the proc- repetition of elementary routines requiring a mini- ess,so that a maximum of the original data is used mum of storage space. (2)The method should insure rapid convergence The advantage of having many zeros in the matrix is preserved. The method is.therefore,especially if the number of steps required for the solution is suited to handle linear systems arising from differeuce infinite.A method which-if no rounding-off errors equations approximating boundary value problems. occur-will vield the solution in a finite number of steps is to be preferred. (d)At each step an estimate of the solution is (3)The procedure should be stable with respect given,which is an improvement over the one given in to rounding-off errors.If needed,a subroutine the preceding step. should be available to insure this stability.It (e)At any step one can start anew by a very should be possible to diminish rounding-off errors simple device,keeping the estimate last obtained as by a repetition of the same routine,starting with the initial estimate. the previous result as the new estimate of the In the present paper,the conjugate gradient rou- solution. tines are developed for the symmetric and non- (4)Each step should give information about the symmetrie cases.The principal results are described solution and should yield a new and better estimate in section 3.For most of the theoretical considera- than the previous one. tions,we restrict ourselves to the positive definite (5)As many of the original data as possible should symmetric case.No generality is lost therebv.We be used during each step of the routine.Special deal only with real matrices.The extension to properties of the given linear system-such as having complex matrices is simple. many vanishing coefficients-should be preserved. The method of conjugate gradients was developed (For example.in the Gauss elimination special independently by E.Stiefel of the Institute of Applied properties of this type may be destroyed.) Mathematics at Zurich and by M.R.Hestenes with In our opinion there are two methods that best fit the cooperation of J.B.Rosser.G.Forsythe.and these criteria,namely,(a)the Gauss elimination L.Paige of the Institute for Xumerical Analy sis orfornud ont a xatioul Hurean of statrlards cuIract wilh Xational Bureau of Standards.The present account rsity of Califoruia at Los Angeles.and was sponsored (in part)by the was prepared jointly by A.R.Hestenes and E. National Bureau of Stiefel during the latter's stay at the National Bureau of Standards. The first papers on this inethod were 409
given by E.Stiefel and by X.R.Hestenes.5 Reports nll了,1n.11s1il10i4t4tir,If.1issy1i1- on this method were given by E.Stiefel and J.B. metrie,then two veetors and y are said to be con Rosser at a Symposium s on August 23-25,1951. jugate or .-orthogonal if the relation (r..Iy Recently,C.Lanczos developed a closely related (.=0 holds.This is an extension of the ortho- routine based on his carlier paper on eigenvalue gonality relation (.y)=0. problem.Examples and numerical tests of the By an eigenralue of a matrix .I is meant a number method have been by R.Hayes,U'.Hochstrasser, 入such that dy=入has a solution y≠,and y is and AI.Stein called a corresponding eigenrector. Inless otherwise expressly stated the matrix .1. 2.Notations and Terminology with which we are coneerned,will be assumed fo be symmetric and positire definite.Clearly no loss of Throughout the following pages we shall be con- generality is caused thereby from a theoretical point cerned with the problem of solving a system of linear of view,because the system dr=k is equivalent to equations the system Br=l,where B=4*.4,l=.1*k.From a 011十01:十·,,十01mtn= numerical point of view,the two systems are differ- ent,because of rounding-off errors that occur in a1X:十02dg十··.十Q2mt=k2 joining the product 4*.Our applications to the (2:1) nonsymmetric case do not involve the computation 。4 of1*.1 In the sequel we shall not have oceasion to refer to aud1十uw2十·,·+0na这n=ka a particular coordinate of a vector.Accordinglv we may use subseripts to distinguish vectors instead These equations will be written in the vector form of components.Thus zo will denote the vector 1x=k. Here d is the matrix of coefficients (a:), Io01,· .ro)and r;the vector (u,....).In case r and k are the vectors (n,...,n)and (t,...). a symbol is to be interpreted as a component,we shall It is assumed that d is nonsingular.Its ingerse A-1 call attention to this fact unless the interpretation is therefore exists.We denote the transpose of by evident from the context. 1* The solution of the system Ar=k will be denoted by Given two vectors r=(n,...,r)and y= h;that is,Ah=k.If x is an estimate of h,the differ- (yi,..),their sum +y is the vector ence r=k-r will be called the residual of x as an (+yi,.:.,In+yn),and az is the vector (ar,...,azn), estimate of h.The quantityr will be called the where a is a scalar.The sum squared residual.The vector h-z will be called the error rector of r,as an estimate of h. (x,)=11十12y十.·.十xmym 3. Method of Conjugate Gradients((cg- is their scalar product.The length of r will be denoted Method) b x=(+...+=(x,x) The present section will be devoted to a description The Cauchy-Schwarz inequality states that for all of a method of solving a system of linear equations Az=k.This method will be called the conjugate 工, (c,)2≤(x,x)(y,y)or(c,y)i≤xy.(2:2) gradient method or,more briefly,the cg-method,for reasons which will unfold from the theory developed The matrix A and its transpose d*satisfy the in later sections.For the moment,we shall limit ourselves to collecting in one place the basic formulas relation upon which the method is based and to deseribing (任,A0=之ax=(A*z,20. briefly how these formulas are used. =1 The cg-method is an iterative method which terminates in at most n steps if no rounding-off If a=a,that is,if A=A*,then d is said to be errors are encountered.Starting with an initia symmetric.A matrix A is said to be positive definite estimate ro of the solution h,one determines succes- in case(x,Ax)>0 whenever z≠0.If(x,A)≥0for sively new estimates ro,rr2,...of h,the estimate x being closer to h than z+.At each step the residual r:=k-Ar is computed. Normally this E.Stiefel,Tehereinige Methoden der Relaxationsrechnung,Z.angew.Xath, vector can be used as a measure of the "goodness" hods of relara of the estimate z.However,this measure is not a reliable one because,as will be seen in section 18 Proceedings of the symposium (see footnote s). J.B.Rosser,Rapidly converging iterative methods for solving linear equa. it is possible to construet cases in which the squared tions.to appear in the heTPo residual ri increases at each step (except for the last)while the length of the error vector h-z the campus of the University of California at Los Angeles (August 23-25.1951). Lancsos.Solution of systems of linear equations by minimised iterations, decreases monotonically.If no rounding-off error N L Report 52-13, is encountered,one will reach an estimate rm(msn) ymposium'on Machinery,Cambridge,\iuss 191,pages 164-200. -Seale Digitai Calculating at which rm=0.This estimate is the desired solu- tion h.Normally,m=n.However,since rounding- 410
off errors always occur except under very unusual Ar=k (3:4) circumstances.the estimate r in general will not be the solution h but will be a good approximation of h can be obtained by the formula If the residual r is too large,one may continue with the iteration to obtain better estimates of h (pik) Our experience indicates that frequently+is 1- (3:5) considerably better than r.One should not con- 白(An,pD. tinue too far bevond r but should start anew with the last estimate obtained as the initial It follows that,if we denote by pi the jth component estimate,so as to diminish the effects of rounding- of pi,then off errors.As a matter of fact one can start anew 罗子卫D at any step one chooses.This fexibility is one of the =0(p,Ap:) principal advantages of the method. In case the matrix A is symmetric and positire is the element in the jth row and kth column of the definite.the following formulas are used in the con- inverse of A. Jugate gradient method: There are two objections to the use of formula (3:5).First,contrary to the procedure of the Po=ro=k-Hro (o arbitrarv) (3:1a) general routine (3:1),this would require the storage of the veetors po,p,....This is impractical. irde apAp (3:1b) particularly in large systems.Second,the results obtained by this method are much more influenced by rounding-off crrors than those obtained by the :+1=I:十ap (3:1c step-by-step routine (3:1). In the cg-method the error rector h-z is diminished "+1="一01p (3:1d) in length at each step.The quantity f(r)=(h-r A (h-)),called the error function,is also diminished b,=42 at each step.But the squared residualr2=k-Ar 2 .r2 (3:1e normally oscillates and may even increase.There is a modification of the cg-method where all threc p1+1=T1十bp (3:1fD quantities diminish at each step.This modification is given in section 7.It has an advantage and a In place of the formulas (3:1b)and (3le)one may disadvantage.Its disadvantage is that the error use vector in each step is longer than in the original method. Xoreover.the computation is complicated (p:,r) -pp (3:2a since it is a routine superimposed upon the original one. However,in the special case where the given linear equation system arises from a difference 6,=- (r+,1p approximation of a boundarv-value problem,it can (3:2b) (p.1p》 be shown that the estimates are smoother in the modified method than in the original.This mav be Although these formulas are slightly more compli- an advantage if the desired solutior is to be differ- cated than those given in (3:1),they have the ad- entiated afterwards. vantage that scale factors (introduced to increase Concurrently with the solution of a given linear accuracy)are more casily changed during the course svstem,characteristic roots of its matrix mnav be of the computation. obtained:compute the values of the polynomials The conjugate gradient method (cg-method)is Ro,R1....and Po.Pi,...at A by the iteration given by the following steps: Initial step:Select an estimate ro of h and com- R=1P0=1 pute the residual ro and the direction po by formulas (3:1a). R+1=R-λa,P General routinc:Having determined the estimate r;of h,the residual r.,and the direction pi compute P:+1=R:+1+6.Pr (8:6) 1,r1,and pi-by formulas (3:1b),...(3:1f) successively. As will be seen in section 5,the residuals ro,r. The last polvnomial R)is a factor of the charac- ..are mutually orthogonal,and the direction vee- teristic polynomial of.I ard coincides with it when )=7. tors pu,h.···re mutually conjugate,that is, The characteristic roots,which are the zeros of R().can be found by Newton's methods without r,r》=0,(P,.1p)=0(i≠.(3:3) actually computing the polynomial R()itself. One uses the formulas These relations can be used as checks. Once one las obtained the set of n mutuallv Rm(入) (3:7) conjugate vectors p...,p-i the solution of 入1=入k一 Rn入) 411
where R).B)are determined by the iteration xt seleet a direction p,conjugate top.·…P (3:5)11. that is.such that =1P6=0 (P-t1p)=0 j=0.1、... (4:2 R:+1=R-λa:-WP In a sense the cd-method is not precise.in that no P'-1=R,-bP formulas are given for the computation of the diree- tions2o,p.··.·Various formulas can be giveu. with =A.In this connection.it is of interest to each leading to a special method.The formula observe that if m=n,the determinant of is given (3:1f)leads to the cg-method.It will be seen in by the formula section 12 that the case in which the p's are obtained by an .1-orthogonalization of the basic vectors 1 (1,0,..,0),(0,1,0,...),.·.leads essentially to let(.1)= a0:.,·aR-1 the Gauss elimination method. The basic properties of the cd-method are given by The cg-method can be extended to the case in the following theorems. which A is a general nonsymmetric and nonsingular Theorem 4:1.The direction tectors po,P,. matrix.In this case one replaces eq (3:1)by the set mutually conjugate.The residual rector r:is orthogonal to po,pi,..pi.The inner produet of pwith each ro=k-ro. 卫n=1*ro, of the residuals ro.r,..,r is the same.That is, 1*r2 (p,1p)=0 (≠》 (4:3a1 d=Ap:? ,r)=0 (j=0,1,··,i-1) (4:3b) +1=:十0p 3:8】 (2,rn)=(p,r1)=···=(pr). (4:3e) r+1=r,-0p The sealar a:can be giren by the formula 6=1r+ Ar a:- (pi.rol (4:4) (p,1p,】 卫+1=1*r+1+bP in place of (4:1a). Equation (4:3a)follows from (4:2).Using (4:1e), This svstem is discussed in section 10. we find that 4.Method of Conjugate Directions (cd- (pn广k+i)=(P,rx)-a(pdpx). Method)' If j=k we have.by (4:1a).(perx+1)=0.Moreover. The cg-method can be considered as a special case by (4:3a)(p,r+1)=(ps.r:),(jk).Equations (4:3b) of a general method,which we shall call the method and (4:3c)follow from these relations.The formula of conjugate directions or more briefly the cd-method. (4:4)follows from (4:3c)and (4:1a). In this method,the vectors po,p,...are selected As a consequence of (4:4)the estimates ri.re,... of h can be computed without computing the resid- to be mutually conjugate but have no further restric- tions.It consists of the following routine: uals ro,r,··,provided that the choice of the Initial step.Select an estimate ro of h (the solu- direction vectors po,p,···is independent of these tion),compute the residual ro=k-ro:and choose a residuals. Theorem 4:2.The cd-method is an m-step method direction Po. General routine.Having obtained the estimate (m sn)in the sense that at the mth step the estimate rm of h,the residual r:=k-Ar and the direction p is the desired solution h. compute the new estimate ri and its residual r. For let m be the first integer such that yo=k-rois by the formulas in the subspace spanned by po...p.Clearly. m sn,since the vectors Po,pi, ··are lnearly1n(de- pendent.We may,accordingly,choose scalars (pird 0:= (pip) (4:1a) ca,··.am-1 such that /n=C0P0于···-mm-'w-1 1i+l=了:T0P (4:1b) Hence, r+1=一U+卫 (4:1c) h=了0÷anP+···+am-iPm- \oreover, simultaneous equat ions m3 tics e.2.14-173(194s, r%=k一.1r0=1(h-Ia》=ar1n+···十am-11Pm- 412
Computing the inner product (piro)we find by (4:3a) Geometrically,the equation f(r)=const.defines and (4:4)that a:=a:,and hence that h=rm,as was an ellipsoid of dimension n-1.The point at which to be proved. f()has a minimum is the center of the ellipsoid and The cd-method can be looked upon as a relaxation is the solution of Ar=k.The i-dimensional plane method.In order to establish this result,we intro- P:,described in the last theorem.cuts the ellipsoid duce the function f(2)=f(ro)in an ellipsoid E of dimension i-1. unless Ei is the point ro itself.(In the cg-method, f(x)=(h-r,A(h-r)=(,Ar)-2(,k)+(hk).(4:5) E:is never degenerate,unless ro=h.)The point t is the center of E:.Hence we have the corollary: Clearly,f(r)20 and f(r)=0 if,and only if,r=h. Corollary 1.The point x is the center of the The function f(r)can be used as a measure of the (i-1)-dimensional ellipsoid in ahich the i-dimensional "goodness"of x as an estimate of h.Since it plays plane P:intersects the (n-1)-dimensional ellipsoid an important role in our considerations,it will be f(x)=f(). referred to as the error function.If p is a direction Although the function f()is the fundamental vector,we have the useful relation error function which decreases at each step of the relaxation.one is unable to compute f(z)without f(r+ap)=f(r)-2a(p,r)+a2(p.Ap), (4:6) knowing the solution h we are seeking.In order to obtain an estimate of the magnitude of f(x)we may where r=k-AI=A(h-z),as one readily verifics use the following: by substitution.Considered as a function of a. Theorem 4:4. The error rector y=h-x,the residual the function f(r-ap)has a minimum value at r=k-AI,and the error function f(r)satisfy the a=0,where relations (P,r) (4:7) a何s)s ri a=(p:Ap) 4:11) (y) This minimum value differs from f(r)by the quantitv where u(z)is the Rayleigh quotient fx)-fr+a川=a(p,Ap)=· (4:8) u(e)=3,d 2 4:12) Comparing (4:7)with (4:1a),we obtain the first two The Rayleigh quotrent of the error rector y does not sentences of the following result: eaceed that of the residual r.that is, Theorem 4:3.The point ri minimizes f(r)on the line r=r1+api-1.At the i-th step the error f() u()≤u(r). 1:13) is rc'ared by the amount Aorcorer, ir (( (4:14) 小- (4:9) The proof of this result is based on the Schwarzian In fact.the point r:minimizes f(r)on the i-dimensional quotients plane P:of points (a.1)(1.1)(12,A22) (,a)≤(a,4)≤(A2,A2) (4:15 x=r0a0Pn十.+a-P-1, (4:10) The first of these follows from the inequality of Schwarz where ao,..ai are parameters. This plane con- tains the points ro......i (p.q)2≤(p,p)(q,q) (4:16) In view of this result the cd-method is a method by choosing p=2,g=42.The second is obtained of relaxation of the error furction f).An iteration by selecting p=Ba,g=B3z,where B2=A. of the routine may accordingly be referred to as In order to prove theorem 4:4 recall that if we a relaxation. set y=h-r,then In order to prove the third sentenee of the theorem observe that at the point (4:10) r=k-1=1(h-了)=y f(r)=f(ro)- 会2apo-p4nl fr)=(y.1y) by (4:5).I'sing the inequalities (4:15)with =4. At.the minimum point we have we see that (p,.ro) a-(piAp 40w0=3.1ws1.10》=2s40 .10=≤(.4.10 and hence a,=a).by (4:4).The minimum point is =, ccordingly the point r.as was to be proved. (,ur. 413
This yields (4:11)and (4:13).I'sing (4:16)with In view of theorem 4:6.it is seen that at each step p=y and g=r we find that of the cd-routine the dimensionality of the spacer in which we seek the solution h is reduced by unity. fx)=(y1y)=y,r)≤ylrl. Beginning with ro.we seleet an arbitrary chord Co of Hence f(r)=f(ro)through ro and find its center.The plane fr)=4)y2≤yr, through r conjugate to (a contains the centers of all chords parallel to C.In the next step we re- so that the second inquality in (4:14)holds. The strict ourselves to r and select an arbitrary chord first inequality is obtained from the relations C:of f(r)=f(n)through r and find its midpoint r:and the plane r:in conjugate to C:(and hence 2 u(r)) to Co).This process when repeated will yield the answer in at most n steps.In the cg-method the chord Cr of f(r)=f()is chosen to be the normal As is to be expected,any ed-method has within at Je its routine a determination of the inverse 1-1 of 4. We have,in fact,the following: 5.Basic Relations in the cg-Method Theorem 4:5.Let po, .,Pn-be n mutually con- jugate nonzero vectors and let p be the i-th component Recall that in the cg-method the following formulas The element in the j-th rou and k-th column of are used 海ieen b the sum po=ro=k-ro (5:1n pupis re (5:1h (p:Ap) a(puAp This result follows from the formula r1-1=1+0P: (5:1 h=罗p:利 11=一04月g (5:1) (pip)P 6,= (5:1e) for the solution h of r=k,obtained by selecting irg? 10=0. We conclude this section with the following: Pitt=ri1-bipt. (5:1f) Theorem 4:6.Let w:be the (n-i)-limensional plane through r;conjugate to the rectors po.P...., One should verify that eg.(5:1e)and (5:1f)hold for p:-..The plane contains the points i=0.1.2....if.and only if. and intersects the (n-1)-dimensional ellipsoid f(r)= f()in an ellipsoid E:of dimension (n-i-1). The center of E:is the solution h of Ar=k.The point h=空 k=0,1,2.,… (5:2) x是the midpoint of the chord C:ofthrough为 which is parallel to pi.In the cg-method the chord C is normal to E at ri and hence is in the direction of The present section is devoted to consequences of the gradient of f(r)at x:in w. these formulas.As a first result,we have The last statement will be established at the end Theorem 5:1.The residuals ro.ri. and the of section 6.The equations of the plane is given direction rectors po.p....generated by (5:1)satisfy by the system the relations (Ap.r-r)=0 (i=0.1...i1). (,r)=0 (位≠) (5:3a Since pp..are conjugate to po...pi-,so also (p,1p)=0 (≠) (5:3b i Ct-I=0p:T。·.-Qk-1Pk-1 (k>i). (p,r=0(i<), (p,,)=r2(i≥j)(5:3c The points r,d+i… ..rm=h are accordingly in i and h is the center of E;.The chord C.is defined by (r1p)=(p,1pd,(.1)=0(i≠j,i≠j÷1)(5:3d the equation z=r:-tapi,where t is a parameter.As is easily seen. The residuals ro....are mutually orthogonal and the direction rectcrs po,p,···are mutually conju- fx十taP:=fx)一(2t-t护)a(p1p) gate The proof of this result will be made by induction. The second endpoint of the chord C:is the point The vectors ro.Po =ro and ri satisfy these relations x+2a.p at which t=2.The midpoint corresponds since to t=1,and hence is the pointr as was to be proved. (ro.r)=(po.r1)=ro-ao(ro:Apo)=0 414
by(5:1b).Suppose that(5:3)holds for the vectors we accordingly suppose that the routine terminates .P-1.To show that pr can eadjoiadsstifsnecCoaoToat: at the n-th step.Since the r:is orthogonal to ritt we haver:and hence a:40.It follows that (p:r)0by (4:1a).We may accordingly suppose the (rupt)=r (≤) (5:4a) vectors p:have been normalized so that (par)=r:2. In view of (4:3b)and (4:3c)eq (5:3c)holds.Select (pAps)=0 (i0,eq (5:4b)holds.In order to establish lished. Theorem 5:3.The residual rectors ro.n. and (5:4c),we use (5:1f)to obtain the direction rectors po.p,...satisfy the further relations (p:.Ap)=(rx.Api)+bx-1(p:-1.Ap=(rAp) (≠k-1) p,p=2 r (i≤) (5:6a) It.follows that (5:4c)holds and hence that (5:3) holds for the vectors ro.n·..rg and po,p1,··y p2=r2+b1p4-12=r∑ 1 ir (>0) Pt (5:6b) It remains to show that re can be adjoined to this set.This will be done by showing that (r1)=0 i->1 (5:6c) (r,rk+i)=0 (i≤k) (5:5a) (r1Ar)=(pAp)+b-(p-1,1p-) (1>0). (4AP,rk-)=0 (i0.(5:7l) Theorem 5:2.The cg-method is a ed-mcthod.It is 141=(1+b,)p:-a1,-b1-P-i the special case of the cd-method in which the p:are obtained by A-orthogonalization of the residual vecturs Similarly,tlt.residuals ro,ri,···satisfy the relations ri.On the other hand,a cd-methond in which the rexid- ri=To-avAro (5:8a) uals ro.ri....arc mutually orthogonal is essentially a cg-mcthod. r+1=(1+b-i)r,-a1r-b-1r-1 (5:8h) The term "essentially"is used to designate that we disregard iterations that terminate in fewer than arht re n steps,unless one adds the natural assumption that the formula for p,in the routine depends continu- ously on the initial estimate ro.To prove this result bb (5:9) 415
Equation (5:7b)is obtained by climinating r Theorem6:L.The extimates ro,I,···,Imfh1re and r:from the equations distinet.The point 1:minimizes the error funetion f(r)=(h-r,1(h-r))on the i-dimensional plane P P1+1=r:+1+b: passing through the points ro,,.·,·, In the ith step of the cg-method.f(r)is diminished by the amount r=r-ap: fr-)-fx=n-1r-2=u(p-)r-1-r2(6:1 4=r:+b:--1 where u(z)is the Rayleigh quotient (4:12).Hence. Equation (5:7a)follows similarly.In order to prove (5:8b),eliminate .Ip,and .ip:-from the equations fr-fr)=r2+···+n-ir-(i0 it follows that r(p.Apo (,-r,11->0(ii. Let now ro,n,...,rm=k be the estimates of h Theorem 6:3.At each step of the cg-algorithm the obtained by applying the cg-method.Let ro.ri, error roctor yi=h-ri is reduced in length.In fact. ...rm=0 be the corresponding residuals and po p,...,pm-1 the direction vectors used.The pres- -:-y2=f+f- (6:5) ent section will be devoted to the study of the prop- 4P:-i erties of the points ro.i....,rm.As a first result we have where u(z)is the Rayleigh quotient (4:12). 416
In order to establish (6:5)observe that,by (5:6a),is the point in P;whose distance from the solution h is the least.It lies on the line ri-r;beyond r.Mforeorer, (y1.r-t-=(tm-x1,p-1)a1-1 1 =[a(p,p:-)十.,.+am-(pm-pt-i]a-i J(25)-f(r)f) (6.9) =a+.+am-17m-3a4-242 and 「-2 h-x2=h-r2+)-f .(6.10) 4(P-1 In view of (6:2)and (5:1b)this becomes In order to establish (6:9)and 6:10)we use the ,54--)=) (6:6) formula 4(p-1) f(r-api-1)=f(ril-a(p:-1.Ap-1). Setting r=r-and j=m in (6:4),we obtain (6:5) by the use of (6:6)and (6:1). which holds for all values of a in view of the fact that This result establishes the cg-method as a method r:minimizes f(x)on P.Setting a=f(r)-1 of successive approximations and justifies the pro- we have cedure of stopping the algorithm before the final- step is reached.If this is done,the estimate ob- tained can be improved by using the results given )=f+a=f0r+, f(r f()2 in the next two theorems. Theorem,6:4..Ltr4.··,r be the projec- An algebraic reduction yields (6:9).Inasmuch as tions o时the points r+,···,tm=h in the i- dimensional plane P:passing through the points ro. ··· .The points r-,了x·,,81ie h-2-h-02=,)12-1 r- on a straight line in the order giren by their enumeration. The point rg (k>i)is giren by the formulas f(r)2a1-1卫-12 =1-4+-=u,-1-,67a =f5--f04-3 f(1)-f(ri) we obtain (6:10)from (6:9)and (5:1b). =-1+fU-f As a further result we have 1-2 P1-1 (G:7b) Theorem6:6.Iett,··: be the projections of the points n..,on the line joining the initial point ro to the solution m=h.The points ro..... In order to prove this result,it is sufficient to establish (6:7).To this end observe first that the atroete vector tion withoui oscillation.To prove this fact we need 1-1 (1≥) only observe that (rm-了r1-了1-)=(m-了00-P-) is orthogonal to each of the vectors po,p,. p.This can be seen by forming the inner product 二1 with p(0 10 -,9=0 (P)-M-= by(5:6a). A similar result holds for the line joining rt0,(i+). in the plane x:is a scalar multiple of the rector p. Since f)=0,we have the first part of The gradient of f(r)at了:is the vector-rTle gradient g:of f(r)at r:in r,is the orthogonal projec- Thcorom 6:5.The point tion of -r in the plane i.Hence g:is of the form =+ 51-121 (6:8) 417
where ao,··,a-t are chosen s0thtq:is orthogonal = (7:3l1 t0.lp,,·,lp-1:Sie C:0r: p=r o72 -,=公会 1。 (7:31 r+1=r-0·1P, g=0.1.…..i-1)、 The sum of the cotfficients of ro,r.....sin (7:3b) (and hence of ror....,r;in (7:3c))is unity it is seen upon elimination of ro... .…"i-1s1l'res- sively that p is also a linear combination of r:.po. The relation (7:1)can be established by induction It holds for i=0.If it holds for i,then ...Ap1.Inasmuch as p:is conjugate to pa. p-1,it is orthogonal to ipo,. Ap:-1.The vector p accordingly is a scalar multiple of the gradient q P+1=r-1十bP=(1+b,C)k-1(r+1+bcE,) of f(r)at r:in r,as was to be proved. =(+(k一1F-) In view of the result obtained in theorem 6:7 it is seen that the name "method of conjugate gradients" The formula (7:3a)follows from (7:2a),(5:le)and is an appropriate name for the method given in sec- 5:6b). Formula (7:3b)is an easy consequence of tion 3.In the first step the relaxation is made in the (7:2b. To prove (7:3c)one can use (5:2)or (7:3b). direction po of the gradient of f(r)at ro,obtaining as one wishes.The final statement is a consequence a minimum value of f()at r.Since the solution k of(7:3a). lies in m,it is sufficient to restrict r to the plane ri Theorem 7:2.The point r;gicen by (7:2)lies in Accordingly,in the next step,we relax in the direc- the conrer closure Si of the points ro.r, tion p of the gradient of f(r)in at ri,obtaining the the point r in the i-dimensional plane P:through ro. point r2 at which f(r)is least.The problem is then n,.,r:at which the squared residual k-Ar2 has reduced to relaxing f(r)in the plane 2,conjugate to its minimum ralue. This minimum ralue is giren by po and pi.At the next step the gradient in r2 in ra the formula is used.and so on.The dimensionality of the space in which the relaxation is to take place is reduced bv unity at each step.Accordingly,after at most n ✉2_ steps,the desired solution is attained. ctp (7:4) The squared residuals ro,T,.···diminish mone- 7.Properties of the Estimates x;of h in the tonically during the cg-method.At the ith step the cg-Method squared residual is reduced by the amount In the cg-method there is a second set of estimates F-12-r2= .Tol, (7:5) To=ro,.of h that can be computed,and Ci that are of significance in application to linear systems arising from difference equations approxi- The first statement follows from (7:3b),since the mating boundary-valve problems.In these applica- coefficients of Jo,r,. ··,r are positive and have tions,the function defined by is smoother than unity as their sum.In order to show that the that of r,and from this point of view is a better squared residual has a minimum on P:at F:,observe approximation of the solution h.The point 7:has that a point r in P:differs from 7,by a vector z:of its residual proportional to the conjugate gradient p.. the form The points To,Ti,2: can be computed by the iteration(7:2)given in the following: Theorem 7:1.The conjugate gradientp:iserpressible r-E:=1=D0十···十a-P-1 in the form The residual r=k-Ar is accordingly given by P=C(k-A7:, (7:1) r=r:-12 where c:and Ts are defined by the recursion formulas 1z=an1p0十··-a-11i-1 co=1,c1-1=1÷b:: (7:2a) Inasmuch as.by (7:3c),F=Pc:,we have 0=r0.7441=41+b:c7 (7:2b) C:+1 G,1p,}=1(p.dp)=0 (j79 (山≠i) 418