FACR Method

The best way to solve equations of the form (19.4.28), including the constant coefficient problem (19.0.3), is a combination of Fourier analysis and cyclic reduction, the FACR method [3-6]. If at the rth stage of CR we Fourier analyze the equations of the form (19.4.32) along y, that is, with respect to the suppressed vector index, we will have a tridiagonal system in the x-direction for each y-Fourier mode:

    u^k_{j-2^r} + \lambda^{(r)}_k u^k_j + u^k_{j+2^r} = \Delta^2 g^{(r)k}_j    (19.4.35)

Here \lambda^{(r)}_k is the eigenvalue of T^{(r)} corresponding to the kth Fourier mode. For the equation (19.0.3), equation (19.4.5) shows that \lambda^{(r)}_k will involve terms like \cos(2\pi k/L) - 2 raised to a power. Solve the tridiagonal systems for u^k_j at the levels j = 2^r, 2\times 2^r, 4\times 2^r, \ldots, J - 2^r. Fourier synthesize to get the y-values on these x-lines. Then fill in the intermediate x-lines as in the original CR algorithm.

The trick is to choose the number of levels of CR so as to minimize the total number of arithmetic operations. One can show that for a typical case of a 128×128 mesh, the optimal level is r = 2; asymptotically, r → log₂(log₂ J).

A rough estimate of running times for these algorithms for equation (19.0.3) is as follows: The FFT method (in both x and y) and the CR method are roughly comparable. FACR with r = 0 (that is, FFT in one dimension and solve the tridiagonal equations by the usual algorithm in the other dimension) gives about a factor of two gain in speed. The optimal FACR with r = 2 gives another factor of two gain in speed.

CITED REFERENCES AND FURTHER READING:
Swarztrauber, P.N. 1977, SIAM Review, vol. 19, pp. 490–501. [1]
Buzbee, B.L., Golub, G.H., and Nielson, C.W. 1970, SIAM Journal on Numerical Analysis, vol. 7, pp. 627–656; see also op. cit. vol. 11, pp. 753–763. [2]
Hockney, R.W. 1965, Journal of the Association for Computing Machinery, vol. 12, pp. 95–113. [3]
Hockney, R.W. 1970, in Methods of Computational Physics, vol. 9 (New York: Academic Press), pp. 135–211. [4]
Hockney, R.W., and Eastwood, J.W. 1981, Computer Simulation Using Particles (New York: McGraw-Hill), Chapter 6. [5]
Temperton, C. 1980, Journal of Computational Physics, vol. 34, pp. 314–329. [6]

19.5 Relaxation Methods for Boundary Value Problems

As we mentioned in §19.0, relaxation methods involve splitting the sparse matrix that arises from finite differencing and then iterating until a solution is found. There is another way of thinking about relaxation methods that is somewhat more physical. Suppose we wish to solve the elliptic equation

    L u = \rho    (19.5.1)
where L represents some elliptic operator and ρ is the source term. Rewrite the equation as a diffusion equation,

    \frac{\partial u}{\partial t} = L u - \rho    (19.5.2)

An initial distribution u relaxes to an equilibrium solution as t → ∞. This equilibrium has all time derivatives vanishing. Therefore it is the solution of the original elliptic problem (19.5.1). We see that all the machinery of §19.2, on diffusive initial value equations, can be brought to bear on the solution of boundary value problems by relaxation methods.

Let us apply this idea to our model problem (19.0.3). The diffusion equation is

    \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} - \rho    (19.5.3)

If we use FTCS differencing (cf. equation 19.2.4), we get

    u^{n+1}_{j,l} = u^n_{j,l} + \frac{\Delta t}{\Delta^2}\left(u^n_{j+1,l} + u^n_{j-1,l} + u^n_{j,l+1} + u^n_{j,l-1} - 4u^n_{j,l}\right) - \rho_{j,l}\,\Delta t    (19.5.4)

Recall from (19.2.6) that FTCS differencing is stable in one spatial dimension only if ∆t/∆² ≤ 1/2. In two dimensions this becomes ∆t/∆² ≤ 1/4. Suppose we try to take the largest possible timestep, and set ∆t = ∆²/4. Then equation (19.5.4) becomes

    u^{n+1}_{j,l} = \frac{1}{4}\left(u^n_{j+1,l} + u^n_{j-1,l} + u^n_{j,l+1} + u^n_{j,l-1}\right) - \frac{\Delta^2}{4}\rho_{j,l}    (19.5.5)

Thus the algorithm consists of using the average of u at its four nearest-neighbor points on the grid (plus the contribution from the source). This procedure is then iterated until convergence.

This method is in fact a classical method with origins dating back to the last century, called Jacobi's method (not to be confused with the Jacobi method for eigenvalues). The method is not practical because it converges too slowly. However, it is the basis for understanding the modern methods, which are always compared with it.

Another classical method is the Gauss-Seidel method, which turns out to be important in multigrid methods (§19.6). Here we make use of updated values of u on the right-hand side of (19.5.5) as soon as they become available. In other words, the averaging is done "in place" instead of being "copied" from an earlier timestep to a later one. If we are proceeding along the rows, incrementing j for fixed l, we have

    u^{n+1}_{j,l} = \frac{1}{4}\left(u^n_{j+1,l} + u^{n+1}_{j-1,l} + u^n_{j,l+1} + u^{n+1}_{j,l-1}\right) - \frac{\Delta^2}{4}\rho_{j,l}    (19.5.6)

This method is also slowly converging and only of theoretical interest when used by itself, but some analysis of it will be instructive.
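To make the distinction between the two sweeps concrete, here is a minimal sketch (ours, not from the original text) of one Jacobi sweep of (19.5.5) and one Gauss-Seidel sweep of (19.5.6). The function names are illustrative; the double ** array layout with indices 1..jmax and fixed Dirichlet boundary values in rows and columns 1 and jmax follows the convention of the sor routine given later in this section.

    /* One Jacobi sweep of equation (19.5.5): unew is computed entirely from uold,
       so the caller alternates the roles of the two arrays between sweeps. */
    void jacobi_sweep(double **uold, double **unew, double **rho,
        int jmax, double delta)
    {
        int j,l;
        for (j=2;j<jmax;j++)
            for (l=2;l<jmax;l++)
                unew[j][l]=0.25*(uold[j+1][l]+uold[j-1][l]+uold[j][l+1]+uold[j][l-1])
                    -0.25*delta*delta*rho[j][l];
    }

    /* One Gauss-Seidel sweep of equation (19.5.6): u is updated "in place", so the
       j-1 and l-1 neighbors already hold new values when point (j,l) is visited. */
    void gauss_seidel_sweep(double **u, double **rho, int jmax, double delta)
    {
        int j,l;
        for (j=2;j<jmax;j++)
            for (l=2;l<jmax;l++)
                u[j][l]=0.25*(u[j+1][l]+u[j-1][l]+u[j][l+1]+u[j][l-1])
                    -0.25*delta*delta*rho[j][l];
    }

Either sweep is repeated until the change in u (or the residual) falls below a chosen tolerance.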
Let us look at the Jacobi and Gauss-Seidel methods in terms of the matrix splitting concept. We change notation and call u "x," to conform to standard matrix notation. To solve

    A \cdot x = b    (19.5.7)

we can consider splitting A as

    A = L + D + U    (19.5.8)

where D is the diagonal part of A, L is the lower triangle of A with zeros on the diagonal, and U is the upper triangle of A with zeros on the diagonal.

In the Jacobi method we write for the rth step of iteration

    D \cdot x^{(r)} = -(L + U)\cdot x^{(r-1)} + b    (19.5.9)

For our model problem (19.5.5), D is simply the identity matrix. The Jacobi method converges for matrices A that are "diagonally dominant" in a sense that can be made mathematically precise. For matrices arising from finite differencing, this condition is usually met.

What is the rate of convergence of the Jacobi method? A detailed analysis is beyond our scope, but here is some of the flavor: The matrix −D⁻¹·(L + U) is the iteration matrix which, apart from an additive term, maps one set of x's into the next. The iteration matrix has eigenvalues, each one of which reflects the factor by which the amplitude of a particular eigenmode of undesired residual is suppressed during one iteration. Evidently those factors had better all have modulus < 1 for the relaxation to work at all! The rate of convergence of the method is set by the rate for the slowest-decaying eigenmode, i.e., the factor with largest modulus. The modulus of this largest factor, therefore lying between 0 and 1, is called the spectral radius of the relaxation operator, denoted ρ_s.

The number of iterations r required to reduce the overall error by a factor 10^{-p} is thus estimated by

    r \approx \frac{p \ln 10}{(-\ln \rho_s)}    (19.5.10)

In general, the spectral radius ρ_s goes asymptotically to the value 1 as the grid size J is increased, so that more iterations are required. For any given equation, grid geometry, and boundary condition, the spectral radius can, in principle, be computed analytically. For example, for equation (19.5.5) on a J × J grid with Dirichlet boundary conditions on all four sides, the asymptotic formula for large J turns out to be

    \rho_s \approx 1 - \frac{\pi^2}{2J^2}    (19.5.11)

The number of iterations r required to reduce the error by a factor of 10^{-p} is thus

    r \approx \frac{2pJ^2 \ln 10}{\pi^2} \approx \frac{1}{2}\, p\, J^2    (19.5.12)

In other words, the number of iterations is proportional to the number of mesh points, J². Since 100 × 100 and larger problems are common, it is clear that the Jacobi method is only of theoretical interest.
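As a quick illustrative check of (19.5.12) (the numbers here are ours, not from the text): six-figure accuracy (p = 6) on a 100 × 100 grid already requires

    r \approx \tfrac{1}{2}\, p\, J^2 = \tfrac{1}{2}\times 6\times 100^2 = 30000 \ \text{iterations},

each of which sweeps all 10⁴ mesh points.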
The Gauss-Seidel method, equation (19.5.6), corresponds to the matrix decomposition

    (L + D)\cdot x^{(r)} = -U \cdot x^{(r-1)} + b    (19.5.13)

The fact that L is on the left-hand side of the equation follows from the updating in place, as you can easily check if you write out (19.5.13) in components. One can show [1-3] that the spectral radius is just the square of the spectral radius of the Jacobi method. For our model problem, therefore,

    \rho_s \approx 1 - \frac{\pi^2}{J^2}    (19.5.14)

    r \approx \frac{pJ^2 \ln 10}{\pi^2} \approx \frac{1}{4}\, p\, J^2    (19.5.15)

The factor of two improvement in the number of iterations over the Jacobi method still leaves the method impractical.

Successive Overrelaxation (SOR)

We get a better algorithm, one that was the standard algorithm until the 1970s, if we make an overcorrection to the value of x^{(r)} at the rth stage of Gauss-Seidel iteration, thus anticipating future corrections. Solve (19.5.13) for x^{(r)}, add and subtract x^{(r-1)} on the right-hand side, and hence write the Gauss-Seidel method as

    x^{(r)} = x^{(r-1)} - (L + D)^{-1}\cdot\left[(L + D + U)\cdot x^{(r-1)} - b\right]    (19.5.16)

The term in square brackets is just the residual vector ξ^{(r-1)}, so

    x^{(r)} = x^{(r-1)} - (L + D)^{-1}\cdot \xi^{(r-1)}    (19.5.17)

Now overcorrect, defining

    x^{(r)} = x^{(r-1)} - \omega\,(L + D)^{-1}\cdot \xi^{(r-1)}    (19.5.18)

Here ω is called the overrelaxation parameter, and the method is called successive overrelaxation (SOR).

The following theorems can be proved [1-3]:

• The method is convergent only for 0 < ω < 2. If 0 < ω < 1, we speak of underrelaxation.

• Under certain mathematical restrictions generally satisfied by matrices arising from finite differencing, only overrelaxation (1 < ω < 2) can give faster convergence than the Gauss-Seidel method.

• If ρ_Jacobi is the spectral radius of the Jacobi iteration (so that the square of it is the spectral radius of the Gauss-Seidel iteration), then the optimal choice for ω is given by

    \omega = \frac{2}{1 + \sqrt{1 - \rho_{\rm Jacobi}^2}}    (19.5.19)

• For this optimal choice, the spectral radius for SOR is

    \rho_{\rm SOR} = \left(\frac{\rho_{\rm Jacobi}}{1 + \sqrt{1 - \rho_{\rm Jacobi}^2}}\right)^2    (19.5.20)
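A small numerical sketch of (19.5.19) and (19.5.20), ours rather than the book's: given an estimate of ρ_Jacobi, for instance the model-problem value 1 − π²/(2J²) of (19.5.11), the optimal ω and the resulting ρ_SOR follow directly. The grid size J is an illustrative choice.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double pi=3.141592653589793,rjac,omega,rsor;
        int J=128;                                 /* illustrative grid size */

        rjac=1.0-0.5*pi*pi/((double)J*J);          /* model-problem rho_Jacobi, eq. (19.5.11) */
        omega=2.0/(1.0+sqrt(1.0-rjac*rjac));       /* optimal omega, eq. (19.5.19) */
        rsor=pow(rjac/(1.0+sqrt(1.0-rjac*rjac)),2.0);   /* eq. (19.5.20) */
        printf("rho_Jacobi=%g  omega=%g  rho_SOR=%g\n",rjac,omega,rsor);
        return 0;
    }

For large J this reproduces the asymptotic forms ω ≈ 2/(1 + π/J) and ρ_SOR ≈ 1 − 2π/J quoted next as (19.5.21) and (19.5.22).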
As an application of the above results, consider our model problem for which ρ_Jacobi is given by equation (19.5.11). Then equations (19.5.19) and (19.5.20) give

    \omega \approx \frac{2}{1 + \pi/J}    (19.5.21)

    \rho_{\rm SOR} \approx 1 - \frac{2\pi}{J} \qquad \text{for large } J    (19.5.22)

Equation (19.5.10) gives for the number of iterations to reduce the initial error by a factor of 10^{-p},

    r \approx \frac{pJ \ln 10}{2\pi} \approx \frac{1}{3}\, p\, J    (19.5.23)

Comparing with equation (19.5.12) or (19.5.15), we see that optimal SOR requires of order J iterations, as opposed to of order J². Since J is typically 100 or larger, this makes a tremendous difference! Equation (19.5.23) leads to the mnemonic that 3-figure accuracy (p = 3) requires a number of iterations equal to the number of mesh points along a side of the grid. For 6-figure accuracy, we require about twice as many iterations.

How do we choose ω for a problem for which the answer is not known analytically? That is just the weak point of SOR! The advantages of SOR obtain only in a fairly narrow window around the correct value of ω. It is better to take ω slightly too large, rather than slightly too small, but best to get it right.

One way to choose ω is to map your problem approximately onto a known problem, replacing the coefficients in the equation by average values. Note, however, that the known problem must have the same grid size and boundary conditions as the actual problem. We give for reference purposes the value of ρ_Jacobi for our model problem on a rectangular J × L grid, allowing for the possibility that ∆x ≠ ∆y:

    \rho_{\rm Jacobi} = \frac{\cos\dfrac{\pi}{J} + \left(\dfrac{\Delta x}{\Delta y}\right)^2 \cos\dfrac{\pi}{L}}{1 + \left(\dfrac{\Delta x}{\Delta y}\right)^2}    (19.5.24)

Equation (19.5.24) holds for homogeneous Dirichlet or Neumann boundary conditions. For periodic boundary conditions, make the replacement π → 2π.

A second way, which is especially useful if you plan to solve many similar elliptic equations each time with slightly different coefficients, is to determine the optimum value ω empirically on the first equation and then use that value for the remaining equations. Various automated schemes for doing this and for "seeking out" the best values of ω are described in the literature.
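The following small helper (a sketch of ours, not one of the original routines) simply evaluates (19.5.24); the caller supplies the grid dimensions and spacings and decides whether to apply the π → 2π replacement for periodic boundaries.

    #include <math.h>

    /* Estimate of rho_Jacobi from equation (19.5.24) for a rectangular J x L grid
       with spacings dx, dy and homogeneous Dirichlet or Neumann boundary conditions. */
    double rho_jacobi(int J, int L, double dx, double dy)
    {
        double pi=3.141592653589793,ratio2=(dx/dy)*(dx/dy);
        return (cos(pi/J)+ratio2*cos(pi/L))/(1.0+ratio2);
    }

The value it returns can be passed directly to the sor routine given later in this section as its rjac argument.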
While the matrix notation introduced earlier is useful for theoretical analyses, for practical implementation of the SOR algorithm we need explicit formulas. Consider a general second-order elliptic equation in x and y, finite differenced on a square as for our model equation. Corresponding to each row of the matrix A is an equation of the form

    a_{j,l} u_{j+1,l} + b_{j,l} u_{j-1,l} + c_{j,l} u_{j,l+1} + d_{j,l} u_{j,l-1} + e_{j,l} u_{j,l} = f_{j,l}    (19.5.25)

For our model equation, we had a = b = c = d = 1, e = −4. The quantity f is proportional to the source term. The iterative procedure is defined by solving (19.5.25) for u_{j,l}:

    u^*_{j,l} = \frac{1}{e_{j,l}}\left(f_{j,l} - a_{j,l} u_{j+1,l} - b_{j,l} u_{j-1,l} - c_{j,l} u_{j,l+1} - d_{j,l} u_{j,l-1}\right)    (19.5.26)

Then u^{new}_{j,l} is a weighted average

    u^{\rm new}_{j,l} = \omega u^*_{j,l} + (1 - \omega)\, u^{\rm old}_{j,l}    (19.5.27)

We calculate it as follows: The residual at any stage is

    \xi_{j,l} = a_{j,l} u_{j+1,l} + b_{j,l} u_{j-1,l} + c_{j,l} u_{j,l+1} + d_{j,l} u_{j,l-1} + e_{j,l} u_{j,l} - f_{j,l}    (19.5.28)

and the SOR algorithm (19.5.18) or (19.5.27) is

    u^{\rm new}_{j,l} = u^{\rm old}_{j,l} - \omega\,\frac{\xi_{j,l}}{e_{j,l}}    (19.5.29)

This formulation is very easy to program, and the norm of the residual vector ξ_{j,l} can be used as a criterion for terminating the iteration.

Another practical point concerns the order in which mesh points are processed. The obvious strategy is simply to proceed in order down the rows (or columns). Alternatively, suppose we divide the mesh into "odd" and "even" meshes, like the red and black squares of a checkerboard. Then equation (19.5.26) shows that the odd points depend only on the even mesh values and vice versa. Accordingly, we can carry out one half-sweep updating the odd points, say, and then another half-sweep updating the even points with the new odd values. For the version of SOR implemented below, we shall adopt odd-even ordering.

The last practical point is that in practice the asymptotic rate of convergence in SOR is not attained until of order J iterations. The error often grows by a factor of 20 before convergence sets in. A trivial modification to SOR resolves this problem. It is based on the observation that, while ω is the optimum asymptotic relaxation parameter, it is not necessarily a good initial choice. In SOR with Chebyshev acceleration, one uses odd-even ordering and changes ω at each half-sweep according to the following prescription:

    \omega^{(0)} = 1
    \omega^{(1/2)} = 1/(1 - \rho_{\rm Jacobi}^2/2)
    \omega^{(n+1/2)} = 1/(1 - \rho_{\rm Jacobi}^2\,\omega^{(n)}/4), \qquad n = 1/2,\, 1,\, \ldots,\, \infty
    \omega^{(\infty)} \to \omega_{\rm optimal}    (19.5.30)
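As a small check (our sketch, not the book's code) that the prescription (19.5.30) really drives ω toward the optimal value of (19.5.19), one can iterate the half-sweep recurrence by itself; the grid size is an illustrative choice.

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int halfsweep;
        double pi=3.141592653589793,rjac,omega,omega_opt;
        int J=128;                                   /* illustrative grid size */

        rjac=1.0-0.5*pi*pi/((double)J*J);            /* model-problem rho_Jacobi, eq. (19.5.11) */
        omega_opt=2.0/(1.0+sqrt(1.0-rjac*rjac));     /* optimal omega, eq. (19.5.19) */
        omega=1.0;                                   /* omega^(0)=1 is used in the first half-sweep */
        for (halfsweep=1;halfsweep<=4*J;halfsweep++) {
            omega=(halfsweep == 1 ? 1.0/(1.0-0.5*rjac*rjac)
                : 1.0/(1.0-0.25*rjac*rjac*omega));   /* recurrence of eq. (19.5.30) */
            if (halfsweep % J == 0)
                printf("half-sweep %4d  omega=%.6f  (optimal %.6f)\n",
                    halfsweep,omega,omega_opt);
        }
        return 0;
    }

The output shows ω rising monotonically toward ω_optimal over of order J half-sweeps, consistent with the remark above about when the asymptotic rate sets in.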
The beauty of Chebyshev acceleration is that the norm of the error always decreases with each iteration. (This is the norm of the actual error in u_{j,l}. The norm of the residual ξ_{j,l} need not decrease monotonically.) While the asymptotic rate of convergence is the same as ordinary SOR, there is never any excuse for not using Chebyshev acceleration to reduce the total number of iterations required. Here we give a routine for SOR with Chebyshev acceleration.

    #include <math.h>
    #define MAXITS 1000
    #define EPS 1.0e-5

    void sor(double **a, double **b, double **c, double **d, double **e,
        double **f, double **u, int jmax, double rjac)
    /* Successive overrelaxation solution of equation (19.5.25) with Chebyshev acceleration.
       a, b, c, d, e, and f are input as the coefficients of the equation, each dimensioned
       to the grid size [1..jmax][1..jmax]. u is input as the initial guess to the solution,
       usually zero, and returns with the final value. rjac is input as the spectral radius
       of the Jacobi iteration, or an estimate of it. */
    {
        void nrerror(char error_text[]);
        int ipass,j,jsw,l,lsw,n;
        double anorm,anormf=0.0,omega=1.0,resid;
        /* Double precision is a good idea for jmax bigger than about 25. */

        for (j=2;j<jmax;j++)           /* Compute initial norm of residual and terminate */
            for (l=2;l<jmax;l++)       /* iteration when norm has been reduced by a factor EPS. */
                anormf += fabs(f[j][l]);   /* Assumes initial u is zero. */
        for (n=1;n<=MAXITS;n++) {
            anorm=0.0;
            jsw=1;
            for (ipass=1;ipass<=2;ipass++) {   /* Odd-even ordering. */
                lsw=jsw;
                for (j=2;j<jmax;j++) {
                    for (l=lsw+1;l<jmax;l+=2) {
                        resid=a[j][l]*u[j+1][l]
                            +b[j][l]*u[j-1][l]
                            +c[j][l]*u[j][l+1]
                            +d[j][l]*u[j][l-1]
                            +e[j][l]*u[j][l]
                            -f[j][l];
                        anorm += fabs(resid);
                        u[j][l] -= omega*resid/e[j][l];
                    }
                    lsw=3-lsw;
                }
                jsw=3-jsw;
                omega=(n == 1 && ipass == 1 ? 1.0/(1.0-0.5*rjac*rjac) :
                    1.0/(1.0-0.25*rjac*rjac*omega));
            }
            if (anorm < EPS*anormf) return;
        }
        nrerror("MAXITS exceeded");
    }

The main advantage of SOR is that it is very easy to program. Its main disadvantage is that it is still very inefficient on large problems.
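For orientation, here is a minimal driver sketch (ours, not from the book) that applies sor to the model problem with a single point source and homogeneous Dirichlet boundaries. The grid size, the point-source placement, and the use of the dmatrix/free_dmatrix allocators from the nrutil utilities are illustrative assumptions.

    #include <math.h>
    #include "nrutil.h"     /* assumed: NR utility allocators dmatrix, free_dmatrix */
    #define JMAX 65         /* illustrative grid size */

    void sor(double **a, double **b, double **c, double **d, double **e,
        double **f, double **u, int jmax, double rjac);

    int main(void)
    {
        int j,l;
        double delta=1.0/(JMAX-1),pi=3.141592653589793,rjac;
        double **a,**b,**c,**d,**e,**f,**u;

        a=dmatrix(1,JMAX,1,JMAX); b=dmatrix(1,JMAX,1,JMAX); c=dmatrix(1,JMAX,1,JMAX);
        d=dmatrix(1,JMAX,1,JMAX); e=dmatrix(1,JMAX,1,JMAX); f=dmatrix(1,JMAX,1,JMAX);
        u=dmatrix(1,JMAX,1,JMAX);
        for (j=1;j<=JMAX;j++)
            for (l=1;l<=JMAX;l++) {
                a[j][l]=b[j][l]=c[j][l]=d[j][l]=1.0;   /* model-problem coefficients of (19.5.25) */
                e[j][l]=-4.0;
                f[j][l]=0.0;                           /* f is Delta^2 times the source term */
                u[j][l]=0.0;                           /* zero initial guess and boundary values */
            }
        f[JMAX/2][JMAX/2]=delta*delta;                 /* a single unit point source */
        rjac=cos(pi/JMAX);                             /* eq. (19.5.24) with dx = dy and J = L */
        sor(a,b,c,d,e,f,u,JMAX,rjac);
        /* ... use u[1..JMAX][1..JMAX], then free the matrices ... */
        free_dmatrix(u,1,JMAX,1,JMAX); free_dmatrix(f,1,JMAX,1,JMAX);
        free_dmatrix(e,1,JMAX,1,JMAX); free_dmatrix(d,1,JMAX,1,JMAX);
        free_dmatrix(c,1,JMAX,1,JMAX); free_dmatrix(b,1,JMAX,1,JMAX);
        free_dmatrix(a,1,JMAX,1,JMAX);
        return 0;
    }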
ADI (Alternating-Direction Implicit) Method

The ADI method of §19.3 for diffusion equations can be turned into a relaxation method for elliptic equations [1-4]. In §19.3, we discussed ADI as a method for solving the time-dependent heat-flow equation

    \frac{\partial u}{\partial t} = \nabla^2 u - \rho    (19.5.31)

By letting t → ∞ one also gets an iterative method for solving the elliptic equation

    \nabla^2 u = \rho    (19.5.32)

In either case, the operator splitting is of the form

    L = L_x + L_y    (19.5.33)

where L_x represents the differencing in x and L_y that in y.

For example, in our model problem (19.0.6) with ∆x = ∆y = ∆, we have

    L_x u = 2u_{j,l} - u_{j+1,l} - u_{j-1,l}
    L_y u = 2u_{j,l} - u_{j,l+1} - u_{j,l-1}    (19.5.34)

More complicated operators may be similarly split, but there is some art involved. A bad choice of splitting can lead to an algorithm that fails to converge. Usually one tries to base the splitting on the physical nature of the problem. We know for our model problem that an initial transient diffuses away, and we set up the x and y splitting to mimic diffusion in each dimension.

Having chosen a splitting, we difference the time-dependent equation (19.5.31) implicitly in two half-steps:

    \frac{u^{n+1/2} - u^n}{\Delta t/2} = -\frac{L_x u^{n+1/2} + L_y u^n}{\Delta^2} - \rho
    \frac{u^{n+1} - u^{n+1/2}}{\Delta t/2} = -\frac{L_x u^{n+1/2} + L_y u^{n+1}}{\Delta^2} - \rho    (19.5.35)

(cf. equation 19.3.16). Here we have suppressed the spatial indices (j, l). In matrix notation, equations (19.5.35) are

    (L_x + r\mathbf{1})\cdot u^{n+1/2} = (r\mathbf{1} - L_y)\cdot u^n - \Delta^2\rho    (19.5.36)

    (L_y + r\mathbf{1})\cdot u^{n+1} = (r\mathbf{1} - L_x)\cdot u^{n+1/2} - \Delta^2\rho    (19.5.37)

where

    r \equiv \frac{2\Delta^2}{\Delta t}    (19.5.38)

The matrices on the left-hand sides of equations (19.5.36) and (19.5.37) are tridiagonal (and usually positive definite), so the equations can be solved by the standard tridiagonal algorithm. Given u^n, one solves (19.5.36) for u^{n+1/2}, substitutes on the right-hand side of (19.5.37), and then solves for u^{n+1}.
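To make the first half-step concrete, here is a sketch (ours, not the book's code) of the x-direction sweep of equation (19.5.36): each interior row l gives an independent tridiagonal system in j, with the fixed Dirichlet boundary values folded into the right-hand side. The dvector allocator is assumed from the nrutil utilities, and trisolve is a plain Thomas-algorithm solver written out here rather than a library routine.

    #include "nrutil.h"     /* assumed: NR utility allocators dvector, free_dvector */

    /* Solve sub[i]*x[i-1] + diag[i]*x[i] + sup[i]*x[i+1] = rhs[i], i = 1..n
       (sub[1] and sup[n] unused), by the Thomas algorithm; diag and rhs are overwritten. */
    static void trisolve(double sub[], double diag[], double sup[],
        double rhs[], double x[], int n)
    {
        int i;
        double m;
        for (i=2;i<=n;i++) {                 /* forward elimination */
            m=sub[i]/diag[i-1];
            diag[i] -= m*sup[i-1];
            rhs[i] -= m*rhs[i-1];
        }
        x[n]=rhs[n]/diag[n];
        for (i=n-1;i>=1;i--)                 /* back substitution */
            x[i]=(rhs[i]-sup[i]*x[i+1])/diag[i];
    }

    /* One x-direction ADI half-step, equation (19.5.36): u holds u^n, uhalf receives
       u^(n+1/2), rho is the source, delta the mesh spacing, r the parameter of (19.5.38).
       Boundary rows and columns are fixed Dirichlet values and are simply copied. */
    void adi_x_sweep(double **u, double **uhalf, double **rho,
        int jmax, double delta, double r)
    {
        int j,l,n=jmax-2;
        double *sub,*diag,*sup,*rhs,*x;

        sub=dvector(1,n); diag=dvector(1,n); sup=dvector(1,n);
        rhs=dvector(1,n); x=dvector(1,n);
        for (l=1;l<=jmax;l++) { uhalf[1][l]=u[1][l]; uhalf[jmax][l]=u[jmax][l]; }
        for (j=1;j<=jmax;j++) { uhalf[j][1]=u[j][1]; uhalf[j][jmax]=u[j][jmax]; }
        for (l=2;l<jmax;l++) {               /* one tridiagonal solve per interior row */
            for (j=2;j<jmax;j++) {
                sub[j-1]=sup[j-1]=-1.0;      /* (L_x + r1) has -1 off the diagonal ... */
                diag[j-1]=2.0+r;             /* ... and 2+r on it */
                rhs[j-1]=r*u[j][l]-(2.0*u[j][l]-u[j][l+1]-u[j][l-1])   /* (r1 - L_y).u^n */
                    -delta*delta*rho[j][l];
            }
            rhs[1] += u[1][l];               /* known boundary values moved to the RHS */
            rhs[n] += u[jmax][l];
            trisolve(sub,diag,sup,rhs,x,n);
            for (j=2;j<jmax;j++) uhalf[j][l]=x[j-1];
        }
        free_dvector(x,1,n); free_dvector(rhs,1,n); free_dvector(sup,1,n);
        free_dvector(diag,1,n); free_dvector(sub,1,n);
    }

The y-direction half-step (19.5.37) is the mirror image, sweeping columns instead of rows.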
19.6 Multigrid Methods for Boundary Value Problems 871 standard tridiagonal algorithm.Given u",one solves(19.5.36)for u+1/2,substitutes on the right-hand side of(19.5.37),and then solves for u+.The key question is how to choose the iteration parameter r,the analog of a choice of timestep for an initial value problem. As usual,the goal is to minimize the spectral radius of the iteration matrix. Although it is beyond our scope to go into details here,it turns out that,for the optimal choice of r,the ADI method has the same rate of convergence as SOR. The individual iteration steps in the ADI method are much more complicated than in SOR,so the ADI method would appear to be inferior.This is in fact true if we choose the same parameter r for every iteration step.However,it is possible to 81 choose a different r for each step.If this is done optimally,then ADI is generally more efficient than SOR.We refer you to the literature [1-4]for details. Our reason for not fully implementing ADI here is that,in most applications, it has been superseded by the multigrid methods described in the next section.Our advice is to use SOR for trivial problems (e.g.,20 x 20),or for solving a larger problem once only,where ease of programming outweighs expense of computer time.Occasionally,the sparse matrix methods of 82.7 are useful for solving a set of difference equations directly.For production solution of large elliptic problems, however,multigrid is now almost always the method of choice. 9 CITED REFERENCES AND FURTHER READING: Hockney,R.W.,and Eastwood,J.W.1981,Computer Simulation Using Particles (New York: McGraw-Hill),Chapter 6. 85总6 Young,D.M.1971,Iterative Solution of Large Linear Systems (New York:Academic Press).[1] Stoer,J.,and Bulirsch,R.1980,Introduction to Numerical Analysis(New York:Springer-Verlag). 558.3-8.6.2] Varga,R.S.1962,Matrix /terative Analysis (Englewood Cliffs,NJ:Prentice-Hall).[3] Spanier,J.1967,in Mathematical Methods for Digital Computers,Volume 2(New York:Wiley). Chapter 11.[4] 19.6 Multigrid Methods for Boundary Value Problems 、之花 61 Numerical 10.621 43108 Recipes Practical multigrid methods were first introduced in the 1970s by Brandt.These (outside methods can solve elliptic PDEs discretized on N grid points in O(N)operations North The "rapid"direct elliptic solvers discussed in $19.4 solve special kinds of elliptic equations in O(N log N)operations.The numerical coefficients in these estimates are such that multigrid methods are comparable to the rapid methods in execution speed.Unlike the rapid methods,however,the multigrid methods can solve general elliptic equations with nonconstant coefficients with hardly any loss in efficiency. Even nonlinear equations can be solved with comparable speed. Unfortunately there is not a single multigrid algorithm that solves all elliptic problems.Rather there is a multigrid technique that provides the framework for solving these problems.You have to adjust the various components of the algorithm within this framework to solve your specific problem.We can only give a brief
The key question is how to choose the iteration parameter r, the analog of a choice of timestep for an initial value problem.

As usual, the goal is to minimize the spectral radius of the iteration matrix. Although it is beyond our scope to go into details here, it turns out that, for the optimal choice of r, the ADI method has the same rate of convergence as SOR. The individual iteration steps in the ADI method are much more complicated than in SOR, so the ADI method would appear to be inferior. This is in fact true if we choose the same parameter r for every iteration step. However, it is possible to choose a different r for each step. If this is done optimally, then ADI is generally more efficient than SOR. We refer you to the literature [1-4] for details.

Our reason for not fully implementing ADI here is that, in most applications, it has been superseded by the multigrid methods described in the next section. Our advice is to use SOR for trivial problems (e.g., 20 × 20), or for solving a larger problem once only, where ease of programming outweighs expense of computer time. Occasionally, the sparse matrix methods of §2.7 are useful for solving a set of difference equations directly. For production solution of large elliptic problems, however, multigrid is now almost always the method of choice.

CITED REFERENCES AND FURTHER READING:
Hockney, R.W., and Eastwood, J.W. 1981, Computer Simulation Using Particles (New York: McGraw-Hill), Chapter 6.
Young, D.M. 1971, Iterative Solution of Large Linear Systems (New York: Academic Press). [1]
Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), §§8.3–8.6. [2]
Varga, R.S. 1962, Matrix Iterative Analysis (Englewood Cliffs, NJ: Prentice-Hall). [3]
Spanier, J. 1967, in Mathematical Methods for Digital Computers, Volume 2 (New York: Wiley), Chapter 11. [4]

19.6 Multigrid Methods for Boundary Value Problems

Practical multigrid methods were first introduced in the 1970s by Brandt. These methods can solve elliptic PDEs discretized on N grid points in O(N) operations. The "rapid" direct elliptic solvers discussed in §19.4 solve special kinds of elliptic equations in O(N log N) operations. The numerical coefficients in these estimates are such that multigrid methods are comparable to the rapid methods in execution speed. Unlike the rapid methods, however, the multigrid methods can solve general elliptic equations with nonconstant coefficients with hardly any loss in efficiency. Even nonlinear equations can be solved with comparable speed.

Unfortunately there is not a single multigrid algorithm that solves all elliptic problems.
Rather there is a multigrid technique that provides the framework for solving these problems. You have to adjust the various components of the algorithm within this framework to solve your specific problem. We can only give a brief