2.7 Sparse Linear Systems 71 2.7 Sparse Linear Systems A system of linear equations is called sparse if only a relatively small number of its matrix elements aij are nonzero.It is wasteful to use general methods of linear algebra on such problems,because most of the O(N3)arithmetic operations devoted to solving the set of equations or inverting the matrix involve zero operands. Furthermore,you might wish to work problems so large as to tax your available memory space,and it is wasteful to reserve storage for unfruitful zero elements. Note that there are two distinct (and not always compatible)goals for any sparse matrix method:saving time and/or saving space. We have already considered one archetypal sparse form in $2.4,the band diagonal matrix.In the tridiagonal case,e.g.,we saw that it was possible to save both time (order N instead of N3)and space (order N instead of N2).The method of solution was not different in principle from the general method of LU decomposition;it was just applied cleverly,and with due attention to the bookkeeping of zero elements.Many practical schemes for dealing with sparse problems have this same character.They are fundamentally decomposition schemes,or else elimination schemes akin to Gauss-Jordan,but carefully optimized so as to minimize the number of so-called fill-ins,initially zero elements which must become nonzero during the solution process,and for which storage must be reserved. Direct methods for solving sparse equations,then,depend crucially on the precise pattern of sparsity of the matrix.Patterns that occur frequently,or that are useful as way-stations in the reduction of more general forms,already have special names and special methods of solution.We do not have space here for any detailed review of these.References listed at the end of this section will furnish you with an "in"to the specialized literature,and the following list of buzz words(and Figure 2.7.1)will at least let you hold your own at cocktail parties: ●tridiagonal band diagonal (or banded)with bandwidth M ·band triangular ·block diagonal 。block tridiagonal ●block triangular v@cambridge.org ·cyclic banded 客 OF SCIENTIFIC COMPUTING (ISBN 1988-1992 by Numerical Recipes 12-:621-43106-50 singly (or doubly)bordered block diagonal singly (or doubly)bordered block triangular singly (or doubly)bordered band diagonal (outside North Software. singly (or doubly)bordered band triangular ·other() You should also be aware of some of the special sparse forms that occur in the ,visit website machine solution of partial differential equations in two or more dimensions.See Chapter 19. If your particular pattern of sparsity is not a simple one,then you may wish to try an analyze/factorize/operate package,which automates the procedure of figuring out how fill-ins are to be minimized.The analyze stage is done once only for each pattern of sparsity.The factorize stage is done once for each particular matrix that fits the pattern.The operate stage is performed once for each right-hand side to
2.7 Sparse Linear Systems 71 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). 2.7 Sparse Linear Systems A system of linear equations is called sparse if only a relatively small number of its matrix elements aij are nonzero. It is wasteful to use general methods of linear algebra on such problems, because most of the O(N 3) arithmetic operations devoted to solving the set of equations or inverting the matrix involve zero operands. Furthermore, you might wish to work problems so large as to tax your available memory space, and it is wasteful to reserve storage for unfruitful zero elements. Note that there are two distinct (and not always compatible) goals for any sparse matrix method: saving time and/or saving space. We have already considered one archetypal sparse form in §2.4, the band diagonal matrix. In the tridiagonal case, e.g., we saw that it was possible to save both time (order N instead of N 3) and space (order N instead of N 2). The method of solution was not different in principle from the general method of LU decomposition; it was just applied cleverly, and with due attention to the bookkeeping of zero elements. Many practical schemes for dealing with sparse problems have this same character. They are fundamentally decomposition schemes, or else elimination schemes akin to Gauss-Jordan, but carefully optimized so as to minimize the number of so-called fill-ins, initially zero elements which must become nonzero during the solution process, and for which storage must be reserved. Direct methods for solving sparse equations, then, depend crucially on the precise pattern of sparsity of the matrix. Patterns that occur frequently, or that are useful as way-stations in the reduction of more general forms, already have special names and special methods of solution. We do not have space here for any detailed review of these. References listed at the end of this section will furnish you with an “in” to the specialized literature, and the following list of buzz words (and Figure 2.7.1) will at least let you hold your own at cocktail parties: • tridiagonal • band diagonal (or banded) with bandwidth M • band triangular • block diagonal • block tridiagonal • block triangular • cyclic banded • singly (or doubly) bordered block diagonal • singly (or doubly) bordered block triangular • singly (or doubly) bordered band diagonal • singly (or doubly) bordered band triangular • other (!) You should also be aware of some of the special sparse forms that occur in the solution of partial differential equations in two or more dimensions. See Chapter 19. If your particular pattern of sparsity is not a simple one, then you may wish to try an analyze/factorize/operate package, which automates the procedure of figuring out how fill-ins are to be minimized. The analyze stage is done once only for each pattern of sparsity. The factorize stage is done once for each particular matrix that fits the pattern. The operate stage is performed once for each right-hand side to
72 Chapter 2.Solution of Linear Algebraic Equations zeros zeros zeros (a) (b) (c) Permission is Sample page 囚 囚 (d) (e) (f) http:/.nr.1-800-872-7423 (North America ony)orsend email to directcustserv@cambridge.org(outside North America) readable files (including this one)to any server computer,is strictly prohibited. granted for internet users to make one paper Copyright(C)1988-1992 by Cambridge University Press.Programs Copyright(C) copy for thei (g) (h) (i ▣ 1988-1992 by Numerical Recipes from NUMERICAL RECIPES IN C:THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) ▣ ▣ 9 0 0 0 (G) (k) Software. Figure 2.7.1.Some standard forms for sparse matrices.(a)Band diagonal;(b)block triangular,(c)block tridiagonal;(d)singly bordered block diagonal;(e)doubly bordered block diagonal;(f)singly bordered block triangular;(g)bordered band-triangular,(h)and (i)singly and doubly bordered band diagonal;(j) and (k)other!(after Tewarson)[1]. visit website ying of machine be used with the particular matrix.Consult [2,3]for references on this.The NAG library [4]has an analyze/factorize/operate capability.A substantial collection of routines for sparse matrix calculation is also available from IMSL [5]as the Yale Sparse Matrix Package [6]. You should be aware that the special order of interchanges and eliminations
72 Chapter 2. Solution of Linear Algebraic Equations Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). (a) (b) (c) (d) (e) (f ) (g) (h) (i) ( j) (k) zeros zeros zeros Figure 2.7.1. Some standard forms for sparse matrices. (a) Band diagonal; (b) block triangular; (c) block tridiagonal; (d) singly bordered block diagonal; (e) doubly bordered block diagonal; (f) singly bordered block triangular; (g) bordered band-triangular; (h) and (i) singly and doubly bordered band diagonal; (j) and (k) other! (after Tewarson) [1]. be used with the particular matrix. Consult [2,3] for references on this. The NAG library [4] has an analyze/factorize/operate capability. A substantial collection of routines for sparse matrix calculation is also available from IMSL [5] as the Yale Sparse Matrix Package [6]. You should be aware that the special order of interchanges and eliminations
2.7 Sparse Linear Systems 73 prescribed by a sparse matrix method so as to minimize fill-ins and arithmetic operations,generally acts to decrease the method's numerical stability as compared to,e.g.,regular LU decomposition with pivoting.Scaling your problem so as to make its nonzero matrix elements have comparable magnitudes (if you can do it) will sometimes ameliorate this problem. In the remainder of this section,we present some concepts which are applicable to some general classes of sparse matrices,and which do not necessarily depend on details of the pattern of sparsity. Sherman-Morrison Formula Suppose that you have already obtained,by herculean effort,the inverse matrix A-1 of a square matrix A.Now you want to make a"small"change in A,for B example change one element aij,or a few elements,or one row,or one column. Is there any way of calculating the corresponding change in A-without repeating your difficult labors?Yes,if your change is of the form RECIPES A→(A+u⑧v) (2.7.1) for some vectors u and v.If u is a unit vector ei,then(2.7.1)adds the components ofv to the ith row.(Recall that u v is a matrix whose i,jth element is the product 荐o∽的门 9 of the ith component of u and the jth component of v.)If v is a unit vector ej,then (2.7.1)adds the components of u to the jth column.If both u and v are proportional to unit vectors ei and ej respectively,then a term is added only to the element a e3房3 The Sherman-Morrison formula gives the inverse (A+uv)-1,and is derived briefly as follows: (A+u⑧v)-1=(1+A-1.u⑧v)-1.A-1 6 =(1-A-1.u⑧v+A-1.u⑧v·A-1.u⑧v-.)·A-1 =A-1-A-1.u8v·A-1(1-入+λ2-.) =A1-A-1四⑧wA- 10.621 1+入 Numerica (2.7.2) 43106 where λ≡v·A-1.u (2.7.3) The second line of(2.7.2)is a formal power series expansion.In the third line,the associativity of outer and inner products is used to factor out the scalars A. The use of(2.7.2)is this:Given A-and the vectors u and v,we need only perform two matrix multiplications and a vector dot product, z≡A-1.uw三(A-1)T.v 入=V·Z (2.7.4) to get the desired change in the inverse A-1→A-1-Z8w (2.7.5) 1+入
2.7 Sparse Linear Systems 73 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). prescribed by a sparse matrix method so as to minimize fill-ins and arithmetic operations, generally acts to decrease the method’s numerical stability as compared to, e.g., regular LU decomposition with pivoting. Scaling your problem so as to make its nonzero matrix elements have comparable magnitudes (if you can do it) will sometimes ameliorate this problem. In the remainder of this section, we present some concepts which are applicable to some general classes of sparse matrices, and which do not necessarily depend on details of the pattern of sparsity. Sherman-Morrison Formula Suppose that you have already obtained, by herculean effort, the inverse matrix A−1 of a square matrix A. Now you want to make a “small” change in A, for example change one element aij , or a few elements, or one row, or one column. Is there any way of calculating the corresponding change in A −1 without repeating your difficult labors? Yes, if your change is of the form A → (A + u ⊗ v) (2.7.1) for some vectors u and v. If u is a unit vector ei, then (2.7.1) adds the components of v to the ith row. (Recall that u ⊗ v is a matrix whose i, jth element is the product of the ith component of u and the jth component of v.) If v is a unit vector e j , then (2.7.1) adds the components of u to the jth column. If both u and v are proportional to unit vectors ei and ej respectively, then a term is added only to the element aij . The Sherman-Morrison formula gives the inverse (A + u⊗ v)−1, and is derived briefly as follows: (A + u ⊗ v) −1 = (1 + A−1 · u ⊗ v) −1 · A−1 = (1 − A−1 · u ⊗ v + A−1 · u ⊗ v · A−1 · u ⊗ v − ...) · A−1 = A−1 − A−1 · u ⊗ v · A−1 (1 − λ + λ2 − ...) = A−1 − (A−1 · u) ⊗ (v · A−1) 1 + λ (2.7.2) where λ ≡ v · A−1 · u (2.7.3) The second line of (2.7.2) is a formal power series expansion. In the third line, the associativity of outer and inner products is used to factor out the scalars λ. The use of (2.7.2) is this: Given A−1 and the vectors u and v, we need only perform two matrix multiplications and a vector dot product, z ≡ A−1 · u w ≡ (A−1) T · v λ = v · z (2.7.4) to get the desired change in the inverse A−1 → A−1 − z ⊗ w 1 + λ (2.7.5)
74 Chapter 2.Solution of Linear Algebraic Equations The whole procedure requires only 3N2 multiplies and a like number of adds(an even smaller number if u or v is a unit vector). The Sherman-Morrison formula can be directly applied to a class of sparse problems.If you already have a fast way of calculating the inverse of A (e.g.,a tridiagonal matrix,or some other standard sparse form),then (2.7.4)-(2.7.5)allow you to build up to your related but more complicated form,adding for example a row or column at a time.Notice that you can apply the Sherman-Morrison formula more than once successively,using at each stage the most recent update of A-1 (equation 2.7.5).Of course,if you have to modify every row,then you are back to an N3 method.The constant in front of the N3 is only a few times worse than the 81 better direct methods,but you have deprived yourself of the stabilizing advantages of pivoting-so be careful. For some other sparse problems,the Sherman-Morrison formula cannot be directly applied for the simple reason that storage of the whole inverse matrix A-1 is not feasible.If you want to add only a single correction of the form uv, and solve the linear system (A+u图v)·x=b (2.7.6) RECIPESI 2 then you proceed as follows.Using the fast method that is presumed available for the matrix A,solve the two auxiliary problems 豆。。 Press. A·y=bA·z=u (2.7.7) for the vectors y and z.In terms of these, x=y- 1+w…az (2.7.8) SCIENTIFIC as we see by multiplying (2.7.2)on the right by b. Cyclic Tridiagonal Systems So-called cyclic tridiagonal systems occur quite frequently,and are a good example of how to use the Sherman-Morrison formula in the manner just described. The equations have the form 10-521 b1c10·.· T1 a2b2c2· T2 Numerical Recipes 43106 4。 (2.7.9) aN-1 bN-1 CN-1 CN- (outside L a 0 aN TN TN This is a tridiagonal system,except for the matrix elements a and B in the corners. North Forms like this are typically generated by finite-differencing differential equations with periodic boundary conditions ($19.4). We use the Sherman-Morrison formula,treating the system as tridiagonal plus a correction.In the notation of equation(2.7.6),define vectors u and v to be 1 0 0 (2.7.10) 0 0 ./
74 Chapter 2. Solution of Linear Algebraic Equations Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). The whole procedure requires only 3N 2 multiplies and a like number of adds (an even smaller number if u or v is a unit vector). The Sherman-Morrison formula can be directly applied to a class of sparse problems. If you already have a fast way of calculating the inverse of A (e.g., a tridiagonal matrix, or some other standard sparse form), then (2.7.4)–(2.7.5) allow you to build up to your related but more complicated form, adding for example a row or column at a time. Notice that you can apply the Sherman-Morrison formula more than once successively, using at each stage the most recent update of A −1 (equation 2.7.5). Of course, if you have to modify every row, then you are back to an N3 method. The constant in front of the N 3 is only a few times worse than the better direct methods, but you have deprived yourself of the stabilizing advantages of pivoting — so be careful. For some other sparse problems, the Sherman-Morrison formula cannot be directly applied for the simple reason that storage of the whole inverse matrix A −1 is not feasible. If you want to add only a single correction of the form u ⊗ v, and solve the linear system (A + u ⊗ v) · x = b (2.7.6) then you proceed as follows. Using the fast method that is presumed available for the matrix A, solve the two auxiliary problems A · y = b A · z = u (2.7.7) for the vectors y and z. In terms of these, x = y − v · y 1+(v · z) z (2.7.8) as we see by multiplying (2.7.2) on the right by b. Cyclic Tridiagonal Systems So-called cyclic tridiagonal systems occur quite frequently, and are a good example of how to use the Sherman-Morrison formula in the manner just described. The equations have the form b1 c1 0 ··· β a2 b2 c2 ··· ··· ··· aN−1 bN−1 cN−1 α ··· 0 aN bN · x1 x2 ··· xN−1 xN = r1 r2 ··· rN−1 rN (2.7.9) This is a tridiagonal system, except for the matrix elements α and β in the corners. Forms like this are typically generated by finite-differencing differential equations with periodic boundary conditions (§19.4). We use the Sherman-Morrison formula, treating the system as tridiagonal plus a correction. In the notation of equation (2.7.6), define vectors u and v to be u = γ 0 . . . 0 α v = 1 0 . . . 0 β/γ (2.7.10)
2.7 Sparse Linear Systems 75 Here yy is arbitrary for the moment.Then the matrix A is the tridiagonal part of the matrix in(2.7.9),with two terms modified: b1=b1-Y, BN =bN-aB/ (2.7.11) We now solve equations(2.7.7)with the standard tridiagonal algorithm,and then get the solution from equation(2.7.8). The routine cyclic below implements this algorithm.We choose the arbitrary parameter y=-b to avoid loss of precision by subtraction in the first of equations 5常 (2.7.11).In the unlikely event that this causes loss of precision in the second of 81 these equations,you can make a different choice. #include "nrutil.h" void cyclic(float a[],float b[],float c],float alpha,float beta, float r[],float x[],unsigned long n) Solves for a vector x[1..n]the "cyclic"set of linear equations given by equation(2.7.9).a, b,c,and r are input vectors,all dimensioned as [1..n],while alpha and beta are the corner RECIPES I entries in the matrix.The input is not modified. (North void tridag(float a0,f1oatb☐,f1oatc[0,f1oatr0,f1oatu[▣, unsig即ed long n); unsigned long i; Ameri computer, Press. float fact,gamma,*bb,*u,*Z; ART if (n <2)nrerror("n too small in cyclic"); 9 Programs bb=vector(1,n); u=vector(1,n); z=vector(1,n); gamma =-b[1]; Avoid subtraction error in forming bb[1]. bb[1]=b[1]-gamma; Set up the diagonal of the modified tridi- bb[n]=b[n]-alpha*beta/gamma; agonal system. to dir for (i=2;i<n;i++)bb[i]=b[i]; tridag(a,bb,c,r,x,n); Solve A·x=r. u[1]=gamma; Set up the vector u. ectcustser u[n]=alpha; 18881920 SCIENTIFIC COMPUTING(ISBN for(1=2:1<n:1++)u[1]=0.0; tridag(a,bb,c,u,z,n); Solve A·z=u. fact=(x[1]+beta*x [n]/gamma)/ Form v·x/(1+v·z) 10-621 (1.0+z[1]+beta*z[n]/gamma); for(i=1:1<=n;1++)x[1]-=fact*z[i]: Now get the solution vector x. Further reproduction. Numerical Recipes 43108 free_vector(z,1,n); free_vector(u,1,n); free_vector(bb,1,n); (outside Software. Woodbury Formula If you want to add more than a single correction term,then you cannot use (2.7.8) repeatedly,since without storing a new A you will not be able to solve the auxiliary problems (2.7.7)efficiently after the first step.Instead,you need the Woodbury formula,which is the block-matrix version of the Sherman-Morrison formula. (a+U.Vr)-1 (2.7.12) =A-1-A-1.U(1+vr.A-1.U)-1.v.A-1
2.7 Sparse Linear Systems 75 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Here γ is arbitrary for the moment. Then the matrix A is the tridiagonal part of the matrix in (2.7.9), with two terms modified: b 1 = b1 − γ, b N = bN − αβ/γ (2.7.11) We now solve equations (2.7.7) with the standard tridiagonal algorithm, and then get the solution from equation (2.7.8). The routine cyclic below implements this algorithm. We choose the arbitrary parameter γ = −b1 to avoid loss of precision by subtraction in the first of equations (2.7.11). In the unlikely event that this causes loss of precision in the second of these equations, you can make a different choice. #include "nrutil.h" void cyclic(float a[], float b[], float c[], float alpha, float beta, float r[], float x[], unsigned long n) Solves for a vector x[1..n] the “cyclic” set of linear equations given by equation (2.7.9). a, b, c, and r are input vectors, all dimensioned as [1..n], while alpha and beta are the corner entries in the matrix. The input is not modified. { void tridag(float a[], float b[], float c[], float r[], float u[], unsigned long n); unsigned long i; float fact,gamma,*bb,*u,*z; if (n <= 2) nrerror("n too small in cyclic"); bb=vector(1,n); u=vector(1,n); z=vector(1,n); gamma = -b[1]; Avoid subtraction error in forming bb[1]. bb[1]=b[1]-gamma; Set up the diagonal of the modified tridibb[n]=b[n]-alpha*beta/gamma; agonal system. for (i=2;i<n;i++) bb[i]=b[i]; tridag(a,bb,c,r,x,n); Solve A · x = r. u[1]=gamma; Set up the vector u. u[n]=alpha; for (i=2;i<n;i++) u[i]=0.0; tridag(a,bb,c,u,z,n); Solve A · z = u. fact=(x[1]+beta*x[n]/gamma)/ Form v · x/(1 + v · z). (1.0+z[1]+beta*z[n]/gamma); for (i=1;i<=n;i++) x[i] -= fact*z[i]; Nowget the solution vector x. free_vector(z,1,n); free_vector(u,1,n); free_vector(bb,1,n); } Woodbury Formula If you want to add more than a single correction term, then you cannot use (2.7.8) repeatedly, since without storing a new A−1 you will not be able to solve the auxiliary problems (2.7.7) efficiently after the first step. Instead, you need the Woodbury formula, which is the block-matrix version of the Sherman-Morrison formula, (A + U · VT ) −1 = A−1 − A−1 · U · (1 + VT · A−1 · U) −1 · VT · A−1 (2.7.12)
76 Chapter 2.Solution of Linear Algebraic Equations Here A is,as usual,an N x N matrix,while U and V are N x P matrices with P<N and usually P N.The inner piece of the correction term may become clearer if written as the tableau, . (2.7.13) .com or where you can see that the matrix whose inverse is needed is only Px P rather than N x N. The relation between the Woodbury formula and successive applications of the Sherman- Morrison formula is now clarified by noting that,if U is the matrix formed by columns out of the P vectors u,...,up,and Vis the matrix formed by columns out of the Pvectors vi,...,vp, (2.7.14) 寻9 2 then two ways of expressing the same correction to A are Press. P A+∑uk⑧vk=(A+U.VT) (2.7.15) Program k=1 (Note that the subscripts on u and v do not denote components,but rather distinguish the different column vectors. Equation (2.7.15)reveals that,if you have A in storage,then you can either make the P corrections in one fell swoop by using (2.7.12),inverting a P x P matrix,or else make them by applying (2.7.5)P successive times. If you don't have storage for A-1,then you must use (2.7.12)in the following way: OF SCIENTIFIC COMPUTING(ISBN To solve the linear equation 草 198918920 x=b (2.7.16) 10-621 first solve the P auxiliary problems Numerical Recipes 43106 A·Z1=u1 A·Z2=u2 (outside (2.7.17) North Software. A·ZP=uP ing of and construct the matrix Z by columns from the z's obtained. ZP (2.7.18) Next,do the Px P matrix inversion H≡(1+vr.Z)-1 (2.7.19)
76 Chapter 2. Solution of Linear Algebraic Equations Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Here A is, as usual, an N × N matrix, while U and V are N × P matrices with P <N and usually P N. The inner piece of the correction term may become clearer if written as the tableau, U · 1 + VT · A−1 · U −1 · VT (2.7.13) where you can see that the matrix whose inverse is needed is only P × P rather than N × N. The relation between the Woodbury formula and successive applications of the ShermanMorrison formula is now clarified by noting that, if U is the matrix formed by columns out of the P vectors u1,..., uP , and V is the matrix formed by columns out of the P vectors v1,..., vP , U ≡ u1 ··· uP V ≡ v1 ··· vP (2.7.14) then two ways of expressing the same correction to A are A + P k=1 uk ⊗ vk = (A + U · VT ) (2.7.15) (Note that the subscripts on u and v do not denote components, but rather distinguish the different column vectors.) Equation (2.7.15) reveals that, if you have A−1 in storage, then you can either make the P corrections in one fell swoop by using (2.7.12), inverting a P × P matrix, or else make them by applying (2.7.5) P successive times. If you don’t have storage for A−1, then you must use (2.7.12) in the following way: To solve the linear equation A + P k=1 uk ⊗ vk · x = b (2.7.16) first solve the P auxiliary problems A · z1 = u1 A · z2 = u2 ··· A · zP = uP (2.7.17) and construct the matrix Z by columns from the z’s obtained, Z ≡ z1 ··· zP (2.7.18) Next, do the P × P matrix inversion H ≡ (1 + VT · Z) −1 (2.7.19)
2.7 Sparse Linear Systems 77 Finally,solve the one further auxiliary problem A·y=b (2.7.20) In terms of these quantities,the solution is given by x=y-z.[H.(vT.y) (2.721) Inversion by Partitioning Once in a while,you will encounter a matrix(not even necessarily sparse) that can be inverted efficiently by partitioning.Suppose that the Nx N matrix A is partitioned into A= 「PQ R S (2.7.22) where P and S are square matrices of size p x p and s x s respectively (p+s=N). The matrices Q and R are not necessarily square,and have sizes p x s and s x p, RECIPES 2 respectively. If the inverse of A is partitioned in the same manner, A-1 (2.7.23) —判◆ 9 then P,Q,R,S,which have the same sizes as P,Q,R,S,respectively,can be found by either the formulas IENTIFIC P=(P-QS-1.R)-1 0=-(P-Qs-1.R)-1.(QS-1) (2.7.24) =-(S-1.R)(P-Q·S-1.R)-1 s=s-1+(S-1.R)(P-Q.s-1.R)-1.(Q.S-1) Numerical Recipes 021 ridge.org 43108 or else by the equivalent formulas P=P-1+P-1.Q)(S-RP-1.Q)-1.(R.P-1) (outside Q=-(P-1.Q)(S-R.P-1.Q)-1 North Software. (2.7.25) R=-(S-R.P-1.Q)-1.(R.P-1) s=(S-R.P-1.Q)-1 The parentheses in equations (2.7.24)and(2.7.25)highlight repeated factors that you may wish to compute only once.(Of course,by associativity,you can instead do the matrix multiplications in any order you like.)The choice between using equation(2.7.24)and(2.7.25)depends on whether you want P or S to have the simpler formula;or on whether the repeated expression(S-R.P-1.Q)-1 is easier
2.7 Sparse Linear Systems 77 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). Finally, solve the one further auxiliary problem A · y = b (2.7.20) In terms of these quantities, the solution is given by x = y − Z · H · (VT · y) (2.7.21) Inversion by Partitioning Once in a while, you will encounter a matrix (not even necessarily sparse) that can be inverted efficiently by partitioning. Suppose that the N × N matrix A is partitioned into A = P Q R S (2.7.22) where P and S are square matrices of size p × p and s × s respectively (p + s = N). The matrices Q and R are not necessarily square, and have sizes p × s and s × p, respectively. If the inverse of A is partitioned in the same manner, A−1 = P Q R S (2.7.23) then P, Q, R, S, which have the same sizes as P, Q, R, S, respectively, can be found by either the formulas P = (P − Q · S−1 · R) −1 Q = −(P − Q · S−1 · R) −1 · (Q · S−1) R = −(S−1 · R) · (P − Q · S−1 · R) −1 S = S−1 + (S−1 · R) · (P − Q · S−1 · R) −1 · (Q · S−1) (2.7.24) or else by the equivalent formulas P = P−1 + (P−1 · Q) · (S − R · P−1 · Q) −1 · (R · P−1) Q = −(P−1 · Q) · (S − R · P−1 · Q) −1 R = −(S − R · P−1 · Q) −1 · (R · P−1) S = (S − R · P−1 · Q) −1 (2.7.25) The parentheses in equations (2.7.24) and (2.7.25) highlight repeated factors that you may wish to compute only once. (Of course, by associativity, you can instead do the matrix multiplications in any order you like.) The choice between using equation (2.7.24) and (2.7.25) depends on whether you want P or S to have the simpler formula; or on whether the repeated expression (S − R · P −1 · Q)−1 is easier
78 Chapter 2.Solution of Linear Algebraic Equations to calculate than the expression (P-Q.S-.R)-1;or on the relative sizes of P and S;or on whether P-or S-1 is already known. Another sometimes useful formula is for the determinant of the partitioned matrix, det A=det Pdet(S-R.P-.Q)=det Sdet(P-Q.S-.R) (2.7.26 Indexed Storage of Sparse Matrices h三4 83 We have already seen(82.4)that tri-or band-diagonal matrices can be stored in a compact 鱼 18881892 format that allocates storage only to elements which can be nonzero,plus perhaps a few wasted locations to make the bookkeeping easier.What about more general sparse matrices?When a 100 sparse matrix of dimension Nx N contains only a few times N nonzero elements(a typical case),it is surely inefficient-and often physically impossible-to allocate storage for all N2 elements.Even if one did allocate such storage,it would be inefficient or prohibitive in from NUMERICAL RECIPES I machine time to loop over all of it in search of nonzero elements. Obviously some kind of indexed storage scheme is required,one that stores only nonzero server matrix elements,along with sufficient auxiliary information to determine where an element (Nort to make logically belongs and how the various elements can be looped over in common matrix operations.Unfortunately,there is no one standard scheme in general use.Knuth[7]describes Americ computer UnN电.t THE one method.The Yale Sparse Matrix Package [6]and ITPACK [8]describe several other ART methods.For most applications,we favor the storage scheme used by PCGPACK [9],which is almost the same as that described by Bentley [10],and also similar to one of the Yale Sparse Matrix Package methods.The advantage of this scheme,which can be called row-indexed Programs sparse storage mode,is that it requires storage of only about two times the number of nonzero matrix elements.(Other methods can require as much as three or five times.For simplicity, we will treat only the case of square matrices,which occurs most frequently in practice. 会 To represent a matrix A of dimension N x N,the row-indexed scheme sets up two one-dimensional arrays,call them sa and ija.The first of these stores matrix element values to dir in single or double precision as desired:the second stores integer values.The storage rules are: The first N locations of sa store A's diagonal matrix elements,in order.(Note that OF SCIENTIFIC COMPUTING(ISBN diagonal elements are stored even if they are zero;this is at most a slight storage 1988-19920 inefficiency,since diagonal elements are nonzero in most realistic applications. Each of the first N locations of ija stores the index of the array sa that contains the first off-diagonal element of the corresponding row of the matrix.(If there are 10-621 no off-diagonal elements for that row,it is one greater than the index in sa of the most recently stored element of a previous row.) 43106 Location I of ija is always equal to N+2.(It can be read to determine N.) Numerical Recipes Location N+1 of ija is one greater than the index in sa of the last off-diagonal element of the last row.(It can be read to determine the number of nonzero (outside elements in the matrix,or the number of elements in the arrays sa and ija. Software. Location N+1 of sa is not used and can be set arbitrarily. Entries in sa at locations N+2 contain A's off-diagonal values,ordered by rows and,within each row,ordered by columns. Entries in ija at locations>N+2 contain the column number of the corresponding element in sa visit website machine While these rules seem arbitrary at first sight,they result in a rather elegant storage scheme.As an example,consider the matrix 「3.0.1.0. 0.1 0. 4. 0 0. 7. 5. 9 0. (2.7.27) 0. 2 0 0. 6
78 Chapter 2. Solution of Linear Algebraic Equations Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). to calculate than the expression (P − Q · S−1 · R)−1; or on the relative sizes of P and S; or on whether P−1 or S−1 is already known. Another sometimes useful formula is for the determinant of the partitioned matrix, det A = det P det(S − R · P−1 · Q) = det S det(P − Q · S−1 · R) (2.7.26) Indexed Storage of Sparse Matrices We have already seen (§2.4) that tri- or band-diagonal matrices can be stored in a compact format that allocates storage only to elements which can be nonzero, plus perhaps a few wasted locations to make the bookkeeping easier. What about more general sparse matrices? When a sparse matrix of dimension N × N contains only a few times N nonzero elements (a typical case), it is surely inefficient — and often physically impossible — to allocate storage for all N2 elements. Even if one did allocate such storage, it would be inefficient or prohibitive in machine time to loop over all of it in search of nonzero elements. Obviously some kind of indexed storage scheme is required, one that stores only nonzero matrix elements, along with sufficient auxiliary information to determine where an element logically belongs and how the various elements can be looped over in common matrix operations. Unfortunately, there is no one standard scheme in general use. Knuth [7] describes one method. The Yale Sparse Matrix Package [6] and ITPACK [8] describe several other methods. For most applications, we favor the storage scheme used by PCGPACK [9], which is almost the same as that described by Bentley [10], and also similar to one of the Yale Sparse Matrix Package methods. The advantage of this scheme, which can be called row-indexed sparse storage mode, is that it requires storage of only about two times the number of nonzero matrix elements. (Other methods can require as much as three or five times.) For simplicity, we will treat only the case of square matrices, which occurs most frequently in practice. To represent a matrix A of dimension N × N, the row-indexed scheme sets up two one-dimensional arrays, call them sa and ija. The first of these stores matrix element values in single or double precision as desired; the second stores integer values. The storage rules are: • The first N locations of sa store A’s diagonal matrix elements, in order. (Note that diagonal elements are stored even if they are zero; this is at most a slight storage inefficiency, since diagonal elements are nonzero in most realistic applications.) • Each of the first N locations of ija stores the index of the array sa that contains the first off-diagonal element of the corresponding row of the matrix. (If there are no off-diagonal elements for that row, it is one greater than the index in sa of the most recently stored element of a previous row.) • Location 1 of ija is always equal to N + 2. (It can be read to determine N.) • Location N + 1 of ija is one greater than the index in sa of the last off-diagonal element of the last row. (It can be read to determine the number of nonzero elements in the matrix, or the number of elements in the arrays sa and ija.) Location N + 1 of sa is not used and can be set arbitrarily. • Entries in sa at locations ≥ N + 2 contain A’s off-diagonal values, ordered by rows and, within each row, ordered by columns. • Entries in ija at locations ≥ N+2 contain the column number of the corresponding element in sa. While these rules seem arbitrary at first sight, they result in a rather elegant storage scheme. As an example, consider the matrix 3. 0. 1. 0. 0. 0. 4. 0. 0. 0. 0. 7. 5. 9. 0. 0. 0. 0. 0. 2. 0. 0. 0. 6. 5. (2.7.27)
2.7 Sparse Linear Systems 79 In row-indexed compact storage,matrix(2.7.27)is represented by the two arrays of length 11,as follows index k 1 2 3 5 6 8 9 10 11 ija[k] 7 10 12 2 sa[k] 3. 4. 5. 0. 5. 1. 7. 9. 2. 6. (2.7.28) Here x is an arbitrary value.Notice that,according to the storage rules,the value of N (namely 5)is ija[1]-2,and the length of each array is ija[ija[1]-1]-1,namely 11. The diagonal element in row i is sa [i],and the off-diagonal elements in that row are in 8 sa[k]where k loops from ija[i]to ija[i+1]-1,if the upper limit is greater or equal to the lower one (as in C's for loops). Here is a routine,sprsin,that converts a matrix from full storage mode into row-indexed sparse storage mode,throwing away any elements that are less than a specified threshold. Of course,the principal use of sparse storage mode is for matrices whose full storage mode won't fit into your machine at all;then you have to generate them directly into sparse format. Nevertheless sprsin is useful as a precise algorithmic definition of the storage scheme,for 、名 subscale testing of large problems,and for the case where execution time,rather than storage, 6 furnishes the impetus to sparse storage. #include 、 Press. void sprsin(float **a,int n,float thresh,unsigned long nmax,float sa[], unsigned long ija[]) Converts a square matrix a[1..n][1..n]into row-indexed sparse storage mode.Only ele- ments of a with magnitude >thresh are retained.Output is in two linear arrays with dimen- Programs sion nmax (an input parameter):sa[1..contains array values,indexed by ija[1..]The number of elements filled of sa and ija on output are both ija[ija[1]-1]-1 (see text). SCIENTIFIC void nrerror(char error_text []) int i,j; unsigned long k; COMPUTING for (j=1;j=thresh &i !j){ if (++k nmax)nrerror("sprsin:nmax too small"); sa[k]=a[i][j]; Store off-diagonal elements and their columns. uction Numerical Recipes -43106 ija[k]=j; 2 (outside 1ja[1+1]=k+1; As each row is completed,store index to 22 next. North Software. 言 The single most important use of a matrix in row-indexed sparse storage mode is to multiply a vector to its right.In fact,the storage mode is optimized for just this purpose. The following routine is thus very simple. void sprsax(float sa],unsigned long ija[],float x[],float b[], unsigned long n) Multiply a matrix in row-index sparse storage arrays sa and ija by a vector x[1..n],giving a vector b[1..n]. void nrerror(char error-text☐);
2.7 Sparse Linear Systems 79 Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). In row-indexed compact storage, matrix (2.7.27) is represented by the two arrays of length 11, as follows index k 1 2 3 4 5 6 7 8 9 10 11 ija[k] 7 8 8 10 11 12 3 2 4 5 4 sa[k] 3. 4. 5. 0. 5. x 1. 7. 9. 2. 6. (2.7.28) Here x is an arbitrary value. Notice that, according to the storage rules, the value of N (namely 5) is ija[1]-2, and the length of each array is ija[ija[1]-1]-1, namely 11. The diagonal element in row i is sa[i], and the off-diagonal elements in that row are in sa[k] where k loops from ija[i] to ija[i+1]-1, if the upper limit is greater or equal to the lower one (as in C’s for loops). Here is a routine, sprsin, that converts a matrix from full storage mode into row-indexed sparse storage mode, throwing away any elements that are less than a specified threshold. Of course, the principal use of sparse storage mode is for matrices whose full storage mode won’t fit into your machine at all; then you have to generate them directly into sparse format. Nevertheless sprsin is useful as a precise algorithmic definition of the storage scheme, for subscale testing of large problems, and for the case where execution time, rather than storage, furnishes the impetus to sparse storage. #include void sprsin(float **a, int n, float thresh, unsigned long nmax, float sa[], unsigned long ija[]) Converts a square matrix a[1..n][1..n] into row-indexed sparse storage mode. Only elements of a with magnitude ≥thresh are retained. Output is in two linear arrays with dimension nmax (an input parameter): sa[1..] contains array values, indexed by ija[1..]. The number of elements filled of sa and ija on output are both ija[ija[1]-1]-1 (see text). { void nrerror(char error_text[]); int i,j; unsigned long k; for (j=1;j= thresh && i != j) { if (++k > nmax) nrerror("sprsin: nmax too small"); sa[k]=a[i][j]; Store off-diagonal elements and their columns. ija[k]=j; } } ija[i+1]=k+1; As each rowis completed, store index to } next. } The single most important use of a matrix in row-indexed sparse storage mode is to multiply a vector to its right. In fact, the storage mode is optimized for just this purpose. The following routine is thus very simple. void sprsax(float sa[], unsigned long ija[], float x[], float b[], unsigned long n) Multiply a matrix in row-index sparse storage arrays sa and ija by a vector x[1..n], giving a vector b[1..n]. { void nrerror(char error_text[]);
80 Chapter 2.Solution of Linear Algebraic Equations unsigned long i,k; if (ija[1]!n+2)nrerror("sprsax:mismatched vector and matrix"); for(1=1:1<=n;i+){ b[i]-sa[i]*x[i]; Start with diagonal term. for (k=ija[i];k<=ija[i+1]-1;k++) Loop over off-diagonal terms. b[i]+sa[k]*x[ija[k]]; It is also simple to multiply the transpose of a matrix by a vector to its right.(We will use this operation later in this section.)Note that the transpose matrix is not actually constructed. void sprstx(float sa],unsigned long ija[],float x[],float b[], nted for unsigned long n) Multiply the transpose of a matrix in row-index sparse storage arrays sa and ija by a vector x[1..n],giving a vector b[1..n]. f void nrerror(char error_text []) unsigned long i,j,ki RECIPES if (ija[1]!n+2)nrerror("mismatched vector and matrix in sprstx"); for (i=1;i<=n;i++)b[i]=sa[i]*x[i]; Start with diagonal terms. for(1=1;1<=n;i++){ Loop over off-diagonal terms. for (k=ija[i];k<=ija[i+1]-1;k++){ Americ computer, Press. j=ija[k]; ART b[j]sa[k]*x[i]; 9 Programs (Double precision versions of sprsax and sprstx,named dsprsax and dsprstx,are used by the routine atimes later in this section.You can easily make the conversion,or else get the converted routines from the Numerical Recipes diskettes.) In fact,because the choice of row-indexed storage treats rows and columns quite differently,it is quite an involved operation to construct the transpose of a matrix,given the SCIENTIFIC COMPUTING(ISBN matrix itself in row-indexed sparse storage mode.When the operation cannot be avoided,it 19891892 is done as follows:An index of all off-diagonal elements by their columns is constructed (see $8.4).The elements are then written to the output array in column order.As each element is written,its row is determined and stored.Finally,the elements in each column are sorted by row. FuurgPgoglrion Numerica 10621 void sprstp(float sa[],unsigned long ija[],float sb[],unsigned long ijb[]) (outside Recipes 43106 Construct the transpose of a sparse square matrix,from row-index sparse storage arrays sa and ija into arrays sb and ijb. Software. void iindexx(unsigned long n,long arr[],unsigned long indx[]); Version of indexx with all float variables changed to long. unsigned long j,jl,jm,jp,ju,k,m,n2,noff,inc,iv; float vi n2=ija[1]; Linear size of matrix plus 2. for (j=1;j<=n2-2;j++)sb[j]=sa[j]; Diagonal elements. iindexx(ija[n2-1]-ija[1],(long *)&ija[n2-1],&ijb[n2-1]); Index all off-diagonal elements by their columns. jp=0; for(k=ija[1];k<=1ja[n2-1]-1;k++)[ Loop over output off-diagonal elements. m=1jb[k]+n2-1; Use index table to store by (former)columns. sb[k]=sa[m]; for (j=jp+1;j<=ija[m];j++)ijb[j]=k; Fill in the index to any omitted rows
80 Chapter 2. Solution of Linear Algebraic Equations Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copyin Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5) g of machinereadable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America). unsigned long i,k; if (ija[1] != n+2) nrerror("sprsax: mismatched vector and matrix"); for (i=1;i<=n;i++) { b[i]=sa[i]*x[i]; Start with diagonal term. for (k=ija[i];k<=ija[i+1]-1;k++) Loop over off-diagonal terms. b[i] += sa[k]*x[ija[k]]; } } It is also simple to multiply the transpose of a matrix by a vector to its right. (We will use this operation later in this section.) Note that the transpose matrix is not actually constructed. void sprstx(float sa[], unsigned long ija[], float x[], float b[], unsigned long n) Multiply the transpose of a matrix in row-index sparse storage arrays sa and ija by a vector x[1..n], giving a vector b[1..n]. { void nrerror(char error_text[]); unsigned long i,j,k; if (ija[1] != n+2) nrerror("mismatched vector and matrix in sprstx"); for (i=1;i<=n;i++) b[i]=sa[i]*x[i]; Start with diagonal terms. for (i=1;i<=n;i++) { Loop over off-diagonal terms. for (k=ija[i];k<=ija[i+1]-1;k++) { j=ija[k]; b[j] += sa[k]*x[i]; } } } (Double precision versions of sprsax and sprstx, named dsprsax and dsprstx, are used by the routine atimes later in this section. You can easily make the conversion, or else get the converted routines from the Numerical Recipes diskettes.) In fact, because the choice of row-indexed storage treats rows and columns quite differently, it is quite an involved operation to construct the transpose of a matrix, given the matrix itself in row-indexed sparse storage mode. When the operation cannot be avoided, it is done as follows: An index of all off-diagonal elements by their columns is constructed (see §8.4). The elements are then written to the output array in column order. As each element is written, its row is determined and stored. Finally, the elements in each column are sorted by row. void sprstp(float sa[], unsigned long ija[], float sb[], unsigned long ijb[]) Construct the transpose of a sparse square matrix, from row-index sparse storage arrays sa and ija into arrays sb and ijb. { void iindexx(unsigned long n, long arr[], unsigned long indx[]); Version of indexx with all float variables changed to long. unsigned long j,jl,jm,jp,ju,k,m,n2,noff,inc,iv; float v; n2=ija[1]; Linear size of matrix plus 2. for (j=1;j<=n2-2;j++) sb[j]=sa[j]; Diagonal elements. iindexx(ija[n2-1]-ija[1],(long *)&ija[n2-1],&ijb[n2-1]); Index all off-diagonal elements by their columns. jp=0; for (k=ija[1];k<=ija[n2-1]-1;k++) { Loop over output off-diagonal elements. m=ijb[k]+n2-1; Use index table to store by (former) columns. sb[k]=sa[m]; for (j=jp+1;j<=ija[m];j++) ijb[j]=k; Fill in the index to any omitted rows