Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

2.5 Iterative Improvement of a Solution to Linear Equations

Obviously it is not easy to obtain greater precision for the solution of a linear set than the precision of your computer's floating-point word. Unfortunately, for large sets of linear equations, it is not always easy to obtain precision equal to, or even comparable to, the computer's limit. In direct methods of solution, roundoff errors accumulate, and they are magnified to the extent that your matrix is close to singular. You can easily lose two or three significant figures for matrices which (you thought) were far from singular.

Figure 2.5.1. Iterative improvement of the solution to $\mathbf{A} \cdot \mathbf{x} = \mathbf{b}$. The first guess $\mathbf{x} + \delta\mathbf{x}$ is multiplied by $\mathbf{A}$ to produce $\mathbf{b} + \delta\mathbf{b}$. The known vector $\mathbf{b}$ is subtracted, giving $\delta\mathbf{b}$. The linear set with this right-hand side is inverted, giving $\delta\mathbf{x}$. This is subtracted from the first guess giving an improved solution $\mathbf{x}$.

If this happens to you, there is a neat trick to restore the full machine precision, called iterative improvement of the solution. The theory is very straightforward (see Figure 2.5.1): Suppose that a vector $\mathbf{x}$ is the exact solution of the linear set

$$\mathbf{A} \cdot \mathbf{x} = \mathbf{b} \tag{2.5.1}$$

You don't, however, know $\mathbf{x}$. You only know some slightly wrong solution $\mathbf{x} + \delta\mathbf{x}$, where $\delta\mathbf{x}$ is the unknown error. When multiplied by the matrix $\mathbf{A}$, your slightly wrong solution gives a product slightly discrepant from the desired right-hand side $\mathbf{b}$, namely

$$\mathbf{A} \cdot (\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} + \delta\mathbf{b} \tag{2.5.2}$$

Subtracting (2.5.1) from (2.5.2) gives

$$\mathbf{A} \cdot \delta\mathbf{x} = \delta\mathbf{b} \tag{2.5.3}$$
But (2.5.2) can also be solved, trivially, for $\delta\mathbf{b}$. Substituting this into (2.5.3) gives

$$\mathbf{A} \cdot \delta\mathbf{x} = \mathbf{A} \cdot (\mathbf{x} + \delta\mathbf{x}) - \mathbf{b} \tag{2.5.4}$$

In this equation, the whole right-hand side is known, since $\mathbf{x} + \delta\mathbf{x}$ is the wrong solution that you want to improve. It is essential to calculate the right-hand side in double precision, since there will be a lot of cancellation in the subtraction of $\mathbf{b}$. Then, we need only solve (2.5.4) for the error $\delta\mathbf{x}$, then subtract this from the wrong solution to get an improved solution.

An important extra benefit occurs if we obtained the original solution by LU decomposition. In this case we already have the LU decomposed form of $\mathbf{A}$, and all we need do to solve (2.5.4) is compute the right-hand side and backsubstitute!

The code to do all this is concise and straightforward:

    #include "nrutil.h"

    void mprove(float **a, float **alud, int n, int indx[], float b[], float x[])
    /* Improves a solution vector x[1..n] of the linear set of equations A·X = B.
       The matrix a[1..n][1..n], and the vectors b[1..n] and x[1..n] are input,
       as is the dimension n. Also input is alud[1..n][1..n], the LU decomposition
       of a as returned by ludcmp, and the vector indx[1..n] also returned by that
       routine. On output, only x[1..n] is modified, to an improved set of values. */
    {
        void lubksb(float **a, int n, int *indx, float b[]);
        int j,i;
        double sdp;
        float *r;

        r=vector(1,n);
        for (i=1;i<=n;i++) {                /* Calculate the right-hand side, accumulating */
            sdp = -b[i];                    /* the residual in double precision. */
            for (j=1;j<=n;j++) sdp += a[i][j]*x[j];
            r[i]=sdp;
        }
        lubksb(alud,n,indx,r);              /* Solve for the error term, */
        for (i=1;i<=n;i++) x[i] -= r[i];    /* and subtract it from the old solution. */
        free_vector(r,1,n);
    }

You should note that the routine ludcmp in §2.3 destroys the input matrix as it LU decomposes it. Since iterative improvement requires both the original matrix and its LU decomposition, you will need to copy $\mathbf{A}$ before calling ludcmp. Likewise lubksb destroys $\mathbf{b}$ in obtaining $\mathbf{x}$, so make a copy of $\mathbf{b}$ also. If you don't mind this extra storage, iterative improvement is highly recommended: It is a process of order only $N^2$ operations (multiply vector by matrix, and backsubstitute; see discussion following equation 2.3.7); it never hurts; and it can really give you your money's worth if it saves an otherwise ruined solution on which you have already spent of order $N^3$ operations.

You can call mprove several times in succession if you want. Unless you are starting quite far from the true solution, one call is generally enough; but a second call to verify convergence can be reassuring.
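For readers without nrutil.h at hand, the same procedure can be sketched in a self-contained form. The version below is only an illustration under stated assumptions: it uses 0-based, row-major arrays, and the names lu_decompose, lu_solve, and improve are hypothetical stand-ins, not the ludcmp, lubksb, and mprove routines of this book. The essential points carry over: the original matrix is kept alongside its LU factors, and the residual is accumulated in double precision.

```c
#include <math.h>
#include <stdlib.h>

/* In-place LU decomposition with partial pivoting of the row-major n x n
   matrix a. piv[k] records the row exchanged with row k at step k.
   Returns 0 on success, -1 if a zero pivot is met. (Hypothetical helper.) */
int lu_decompose(int n, float *a, int *piv)
{
    for (int k = 0; k < n; k++) {
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (fabsf(a[i*n + k]) > fabsf(a[p*n + k])) p = i;
        if (a[p*n + k] == 0.0f) return -1;
        piv[k] = p;
        if (p != k)
            for (int j = 0; j < n; j++) {       /* exchange whole rows */
                float t = a[k*n + j]; a[k*n + j] = a[p*n + j]; a[p*n + j] = t;
            }
        for (int i = k + 1; i < n; i++) {
            a[i*n + k] /= a[k*n + k];           /* multiplier, stored in L */
            for (int j = k + 1; j < n; j++)
                a[i*n + j] -= a[i*n + k] * a[k*n + j];
        }
    }
    return 0;
}

/* Solve (L U) y = P b in place, overwriting b with the solution. */
void lu_solve(int n, const float *lu, const int *piv, float *b)
{
    for (int k = 0; k < n; k++) {               /* apply the row exchanges */
        float t = b[k]; b[k] = b[piv[k]]; b[piv[k]] = t;
    }
    for (int i = 1; i < n; i++)                 /* forward substitution */
        for (int j = 0; j < i; j++) b[i] -= lu[i*n + j] * b[j];
    for (int i = n - 1; i >= 0; i--) {          /* backsubstitution */
        for (int j = i + 1; j < n; j++) b[i] -= lu[i*n + j] * b[j];
        b[i] /= lu[i*n + i];
    }
}

/* One step of iterative improvement, equation (2.5.4): form the residual
   A.x - b in double precision, solve for the error, subtract it from x.
   a is the ORIGINAL matrix, lu its decomposition. Returns 0 on success. */
int improve(int n, const float *a, const float *lu, const int *piv,
            const float *b, float *x)
{
    float *r = malloc((size_t)n * sizeof *r);
    if (!r) return -1;
    for (int i = 0; i < n; i++) {
        double s = -(double)b[i];
        for (int j = 0; j < n; j++) s += (double)a[i*n + j] * (double)x[j];
        r[i] = (float)s;
    }
    lu_solve(n, lu, piv, r);
    for (int i = 0; i < n; i++) x[i] -= r[i];
    free(r);
    return 0;
}
```

A caller would copy the matrix into a second array, pass the copy to lu_decompose, obtain a first solution with lu_solve on a copy of b, and then call improve with the untouched originals.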
More on Iterative Improvement

It is illuminating (and will be useful later in the book) to give a somewhat more solid analytical foundation for equation (2.5.4), and also to give some additional results. Implicit in the previous discussion was the notion that the solution vector $\mathbf{x} + \delta\mathbf{x}$ has an error term; but we neglected the fact that the LU decomposition of $\mathbf{A}$ is itself not exact.

A different analytical approach starts with some matrix $\mathbf{B}_0$ that is assumed to be an approximate inverse of the matrix $\mathbf{A}$, so that $\mathbf{B}_0 \cdot \mathbf{A}$ is approximately the identity matrix $\mathbf{1}$. Define the residual matrix $\mathbf{R}$ of $\mathbf{B}_0$ as

$$\mathbf{R} \equiv \mathbf{1} - \mathbf{B}_0 \cdot \mathbf{A} \tag{2.5.5}$$

which is supposed to be "small" (we will be more precise below). Note that therefore

$$\mathbf{B}_0 \cdot \mathbf{A} = \mathbf{1} - \mathbf{R} \tag{2.5.6}$$

Next consider the following formal manipulation:

$$\begin{aligned}
\mathbf{A}^{-1} &= \mathbf{A}^{-1} \cdot (\mathbf{B}_0^{-1} \cdot \mathbf{B}_0) = (\mathbf{A}^{-1} \cdot \mathbf{B}_0^{-1}) \cdot \mathbf{B}_0 = (\mathbf{B}_0 \cdot \mathbf{A})^{-1} \cdot \mathbf{B}_0 \\
&= (\mathbf{1} - \mathbf{R})^{-1} \cdot \mathbf{B}_0 = (\mathbf{1} + \mathbf{R} + \mathbf{R}^2 + \mathbf{R}^3 + \cdots) \cdot \mathbf{B}_0
\end{aligned} \tag{2.5.7}$$

We can define the $n$th partial sum of the last expression by

$$\mathbf{B}_n \equiv (\mathbf{1} + \mathbf{R} + \cdots + \mathbf{R}^n) \cdot \mathbf{B}_0 \tag{2.5.8}$$

so that $\mathbf{B}_\infty \to \mathbf{A}^{-1}$, if the limit exists.

It is now straightforward to verify that equation (2.5.8) satisfies some interesting recurrence relations. As regards solving $\mathbf{A} \cdot \mathbf{x} = \mathbf{b}$, where $\mathbf{x}$ and $\mathbf{b}$ are vectors, define

$$\mathbf{x}_n \equiv \mathbf{B}_n \cdot \mathbf{b} \tag{2.5.9}$$

Then it is easy to show that

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \mathbf{B}_0 \cdot (\mathbf{b} - \mathbf{A} \cdot \mathbf{x}_n) \tag{2.5.10}$$

This is immediately recognizable as equation (2.5.4), with $-\delta\mathbf{x} = \mathbf{x}_{n+1} - \mathbf{x}_n$, and with $\mathbf{B}_0$ taking the role of $\mathbf{A}^{-1}$. We see, therefore, that equation (2.5.4) does not require that the LU decomposition of $\mathbf{A}$ be exact, but only that the implied residual $\mathbf{R}$ be small. In rough terms, if the residual is smaller than the square root of your computer's roundoff error, then after one application of equation (2.5.10) (that is, going from $\mathbf{x}_0 \equiv \mathbf{B}_0 \cdot \mathbf{b}$ to $\mathbf{x}_1$) the first neglected term, of order $\mathbf{R}^2$, will be smaller than the roundoff error. Equation (2.5.10), like equation (2.5.4), moreover, can be applied more than once, since it uses only $\mathbf{B}_0$, and not any of the higher $\mathbf{B}$'s.

A much more surprising recurrence which follows from equation (2.5.8) is one that more than doubles the order $n$ at each stage:

$$\mathbf{B}_{2n+1} = 2\mathbf{B}_n - \mathbf{B}_n \cdot \mathbf{A} \cdot \mathbf{B}_n \qquad n = 0, 1, 3, 7, \ldots \tag{2.5.11}$$

Repeated application of equation (2.5.11), from a suitable starting matrix $\mathbf{B}_0$, converges quadratically to the unknown inverse matrix $\mathbf{A}^{-1}$ (see §9.4 for the definition of "quadratically"). Equation (2.5.11) goes by various names, including Schultz's Method and Hotelling's Method; see Pan and Reif [1] for references. In fact, equation (2.5.11) is simply the iterative Newton-Raphson method of root-finding (§9.4) applied to matrix inversion.

Before you get too excited about equation (2.5.11), however, you should notice that it involves two full matrix multiplications at each iteration. Each matrix multiplication involves $N^3$ adds and multiplies. But we already saw in §§2.1–2.3 that direct inversion of $\mathbf{A}$ requires only $N^3$ adds and $N^3$ multiplies in toto. Equation (2.5.11) is therefore practical only when special circumstances allow it to be evaluated much more rapidly than is the case for general matrices. We will meet such circumstances later, in §13.10.
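To make equation (2.5.11) concrete, here is a minimal sketch of the iteration for a small dense matrix. It is an illustration only, not a routine from this book: the function names (matmul, schulz_start, schulz_iterate) and the row-major, 0-based storage are assumptions. The starting matrix is taken as $\epsilon \mathbf{A}^T$ with $\epsilon = 1/\sum_{j,k} a_{jk}^2$, the Pan and Reif choice that this section discusses further on, in equations (2.5.14) and (2.5.17).

```c
#include <stdlib.h>

/* c = a . b for row-major n x n matrices; c must not alias a or b. */
void matmul(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++) s += a[i*n + k] * b[k*n + j];
            c[i*n + j] = s;
        }
}

/* Starting guess B0 = eps * A^T with eps = 1 / (sum of squared elements).
   Returns -1 if A is identically zero, else 0. */
int schulz_start(int n, const double *a, double *b)
{
    double sum = 0.0;
    for (int i = 0; i < n*n; i++) sum += a[i] * a[i];
    if (sum == 0.0) return -1;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) b[i*n + j] = a[j*n + i] / sum;
    return 0;
}

/* Apply B <- 2B - B.A.B a fixed number of times (equation 2.5.11).
   Each pass costs two full matrix multiplications. Returns 0 on success. */
int schulz_iterate(int n, const double *a, double *b, int iters)
{
    double *ab  = malloc((size_t)n * n * sizeof *ab);
    double *bab = malloc((size_t)n * n * sizeof *bab);
    if (!ab || !bab) { free(ab); free(bab); return -1; }
    for (int it = 0; it < iters; it++) {
        matmul(n, a, b, ab);                    /* A.B     */
        matmul(n, b, ab, bab);                  /* B.(A.B) */
        for (int i = 0; i < n*n; i++) b[i] = 2.0 * b[i] - bab[i];
    }
    free(ab);
    free(bab);
    return 0;
}
```

Since each pass squares the residual matrix $\mathbf{R}$, the number of passes needed grows only logarithmically with the required accuracy, but it also depends on how close $\|\mathbf{R}\|$ is to 1 at the start; a fixed iteration count, as used here, is a simplification.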
In the spirit of delayed gratification, let us nevertheless pursue the two related issues: When does the series in equation (2.5.7) converge; and what is a suitable initial guess B0 (if, for example, an initial LU decomposition is not feasible)?
We can define the norm of a matrix as the largest amplification of length that it is able to induce on a vector,

$$\|\mathbf{R}\| \equiv \max_{\mathbf{v} \neq 0} \frac{|\mathbf{R} \cdot \mathbf{v}|}{|\mathbf{v}|} \tag{2.5.12}$$

If we let equation (2.5.7) act on some arbitrary right-hand side $\mathbf{b}$, as one wants a matrix inverse to do, it is obvious that a sufficient condition for convergence is

$$\|\mathbf{R}\| < 1 \tag{2.5.13}$$

Pan and Reif [1] point out that a suitable initial guess for $\mathbf{B}_0$ is any sufficiently small constant $\epsilon$ times the matrix transpose of $\mathbf{A}$, that is,

$$\mathbf{B}_0 = \epsilon \mathbf{A}^T \qquad \text{or} \qquad \mathbf{R} = \mathbf{1} - \epsilon \mathbf{A}^T \cdot \mathbf{A} \tag{2.5.14}$$

To see why this is so involves concepts from Chapter 11; we give here only the briefest sketch: $\mathbf{A}^T \cdot \mathbf{A}$ is a symmetric, positive definite matrix, so it has real, positive eigenvalues. In its diagonal representation, $\mathbf{R}$ takes the form

$$\mathbf{R} = \mathrm{diag}(1 - \epsilon\lambda_1, 1 - \epsilon\lambda_2, \ldots, 1 - \epsilon\lambda_N) \tag{2.5.15}$$

where all the $\lambda_i$'s are positive. Evidently any $\epsilon$ satisfying $0 < \epsilon < 2/(\max_i \lambda_i)$ will give $\|\mathbf{R}\| < 1$. It is not difficult to show that the optimal choice for $\epsilon$, giving the most rapid convergence for equation (2.5.11), is

$$\epsilon = \frac{2}{\max_i \lambda_i + \min_i \lambda_i} \tag{2.5.16}$$

Rarely does one know the eigenvalues of $\mathbf{A}^T \cdot \mathbf{A}$ in equation (2.5.16). Pan and Reif derive several interesting bounds, which are computable directly from $\mathbf{A}$. The following choices guarantee the convergence of $\mathbf{B}_n$ as $n \to \infty$,

$$\epsilon \le 1 \Big/ \sum_{j,k} a_{jk}^2 \qquad \text{or} \qquad \epsilon \le 1 \Big/ \left( \max_i \sum_j |a_{ij}| \times \max_j \sum_i |a_{ij}| \right) \tag{2.5.17}$$

The latter expression is truly a remarkable formula, which Pan and Reif derive by noting that the vector norm in equation (2.5.12) need not be the usual $L_2$ norm, but can instead be either the $L_\infty$ (max) norm, or the $L_1$ (absolute value) norm. See their work for details.

Another approach, with which we have had some success, is to estimate the largest eigenvalue statistically, by calculating $s_i \equiv |\mathbf{A} \cdot \mathbf{v}_i|^2$ for several unit vectors $\mathbf{v}_i$ with randomly chosen directions in $N$-space. The largest eigenvalue $\lambda$ can then be bounded by the maximum of $2 \max s_i$ and $2N\,\mathrm{Var}(s_i)/\mu(s_i)$, where Var and $\mu$ denote the sample variance and mean, respectively.

CITED REFERENCES AND FURTHER READING:

Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §2.3.4, p. 55.

Golub, G.H., and Van Loan, C.F. 1989, Matrix Computations, 2nd ed. (Baltimore: Johns Hopkins University Press), p. 74.

Dahlquist, G., and Bjorck, A. 1974, Numerical Methods (Englewood Cliffs, NJ: Prentice-Hall), §5.5.6, p. 183.

Forsythe, G.E., and Moler, C.B. 1967, Computer Solution of Linear Algebraic Systems (Englewood Cliffs, NJ: Prentice-Hall), Chapter 13.

Ralston, A., and Rabinowitz, P. 1978, A First Course in Numerical Analysis, 2nd ed. (New York: McGraw-Hill), §9.5, p. 437.

Pan, V., and Reif, J. 1985, in Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (New York: Association for Computing Machinery). [1]
2.6 Singular Value Decomposition

There exists a very powerful set of techniques for dealing with sets of equations or matrices that are either singular or else numerically very close to singular. In many cases where Gaussian elimination and LU decomposition fail to give satisfactory results, this set of techniques, known as singular value decomposition, or SVD, will diagnose for you precisely what the problem is. In some cases, SVD will not only diagnose the problem, it will also solve it, in the sense of giving you a useful numerical answer, although, as we shall see, not necessarily "the" answer that you thought you should get.

SVD is also the method of choice for solving most linear least-squares problems. We will outline the relevant theory in this section, but defer detailed discussion of the use of SVD in this application to Chapter 15, whose subject is the parametric modeling of data.

SVD methods are based on the following theorem of linear algebra, whose proof is beyond our scope: Any $M \times N$ matrix $\mathbf{A}$ whose number of rows $M$ is greater than or equal to its number of columns $N$, can be written as the product of an $M \times N$ column-orthogonal matrix $\mathbf{U}$, an $N \times N$ diagonal matrix $\mathbf{W}$ with positive or zero elements (the singular values), and the transpose of an $N \times N$ orthogonal matrix $\mathbf{V}$. The various shapes of these matrices will be made clearer by the following tableau:

$$\underbrace{\mathbf{A}}_{M \times N} = \underbrace{\mathbf{U}}_{M \times N} \cdot \underbrace{\begin{pmatrix} w_1 & & & \\ & w_2 & & \\ & & \ddots & \\ & & & w_N \end{pmatrix}}_{N \times N} \cdot \underbrace{\mathbf{V}^T}_{N \times N} \tag{2.6.1}$$

The matrices $\mathbf{U}$ and $\mathbf{V}$ are each orthogonal in the sense that their columns are orthonormal,

$$\sum_{i=1}^{M} U_{ik} U_{in} = \delta_{kn} \qquad 1 \le k \le N, \quad 1 \le n \le N \tag{2.6.2}$$

$$\sum_{j=1}^{N} V_{jk} V_{jn} = \delta_{kn} \qquad 1 \le k \le N, \quad 1 \le n \le N \tag{2.6.3}$$
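The orthonormality conditions (2.6.2) and (2.6.3) are easy to check numerically for any candidate factor. The following is a small sketch, assuming row-major, 0-based storage; the function name column_orthonormality_error is hypothetical and not part of this book's library. It returns the largest deviation of the column inner products from $\delta_{kn}$, which should be of the order of the roundoff error for a valid $\mathbf{U}$ or $\mathbf{V}$.

```c
#include <math.h>

/* For a row-major m x n matrix u, return
   max over k,c of | sum_i u[i][k]*u[i][c] - delta_kc |,
   i.e. how far the columns are from being orthonormal. */
double column_orthonormality_error(int m, int n, const double *u)
{
    double worst = 0.0;
    for (int k = 0; k < n; k++)
        for (int c = 0; c < n; c++) {
            double s = (k == c) ? -1.0 : 0.0;   /* subtract delta_kc up front */
            for (int i = 0; i < m; i++) s += u[i*n + k] * u[i*n + c];
            if (fabs(s) > worst) worst = fabs(s);
        }
    return worst;
}
```

The same function serves for $\mathbf{V}$ by passing m equal to n.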