THE DECOMPOSITIONAL APPROACH TO MATRIX COMPUTATION

The introduction of matrix decomposition into numerical linear algebra revolutionized matrix computations. This article outlines the decompositional approach, comments on its history, and surveys the six most widely used decompositions.

G.W. Stewart, University of Maryland
In 1951, Paul S. Dwyer published Linear Computations, perhaps the first book devoted entirely to numerical linear algebra.1 Digital computing was in its infancy, and Dwyer focused on computation with mechanical calculators. Nonetheless, the book was state of the art. Figure 1 reproduces a page of the book dealing with Gaussian elimination.

Figure 1. This page from Linear Computations shows that Paul Dwyer's approach begins with a system of scalar equations. Courtesy of John Wiley & Sons.

In 1954, Alston S. Householder published Principles of Numerical Analysis,2 one of the first modern treatments of high-speed digital computation. Figure 2 reproduces a page from this book, also dealing with Gaussian elimination.

Figure 2. On this page from Principles of Numerical Analysis, Alston Householder uses partitioned matrices and LU decomposition. Courtesy of McGraw-Hill.

The contrast between these two pages is striking. The most obvious difference is that Dwyer used scalar equations whereas Householder used partitioned matrices. But a deeper difference is that while Dwyer started from a system of equations, Householder worked with a (block) LU decomposition—the factorization of a matrix into the product of lower and upper triangular matrices.

Generally speaking, a decomposition is a factorization of a matrix into simpler factors. The underlying principle of the decompositional approach to matrix computation is that it is not the business of the matrix algorithmists to solve particular problems but to construct computational platforms from which a variety of problems can be solved. This approach, which was in full swing by the mid-1960s, has revolutionized matrix computation.

To illustrate the nature of the decompositional approach and its consequences, I begin with a discussion of the Cholesky decomposition and the solution of linear systems—essentially the decomposition that Gauss computed in his elimination algorithm.
This article also provides a tour of the five other major matrix decompositions, including the pivoted LU decomposition, the QR decomposition, the spectral decomposition, the Schur decomposition, and the singular value decomposition.

A disclaimer is in order. This article deals primarily with dense matrix computations. Although the decompositional approach has greatly influenced iterative and direct methods for sparse matrices, the ways in which it has affected them are different from what I describe here.

The Cholesky decomposition and linear systems

We can use the decompositional approach to solve the system

    Ax = b    (1)

where A is positive definite. It is well known that A can be factored in the form

    A = R^T R    (2)

where R is an upper triangular matrix. The factorization is called the Cholesky decomposition of A.

The factorization in Equation 2 can be used to solve linear systems as follows. If we write the system in the form R^T Rx = b and set y = R^{-T} b, then x is the solution of the triangular system Rx = y. However, by definition y is the solution of the system R^T y = b. Consequently, we have reduced the problem to the solution of two triangular systems, as illustrated in the following algorithm:

    1. Solve the system R^T y = b.
    2. Solve the system Rx = y.    (3)

Because triangular systems are easy to solve, the introduction of the Cholesky decomposition has transformed our problem into one for which the solution can be readily computed.
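To make Algorithm 3 concrete, here is a minimal sketch in Python, assuming NumPy and SciPy are available. The matrix and right-hand side are made up for illustration; the two calls to solve_triangular are steps 1 and 2 of the algorithm.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    # An illustrative positive definite system (made-up data).
    A = np.array([[4.0, 2.0, 0.0],
                  [2.0, 5.0, 1.0],
                  [0.0, 1.0, 3.0]])
    b = np.array([2.0, 1.0, 4.0])

    # Equation 2: A = R^T R with R upper triangular.
    R = cholesky(A)                          # upper triangular by default

    # Algorithm 3: two triangular solves.
    y = solve_triangular(R, b, trans='T')    # step 1: solve R^T y = b
    x = solve_triangular(R, y)               # step 2: solve R x = y

    print(np.allclose(A @ x, b))             # True, up to rounding

Once R is in hand, only the two triangular solves need to be repeated for a new right-hand side.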
We can use such decompositions to solve more than one problem. For example, the following algorithm solves the system A^T x = b:

    1. Solve the system Ry = b.
    2. Solve the system R^T x = y.    (4)

Again, in many statistical applications we want to compute the quantity ρ = x^T A^{-1} x. Because

    x^T A^{-1} x = x^T (R^T R)^{-1} x = (R^{-T} x)^T (R^{-T} x)    (5)

we can compute ρ as follows:

    1. Solve the system R^T y = x.
    2. ρ = y^T y.    (6)

The decompositional approach can also save computation. For example, the Cholesky decomposition requires O(n^3) operations to compute, whereas the solution of triangular systems requires only O(n^2) operations. Thus, if you recognize that a Cholesky decomposition is being used to solve a system at one point in a computation, you can reuse the decomposition to do the same thing later without having to recompute it.

Historically, Gaussian elimination and its variants (including Cholesky's algorithm) have solved the system in Equation 1 by reducing it to an equivalent triangular system. This mixes the computation of the decomposition with the solution of the first triangular system in Equation 3, and it is not obvious how to reuse the elimination when a new right-hand side presents itself. A naive programmer is in danger of performing the reduction from the beginning, thus repeating the lion's share of the work. On the other hand, a program that knows a decomposition is in the background can reuse it as needed.

(By the way, the problem of recomputing decompositions has not gone away. Some matrix packages hide the fact that they repeatedly compute a decomposition by providing drivers to solve linear systems with a call to a single routine. If the program calls the routine again with the same matrix, it recomputes the decomposition—unnecessarily. Interpretive matrix systems such as Matlab and Mathematica have the same problem: they hide decompositions behind operators and function calls. Such are the consequences of not stressing the decompositional approach to the consumers of matrix algorithms.)

Another advantage of working with decompositions is unity. There are different ways of organizing the operations involved in solving linear systems by Gaussian elimination in general and Cholesky's algorithm in particular. Figure 3 illustrates some of these arrangements: a white area contains elements from the original matrix, a dark area contains the factors, a light gray area contains partially processed elements, and the boundary strips contain elements about to be processed. Most of these variants were originally presented in scalar form as new algorithms. Once you recognize that a decomposition is involved, it is easy to see the essential unity of the various algorithms.

Figure 3. These varieties of Gaussian elimination are all numerically equivalent.

All the variants in Figure 3 are numerically equivalent. This means that one rounding-error analysis serves all. For example, the Cholesky algorithm, in whatever guise, is backward stable: the computed factor R satisfies

    A + E = R^T R    (7)

where E is of the size of the rounding unit relative to A. Establishing this backward stability is usually the most difficult part of an analysis of the use of a decomposition to solve a problem. For example, once Equation 7 has been established, the rounding errors involved in the solutions of the triangular systems in Equation 3 can be incorporated in E with relative ease. Thus, another advantage of the decompositional approach is that it concentrates the most difficult aspects of rounding-error analysis in one place.
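Equation 7 is easy to observe numerically. The following sketch, assuming NumPy and SciPy, checks that the relative backward error of a computed Cholesky factor is a modest multiple of the machine rounding unit; the test matrix is made up for illustration.

    import numpy as np
    from scipy.linalg import cholesky

    rng = np.random.default_rng(0)
    B = rng.standard_normal((200, 200))
    A = B @ B.T + 200 * np.eye(200)          # a made-up positive definite matrix

    R = cholesky(A)                          # computed factor, A ~= R^T R
    E = R.T @ R - A                          # the backward error of Equation 7

    rel_err = np.linalg.norm(E) / np.linalg.norm(A)
    print(rel_err, np.finfo(float).eps)      # rel_err is a small multiple of eps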
In general, if you change the elements of a positive definite matrix, you must recompute its Cholesky decomposition from scratch. However, if the change is structured, it may be possible to compute the new decomposition directly from the old—a process known as updating. For example, you can compute the Cholesky decomposition of A + xx^T from that of A in O(n^2) operations, an enormous savings over the ab initio computation of the decomposition.

Finally, the decompositional approach has greatly affected the development of software for matrix computation. Instead of wasting energy developing code for a variety of specific applications, the producers of a matrix package can concentrate on the decompositions themselves, perhaps with a few auxiliary routines to handle the most important applications. This approach has informed the major public-domain packages: the Handbook series,3 Eispack,4 Linpack,5 and Lapack.6 A consequence of this emphasis on decompositions is that software developers have found that most algorithms have broad computational features in common—features that can be relegated to basic linear-algebra subprograms (such as Blas), which can then be optimized for specific machines.7−9

For easy reference, the sidebar "Benefits of the decompositional approach" summarizes the advantages of decomposition.

Benefits of the decompositional approach

• A matrix decomposition solves not one but many problems.
• A matrix decomposition, which is generally expensive to compute, can be reused to solve new problems involving the original matrix.
• The decompositional approach often shows that apparently different algorithms are actually computing the same object.
• The decompositional approach facilitates rounding-error analysis.
• Many matrix decompositions can be updated, sometimes with great savings in computation.
• By focusing on a few decompositions instead of a host of specific problems, software developers have been able to produce highly effective matrix packages.

History

All the widely used decompositions had made their appearance by 1909, when Schur introduced the decomposition that now bears his name. However, with the exception of the Schur decomposition, they were not cast in the language of matrices (in spite of the fact that matrices had been introduced in 185810). I provide some historical background for the individual decompositions later, but it is instructive here to consider how the originators proceeded in the absence of matrices.

Gauss, who worked with positive definite systems defined by the normal equations for least squares, described his elimination procedure as the reduction of a quadratic form φ(x) = x^T Ax (I am simplifying a little here). In terms of the Cholesky factorization A = R^T R, Gauss wrote φ(x) in the form

    φ(x) = (r_1^T x)^2 + (r_2^T x)^2 + … + (r_n^T x)^2
         ≡ ρ_1(x)^2 + ρ_2(x)^2 + … + ρ_n(x)^2    (8)

where r_i^T is the ith row of R. Thus Gauss reduced φ(x) to a sum of squares of linear functions ρ_i. Because R is upper triangular, the function ρ_i(x) depends only on the components x_i, …, x_n of x. Since the coefficients in the linear forms ρ_i are the elements of R, Gauss, by showing how to compute the ρ_i, effectively computed the Cholesky decomposition of A.

Other decompositions were introduced in other ways. For example, Jacobi introduced the LU decomposition as a decomposition of a bilinear form into a sum of products of linear functions having an appropriate triangularity with respect to the variables. The singular value decomposition made its appearance as an orthogonal change of variables that diagonalized a bilinear form. Eventually, all these decompositions found expressions as factorizations of matrices.11

The process by which decomposition became so important to matrix computations was slow and incremental. Gauss certainly had the spirit.
He used his decomposition to perform many tasks, such as computing variances, and even used it to update least-squares solutions. But Gauss never regarded his decomposition as a matrix factorization, and it would be anachronistic to consider him the father of the decompositional approach.
In the 1940s, awareness grew that the usual algorithms for solving linear systems involved matrix factorization.12,13 John von Neumann and H.H. Goldstine, in their ground-breaking error analysis of the solution of linear systems, pointed out the division of labor between computing a factorization and solving the system:14

    We may therefore interpret the elimination method as one which bases the inverting of an arbitrary matrix A on the combination of two tricks: First it decomposes A into the product of two semi-diagonal matrices C, B′ …, and consequently the inverse of A obtains immediately from those of C and B′. Second it forms their inverses by a simple, explicit, inductive process.

In the 1950s and early 1960s, Householder systematically explored the relation between various algorithms in matrix terms. His book The Theory of Matrices in Numerical Analysis is the mathematical epitome of the decompositional approach.15

In 1954, Givens showed how to reduce a symmetric matrix A to tridiagonal form by orthogonal transformation.16 The reduction was merely a way station to the computation of the eigenvalues of A, and at the time no one thought of it as a decomposition. However, it and other intermediate forms have proven useful in their own right and have become a staple of the decompositional approach.

In 1961, James Wilkinson gave the first backward rounding-error analysis of the solutions of linear systems.17 Here, the division of labor is complete. He gives one analysis of the computation of the LU decomposition and another of the solution of triangular systems and then combines the two. Wilkinson continued analyzing various algorithms for computing decompositions, introducing uniform techniques for dealing with the transformations used in the computations. By the time his book The Algebraic Eigenvalue Problem18 appeared in 1965, the decompositional approach was firmly established.

The big six

There are many matrix decompositions, old and new, and the list of the latter seems to grow daily. Nonetheless, six decompositions hold the center. The reason is that they are useful and stable—they have important applications and the algorithms that compute them have a satisfactory backward rounding-error analysis (see Equation 7). In this brief tour, I provide references only for details that cannot be found in the many excellent texts and monographs on numerical linear algebra,18−26 the Handbook series,3 or the LINPACK Users' Guide.5

The Cholesky decomposition

Description. Given a positive definite matrix A, there is a unique upper triangular matrix R with positive diagonal elements such that A = R^T R. In this form, the decomposition is known as the Cholesky decomposition. It is often written in the form A = LDL^T, where D is diagonal and L is unit lower triangular (that is, L is lower triangular with ones on the diagonal).

Applications. The Cholesky decomposition is used primarily to solve positive definite linear systems, as in Equations 3 and 4. It can also be employed to compute quantities useful in statistics, as in Equation 6.

Algorithms. A Cholesky decomposition can be computed using any of the variants of Gaussian elimination (see Figure 3), modified, of course, to take advantage of symmetry. All these algorithms take approximately n^3/6 floating-point additions and multiplications. The algorithm Cholesky proposed corresponds to the diagram in the lower right of Figure 3.
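As a concrete illustration of such a variant, here is an unoptimized sketch in Python with NumPy of a right-looking (outer-product) elimination; I make no claim that it matches any particular diagram in Figure 3, and production codes exploit symmetry and blocking in ways this sketch does not.

    import numpy as np

    def cholesky_upper(A):
        """Return upper-triangular R with A = R^T R for a symmetric
        positive definite A (no pivoting, no blocking)."""
        A = np.array(A, dtype=float)          # work on a copy
        n = A.shape[0]
        R = np.zeros_like(A)
        for k in range(n):
            R[k, k] = np.sqrt(A[k, k])
            R[k, k+1:] = A[k, k+1:] / R[k, k]
            # Rank-one update of the trailing submatrix.
            A[k+1:, k+1:] -= np.outer(R[k, k+1:], R[k, k+1:])
        return R

    A = np.array([[4.0, 2.0], [2.0, 3.0]])    # a made-up SPD matrix
    R = cholesky_upper(A)
    print(np.allclose(R.T @ R, A))            # True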
Updating. Given a Cholesky decomposition A = R^T R, you can calculate the Cholesky decomposition of A + xx^T from R and x in O(n^2) floating-point additions and multiplications. The Cholesky decomposition of A − xx^T can be calculated with the same number of operations. The latter process, which is called downdating, is numerically less stable than updating.
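One way such an O(n^2) update can be organized is to apply one plane rotation per column to restore triangularity after the rank-one change. The sketch below, in Python with NumPy, is an illustration of that idea under my own naming, not the exact procedure of any particular library.

    import numpy as np

    def cholesky_update(R, x):
        """Given upper-triangular R with A = R^T R, return upper-triangular
        R1 with R1^T R1 = A + x x^T, using one plane (Givens) rotation per
        column applied to the stacked matrix [R; x^T]."""
        R = R.astype(float).copy()
        x = x.astype(float).copy()
        n = x.size
        for k in range(n):
            r = np.hypot(R[k, k], x[k])       # rotate (R[k,k], x[k]) onto (r, 0)
            c, s = R[k, k] / r, x[k] / r
            R[k, k] = r
            if k + 1 < n:
                rk = R[k, k+1:].copy()
                R[k, k+1:] = c * rk + s * x[k+1:]
                x[k+1:] = -s * rk + c * x[k+1:]
        return R

    # Quick check with made-up data.
    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    A = B @ B.T + 4 * np.eye(4)               # positive definite
    x = rng.standard_normal(4)
    R = np.linalg.cholesky(A).T               # upper-triangular factor of A
    R1 = cholesky_update(R, x)
    print(np.allclose(R1.T @ R1, A + np.outer(x, x)))   # True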
The pivoted Cholesky decomposition. If P is a permutation matrix and A is positive definite, then P^T AP is said to be a diagonal permutation of A (among other things, it permutes the diagonals of A). Any diagonal permutation of A is positive definite and has a Cholesky factor. Such a factorization is called a pivoted Cholesky factorization. There are many ways to pivot a Cholesky decomposition, but the most common one produces a factor satisfying

    r_kk^2 ≥ Σ_{i=k}^{j} r_ij^2,   j > k.    (9)

In particular, if A is positive semidefinite, this strategy will assure that R has the form

    R = [ R_11  R_12 ]
        [  0     0   ]

where R_11 is nonsingular and has the same order as the rank of A. Hence, the pivoted Cholesky decomposition is widely used for rank determination.

History. The Cholesky decomposition (more precisely, an LDL^T decomposition) was the decomposition of Gauss's elimination algorithm, which he sketched in 180927 and presented in full in 1810.28 Benoît published Cholesky's variant posthumously in 1924.29

The pivoted LU decomposition

Description. Given a matrix A of order n, there are permutations P and Q such that

    P^T AQ = LU

where L is unit lower triangular and U is upper triangular. The matrices P and Q are not unique, and the process of selecting them is known as pivoting.

Applications. Like the Cholesky decomposition, the LU decomposition is used primarily for solving linear systems. However, since A is a general matrix, this application covers a wide range. For example, the LU decomposition is used to compute the steady-state vector of Markov chains and, with the inverse power method, to compute eigenvectors.

Algorithms. The basic algorithm for computing LU decompositions is a generalization of Gaussian elimination to nonsymmetric matrices. When and how this generalization arose is obscure (see Dwyer1 for comments and references). Except for special matrices (such as positive definite and diagonally dominant matrices), the method requires some form of pivoting for stability. The most common form is partial pivoting, in which pivot elements are chosen from the column to be eliminated. This algorithm requires about n^3/3 additions and multiplications.

Certain contrived examples show that Gaussian elimination with partial pivoting can be unstable. Nonetheless, it works well for the overwhelming majority of real-life problems.30,31 Why this is so is an open question.

History. In establishing the existence of the LU decomposition, Jacobi showed32 that under certain conditions a bilinear form φ(x, y) can be written in the form

    φ(x, y) = ρ_1(x)σ_1(y) + ρ_2(x)σ_2(y) + … + ρ_n(x)σ_n(y)

where ρ_i and σ_i are linear functions that depend only on the last (n − i + 1) components of their arguments. The coefficients of the functions are the elements of L and U.

The QR decomposition

Description. Let A be an m × n matrix with m ≥ n. There is an orthogonal matrix Q such that

    Q^T A = [ R ]
            [ 0 ]

where R is upper triangular with nonnegative diagonal elements (or positive diagonal elements if A is of rank n). If we partition Q in the form

    Q = (Q_A  Q_⊥)

where Q_A has n columns, then we can write

    A = Q_A R.    (10)

This is sometimes called the QR factorization of A.

Applications. When A is of rank n, the columns of Q_A form an orthonormal basis for the column space R(A) of A, and the columns of Q_⊥ form an orthonormal basis of the orthogonal complement of R(A). In particular, Q_A Q_A^T is the orthogonal projection onto R(A). For this reason, the QR decomposition is widely used in applications with a geometric flavor, especially least squares.

Algorithms. There are two distinct classes of algorithms for computing the QR decomposition: Gram–Schmidt algorithms and orthogonal triangularization.

Gram–Schmidt algorithms proceed stepwise by orthogonalizing the kth column of A against the first (k − 1) columns of Q to get the kth column of Q. There are two forms of the Gram–Schmidt algorithm, the classical and the modified, and they both compute only the factorization in Equation 10. The classical Gram–Schmidt algorithm is unstable.
The modified form can produce a matrix Q_A whose columns deviate from orthogonality. But the deviation is bounded, and the computed factorization can be used in certain applications—notably computing least-squares solutions.
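Here is a minimal sketch of the modified Gram–Schmidt algorithm in Python with NumPy, computing the factorization of Equation 10 for a made-up full-rank matrix; it is meant only to show the order of operations, not to be a library-quality routine.

    import numpy as np

    def modified_gram_schmidt(A):
        """Return Q (orthonormal columns) and upper-triangular R with A = Q R,
        assuming A has full column rank."""
        A = np.array(A, dtype=float)
        m, n = A.shape
        Q = A.copy()
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(Q[:, k])
            Q[:, k] /= R[k, k]
            # Immediately purge the new direction from the remaining columns.
            R[k, k+1:] = Q[:, k] @ Q[:, k+1:]
            Q[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])
        return Q, R

    A = np.random.default_rng(1).standard_normal((6, 3))   # made-up data
    Q, R = modified_gram_schmidt(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))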
If the orthogonalization step is repeated at each stage—a process known as reorthogonalization—both algorithms will produce a fully orthogonal factorization. When m » n, the algorithms without reorthogonalization require about mn^2 additions and multiplications.

The method of orthogonal triangularization proceeds by premultiplying A by certain simple orthogonal matrices until the elements below the diagonal are zero. The product of the orthogonal matrices is Q, and R is the upper triangular part of the reduced A. Again, there are two versions. The first reduces the matrix by Householder transformations. The method has the advantage that it represents the entire matrix Q in the same amount of memory that is required to hold A, a great savings when m » n. The second method reduces A by plane rotations. It is less efficient than the first method, but is better suited for matrices with structured patterns of nonzero elements.

Relation to the Cholesky decomposition. From Equation 10, it follows that

    A^T A = R^T R.    (11)

In other words, the triangular factor of the QR decomposition of A is the triangular factor of the Cholesky decomposition of the cross-product matrix A^T A. Consequently, many problems—particularly least-squares problems—can be solved using either the QR decomposition of the least-squares matrix or the Cholesky decomposition of the normal equations. The QR decomposition usually gives more accurate results, whereas the Cholesky decomposition is often faster.

Updating. Given a QR factorization of A, there are stable, efficient algorithms for recomputing the QR factorization after rows and columns have been added to or removed from A. In addition, the QR decomposition of the rank-one modification A + xy^T can be stably updated.

The pivoted QR decomposition. If P is a permutation matrix, then AP is a permutation of the columns of A, and (AP)^T (AP) is a diagonal permutation of A^T A. In view of the relation of the QR and the Cholesky decompositions, it is not surprising that there is a pivoted QR factorization whose triangular factor R satisfies Equation 9. In particular, if A has rank k, then its pivoted QR factorization has the form

    AP = (Q_1  Q_2) [ R_11  R_12 ]
                    [  0     0   ]

It follows that either Q_1 or the first k columns of AP form a basis for the column space of A. Thus, the pivoted QR decomposition can be used to extract a set of linearly independent columns from A.

History. The QR factorization first appeared in a work by Erhard Schmidt on integral equations.33 Specifically, Schmidt showed how to orthogonalize a sequence of functions by what is now known as the Gram–Schmidt algorithm. (Curiously, Laplace produced the basic formulas34 but had no notion of orthogonality.) The name QR comes from the QR algorithm, named by Francis (see the history notes for the Schur algorithm, discussed later). Householder introduced Householder transformations to matrix computations and showed how they could be used to triangularize a general matrix.35 Plane rotations were introduced by Givens,16 who used them to reduce a symmetric matrix to tridiagonal form. Bogert and Burris appear to be the first to use them in orthogonal triangularization.36 The first updating algorithm (adding a row) is due to Golub,37 who also introduced the idea of pivoting.
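The relation in Equation 11 and the accuracy remark above are easy to see in a small experiment. The following Python sketch (NumPy and SciPy assumed, data made up) solves the same least-squares problem through the QR factorization and through the normal equations; on mildly ill-conditioned data the QR route typically recovers more correct digits.

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    # A made-up, mildly ill-conditioned least-squares problem min ||Ax - b||.
    rng = np.random.default_rng(2)
    m, n = 50, 5
    A = rng.standard_normal((m, n)) @ np.diag(10.0 ** -np.arange(n))
    x_true = np.ones(n)
    b = A @ x_true

    # Via the QR factorization of Equation 10: solve R x = Q_A^T b.
    Q, R_qr = np.linalg.qr(A)                  # "reduced" QR: Q is m-by-n
    x_qr = solve_triangular(R_qr, Q.T @ b)

    # Via the normal equations A^T A x = A^T b and a Cholesky factor (Equation 11).
    R_ch = cholesky(A.T @ A)                   # upper triangular
    y = solve_triangular(R_ch, A.T @ b, trans='T')
    x_ne = solve_triangular(R_ch, y)

    print(np.linalg.norm(x_qr - x_true), np.linalg.norm(x_ne - x_true))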
The spectral decomposition

Description. Let A be a symmetric matrix of order n. There is an orthogonal matrix V such that

    A = V Λ V^T,   Λ = diag(λ_1, …, λ_n).    (12)

If v_i denotes the ith column of V, then Av_i = λ_i v_i. Thus (λ_i, v_i) is an eigenpair of A, and the spectral decomposition shown in Equation 12 exhibits the eigenvalues of A along with a complete orthonormal system of eigenvectors.

Applications. The spectral decomposition finds applications wherever the eigensystem of a symmetric matrix is needed, which is to say in virtually all technical disciplines.
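For reference, a quick numerical illustration of Equation 12 using NumPy's symmetric eigensolver; the matrix is made up.

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    lam, V = np.linalg.eigh(A)          # eigenvalues ascending, V orthogonal

    print(np.allclose(V @ np.diag(lam) @ V.T, A))       # A = V Lambda V^T
    print(np.allclose(A @ V[:, 0], lam[0] * V[:, 0]))   # (lam_0, v_0) is an eigenpair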
Algorithms. There are three classes of algorithms to compute the spectral decomposition: the QR algorithm, the divide-and-conquer algorithm, and Jacobi's algorithm. The first two require a preliminary reduction to tridiagonal form by orthogonal similarities. I discuss the QR algorithm in the next section on the Schur decomposition. The divide-and-conquer algorithm38,39 is comparatively recent and is usually faster than the QR algorithm when both eigenvalues and eigenvectors are desired; however, it is not suitable for problems in which the eigenvalues vary widely in magnitude. The Jacobi algorithm is much slower than the other two, but for positive definite matrices it may be more accurate.40 All these algorithms require O(n^3) operations.

Updating. The spectral decomposition can be updated. Unlike the Cholesky and QR decompositions, the algorithm does not result in a reduction in the order of the work—it still remains O(n^3), although the order constant is lower.

History. The spectral decomposition dates back to an 1829 paper by Cauchy,41 who introduced the eigenvectors as solutions of equations of the form Ax = λx and proved the orthogonality of eigenvectors belonging to distinct eigenvalues. In 1846, Jacobi42 gave his famous algorithm for spectral decomposition, which iteratively reduces the matrix in question to diagonal form by a special type of plane rotations, now called Jacobi rotations. The reduction to tridiagonal form is due to Givens16 for plane rotations and to Householder43 for Householder transformations.

The Schur decomposition

Description. Let A be a matrix of order n. There is a unitary matrix U such that

    A = U T U^H

where T is upper triangular and H denotes the conjugate transpose. The diagonal elements of T are the eigenvalues of A, which, by appropriate choice of U, can be made to appear in any order. This decomposition is called a Schur decomposition of A.

A real matrix can have complex eigenvalues and hence a complex Schur form. By allowing T to have real 2 × 2 blocks on its diagonal that contain its complex eigenvalues, the entire decomposition can be made real. This is sometimes called a real Schur form.

Applications. An important use of the Schur form is as an intermediate form from which the eigenvalues and eigenvectors of a matrix can be computed. On the other hand, the Schur decomposition can often be used in place of a complete system of eigenpairs, which, in fact, may not exist. A good example is the solution of Sylvester's equation and its relatives.44,45

Algorithms. After a preliminary reduction to Hessenberg form, which is usually done with Householder transformations, the Schur form is computed using the QR algorithm.46 Elsewhere in this issue, Beresford Parlett discusses the modern form of the algorithm. It is one of the most flexible algorithms in the repertoire, having variants for the spectral decomposition, the singular value decomposition, and the generalized eigenvalue problem.

History. Schur introduced his decomposition in 1909.47 It was the only one of the big six to have been derived in terms of matrices. It was largely ignored until Francis's QR algorithm pushed it into the limelight.

The singular value decomposition

Description. Let A be an m × n matrix with m ≥ n. There are orthogonal matrices U and V such that

    U^T A V = [ Σ ]
              [ 0 ]

where Σ = diag(σ_1, …, σ_n) and σ_1 ≥ σ_2 ≥ … ≥ σ_n ≥ 0. This decomposition is called the singular value decomposition of A. If U_A consists of the first n columns of U, we can write

    A = U_A Σ V^T    (13)

which is sometimes called the singular value factorization of A. The diagonal elements of Σ are called the singular values of A. The corresponding columns of U and V are called left and right singular vectors of A.
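Here is a quick check of the factorization in Equation 13 with NumPy on a made-up 5 × 3 matrix, together with the relation to A^T A discussed below.

    import numpy as np

    A = np.random.default_rng(3).standard_normal((5, 3))       # made-up data
    U_A, s, Vt = np.linalg.svd(A, full_matrices=False)          # U_A is 5-by-3

    print(np.allclose(U_A @ np.diag(s) @ Vt, A))                # Equation 13
    # The squares of the singular values are the eigenvalues of A^T A.
    print(np.allclose(np.sort(s**2), np.sort(np.linalg.eigvalsh(A.T @ A))))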
Applications. Most of the applications of the QR decomposition can also be handled by the singular value decomposition. In addition, the singular value decomposition gives a basis for the row space of A and is more reliable in determining rank. It can also be used to compute optimal low-rank approximations and to compute angles between subspaces.

Relation to the spectral decomposition. The singular value factorization is related to the spectral decomposition in much the same way as the QR factorization is related to the Cholesky decomposition. Specifically, from Equation 13 it follows that

    A^T A = V Σ^2 V^T.

Thus the eigenvalues of the cross-product
matrix A^T A are the squares of the singular values of A, and the eigenvectors of A^T A are right singular vectors of A.

Algorithms. As with the spectral decomposition, there are three classes of algorithms for computing the singular value decomposition: the QR algorithm, a divide-and-conquer algorithm, and a Jacobi-like algorithm. The first two require a reduction of A to bidiagonal form. The divide-and-conquer algorithm49 is often faster than the QR algorithm, and the Jacobi algorithm is the slowest.

History. The singular value decomposition was introduced independently by Beltrami in 187350 and Jordan in 1874.51 The reduction to bidiagonal form is due to Golub and Kahan,52 as is the variant of the QR algorithm. The first Jacobi-like algorithm for computing the singular value decomposition was given by Kogbetliantz.53

The big six are not the only decompositions in use; in fact, there are many more. As mentioned earlier, certain intermediate forms—such as tridiagonal and Hessenberg forms—have come to be regarded as decompositions in their own right. Since the singular value decomposition is expensive to compute and not readily updated, rank-revealing alternatives have received considerable attention.54,55 There are also generalizations of the singular value decomposition and the Schur decomposition for pairs of matrices.56,57

All crystal balls become cloudy when they look to the future, but it seems safe to say that as long as new matrix problems arise, new decompositions will be devised to solve them.

Acknowledgment

This work was supported by the National Science Foundation under Grant No. 970909-8562.

References

1. P.S. Dwyer, Linear Computations, John Wiley & Sons, New York, 1951.
2. A.S. Householder, Principles of Numerical Analysis, McGraw-Hill, New York, 1953.
3. J.H. Wilkinson and C. Reinsch, Handbook for Automatic Computation, Vol. II, Linear Algebra, Springer-Verlag, New York, 1971.
4. B.S. Garbow et al., "Matrix Eigensystem Routines - Eispack Guide Extension," Lecture Notes in Computer Science, Springer-Verlag, New York, 1977.
5. J.J. Dongarra et al., LINPACK Users' Guide, SIAM, Philadelphia, 1979.
6. E. Anderson et al., LAPACK Users' Guide, 2nd ed., SIAM, Philadelphia, 1995.
7. J.J. Dongarra et al., "An Extended Set of Fortran Basic Linear Algebra Subprograms," ACM Trans. Mathematical Software, Vol. 14, 1988, pp. 1-17.
8. J.J. Dongarra et al., "A Set of Level 3 Basic Linear Algebra Subprograms," ACM Trans. Mathematical Software, Vol. 16, 1990, pp. 1-17.
9. C.L. Lawson et al., "Basic Linear Algebra Subprograms for Fortran Usage," ACM Trans. Mathematical Software, Vol. 5, 1979, pp. 308-323.
10. A. Cayley, "A Memoir on the Theory of Matrices," Philosophical Trans. Royal Soc. of London, Vol. 148, 1858, pp. 17-37.
11. C.C. MacDuffee, The Theory of Matrices, Chelsea, New York, 1946.
12. P.S. Dwyer, "A Matrix Presentation of Least Squares and Correlation Theory with Matrix Justification of Improved Methods of Solution," Annals Math. Statistics, Vol. 15, 1944, pp. 82-89.
13. H. Jensen, An Attempt at a Systematic Classification of Some Methods for the Solution of Normal Equations, Report No. 18, Geodætisk Institut, Copenhagen, 1944.
14. J. von Neumann and H.H. Goldstine, "Numerical Inverting of Matrices of High Order," Bull. Am. Math. Soc., Vol. 53, 1947, pp. 1021-1099.
15. A.S. Householder, The Theory of Matrices in Numerical Analysis, Dover, New York, 1964.
16. W. Givens, Numerical Computation of the Characteristic Values of a Real Matrix, Tech. Report 1574, Oak Ridge Nat'l Laboratory, Oak Ridge, Tenn., 1954.
17. J.H. Wilkinson, "Error Analysis of Direct Methods of Matrix Inversion," J. ACM, Vol. 8, 1961, pp. 281-330.
18. J.H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, U.K., 1965.
19. Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
20. B.N. Datta, Numerical Linear Algebra and Applications, Brooks/Cole, Pacific Grove, Calif., 1995.
21. J.W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
22. N.J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 1996.
23. B.N. Parlett, The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, N.J., 1980; reissued with revisions by SIAM, Philadelphia, 1998.
24. G.W. Stewart, Introduction to Matrix Computations, Academic Press, New York, 1973.
25. G.W. Stewart, Matrix Algorithms I: Basic Decompositions, SIAM, Philadelphia, 1998.
26. D.S. Watkins, Fundamentals of Matrix Computations, John Wiley & Sons, New York, 1991.
27. C.F. Gauss, Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium [Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections], Perthes and Besser, Hamburg, Germany, 1809.
28. C.F. Gauss, "Disquisitio de elementis ellipticis Palladis [Disquisition on the Elliptical Elements of Pallas]," Commentationes societatis regiae scientiarum Gottingensis recentiores, Vol. 1, 1810.
29. C. Benoît, "Note sur une méthode de résolution des équations normales provenant de l'application de la méthode des moindres carrés à un système d'équations linéaires en nombre inférieur à celui des inconnues; application de la méthode à la résolution d'un système défini d'équations linéaires (Procédé du Commandant Cholesky) [Note on a Method for Solving the Normal Equations Arising from the Application of the Method of Least Squares to a System of Linear Equations whose Number Is Less than the Number of Unknowns; Application of the Method to the Solution of a Definite System of Linear Equations]," Bulletin Géodésique (Toulouse), Vol. 2, 1924, pp. 5-77.
30. L.N. Trefethen and R.S. Schreiber, "Average-Case Stability of Gaussian Elimination," SIAM J. Matrix Analysis and Applications, Vol. 11, 1990, pp. 335-360.
31. L.N. Trefethen and D. Bau III, Numerical Linear Algebra, SIAM, Philadelphia, 1997, pp. 166-170.
32. C.G.J. Jacobi, "Über einen algebraischen Fundamentalsatz und seine Anwendungen [On a Fundamental Theorem in Algebra and Its Applications]," Journal für die reine und angewandte Mathematik, Vol. 53, 1857, pp. 275-280.
33. E. Schmidt, "Zur Theorie der linearen und nichtlinearen Integralgleichungen. I Teil, Entwicklung willkürlicher Funktionen nach Systemen vorgeschriebener [On the Theory of Linear and Nonlinear Integral Equations. Part I, Expansion of Arbitrary Functions by a Prescribed System]," Mathematische Annalen, Vol. 63, 1907, pp. 433-476.
34. P.S. Laplace, Théorie Analytique des Probabilités [Analytical Theory of Probabilities], 3rd ed., Courcier, Paris, 1820.
35. A.S. Householder, "Unitary Triangularization of a Nonsymmetric Matrix," J. ACM, Vol. 5, 1958, pp. 339-342.
36. D. Bogert and W.R. Burris, Comparison of Least Squares Algorithms, Tech. Report ORNL-3499, V.1, S5.5, Neutron Physics Div., Oak Ridge Nat'l Laboratory, Oak Ridge, Tenn., 1963.
37. G.H. Golub, "Numerical Methods for Solving Least Squares Problems," Numerische Mathematik, Vol. 7, 1965, pp. 206-216.
38. J.J.M. Cuppen, "A Divide and Conquer Method for the Symmetric Eigenproblem," Numerische Mathematik, Vol. 36, 1981, pp. 177-195.
39. M. Gu and S.C. Eisenstat, "Stable and Efficient Algorithm for the Rank-One Modification of the Symmetric Eigenproblem," SIAM J. Matrix Analysis and Applications, Vol. 15, 1994, pp. 1266-1276.
40. J. Demmel and K. Veselic, "Jacobi's Method Is More Accurate than QR," SIAM J. Matrix Analysis and Applications, Vol. 13, 1992, pp. 1204-1245.
41. A.L. Cauchy, "Sur l'équation à l'aide de laquelle on détermine les inégalités séculaires des mouvements des planètes [On the Equation by which the Inequalities for the Secular Movement of Planets Is Determined]," Oeuvres Complètes (IIe Série), Vol. 9, 1829.
42. C.G.J. Jacobi, "Über ein leichtes Verfahren die in der Theorie der Säculärstörungen vorkommenden Gleichungen numerisch aufzulösen [On an Easy Method for the Numerical Solution of the Equations that Appear in the Theory of Secular Perturbations]," Journal für die reine und angewandte Mathematik, Vol. 30, 1846, pp. 51-94.
43. J.H. Wilkinson, "Householder's Method for the Solution of the Algebraic Eigenvalue Problem," Computer J., Vol. 3, 1960, pp. 23-27.
44. R.H. Bartels and G.W. Stewart, "Algorithm 432: The Solution of the Matrix Equation AX + XB = C," Comm. ACM, Vol. 8, 1972, pp. 820-826.
45. G.H. Golub, S. Nash, and C. Van Loan, "Hessenberg-Schur Method for the Problem AX + XB = C," IEEE Trans. Automatic Control, Vol. AC-24, 1979, pp. 909-913.
46. J.G.F. Francis, "The QR Transformation, Parts I and II," Computer J., Vol. 4, 1961, pp. 265-271, and 1962, pp. 332-345.
47. J. Schur, "Über die charakteristischen Wurzeln einer linearen Substitution mit einer Anwendung auf die Theorie der Integralgleichungen [On the Characteristic Roots of a Linear Transformation with an Application to the Theory of Integral Equations]," Mathematische Annalen, Vol. 66, 1909, pp. 448-510.
49. M. Gu and S.C. Eisenstat, "A Divide-and-Conquer Algorithm for the Bidiagonal SVD," SIAM J. Matrix Analysis and Applications, Vol. 16, 1995, pp. 79-92.
50. E. Beltrami, "Sulle funzioni bilineari [On Bilinear Functions]," Giornale di Matematiche ad Uso degli Studenti Delle Universita, Vol. 11, 1873, pp. 98-106. An English translation by D. Boley is available in Tech. Report 90-37, Dept. of Computer Science, Univ. of Minnesota, Minneapolis, 1990.
51. C. Jordan, "Mémoire sur les formes bilinéaires [Memoir on Bilinear Forms]," Journal de Mathématiques Pures et Appliquées, Deuxième Série, Vol. 19, 1874, pp. 35-54.
52. G.H. Golub and W. Kahan, "Calculating the Singular Values and Pseudo-Inverse of a Matrix," SIAM J. Numerical Analysis, Vol. 2, 1965, pp. 205-224.
53. E.G. Kogbetliantz, "Solution of Linear Systems by Diagonalization of Coefficients Matrix," Quarterly of Applied Math., Vol. 13, 1955, pp. 123-132.
54. T.F. Chan, "Rank Revealing QR Factorizations," Linear Algebra and Its Applications, Vol. 88/89, 1987, pp. 67-82.
55. G.W. Stewart, "An Updating Algorithm for Subspace Tracking," IEEE Trans. Signal Processing, Vol. 40, 1992, pp. 1535-1541.
56. Z. Bai and J.W. Demmel, "Computing the Generalized Singular Value Decomposition," SIAM J. Scientific and Statistical Computing, Vol. 14, 1993, pp. 1464-1468.
57. C.B. Moler and G.W. Stewart, "An Algorithm for Generalized Matrix Eigenvalue Problems," SIAM J. Numerical Analysis, Vol. 10, 1973, pp. 241-256.

G.W. Stewart is a professor in the Department of Computer Science and in the Institute for Advanced Computer Studies at the University of Maryland. His research interests focus on numerical linear algebra, perturbation theory, and rounding-error analysis. He earned his PhD from the University of Tennessee, where his advisor was A.S. Householder. Contact Stewart at the Dept. of Computer Science, Univ. of Maryland, College Park, MD 20742; stewart@cs.umd.edu.