Reprinted from The MathWorks News & Notes | October 2006 | www.mathworks.com

Cleve's Corner
Professor SVD
By Cleve Moler

The singular value decomposition (SVD) is a matrix factorization with a wide range of interesting applications.

Stanford computer science professor Gene Golub has done more than anyone to make the singular value decomposition one of the most powerful and widely used tools in modern matrix computation.

Gene Golub's license plate, photographed by Professor P. M. Kroonenberg of Leiden University

The SVD is a recent development. Pete Stewart, author of the 1993 paper "On the Early History of the Singular Value Decomposition", tells me that the term valeurs singulières was first used by Emile Picard around 1910 in connection with integral equations. Picard used the adjective "singular" to mean something exceptional or out of the ordinary. At the time, it had nothing to do with singular matrices.

When I was a graduate student in the early 1960s, the SVD was still regarded as a fairly obscure theoretical concept. A book that George Forsythe and I wrote in 1964 described the SVD as a nonconstructive way of characterizing the norm and condition number of a matrix. We did not yet have a practical way to actually compute it. Gene Golub and W. Kahan published the first effective algorithm in 1965. A variant of that algorithm, published by Gene Golub and Christian Reinsch in 1970, is still the one we use today. By the time the first MATLAB appeared, around 1980, the SVD was one of its highlights.

We can generate a 2-by-2 example by working backwards, computing a matrix from its SVD. Take σ1 = 2, σ2 = 1/2, θ = π/6, and φ = π/4. Let

U = [ -cos θ   sin θ
       sin θ   cos θ ]

Σ = [ σ1   0
       0   σ2 ]

V = [ -cos φ   sin φ
       sin φ   cos φ ]

The matrices U and V are rotations through angles θ and φ, followed by reflections in the first dimension. The matrix Σ is a diagonal scaling transformation. Generate A by computing

A = UΣV^T

You will find that

A = [  1.4015  -1.0480
      -0.4009   1.0133 ]

This says that the matrix A can be generated by a rotation through 45° and a reflection, followed by independent scalings in each of the two coordinate directions by factors of 2 and 1/2, respectively, followed by a rotation through 30° and another reflection.

The MATLAB function eigshow generates a figure that demonstrates the singular value decomposition of a 2-by-2 matrix. Enter the statements

A = [1.4015 -1.0480; -0.4009 1.0133]
eigshow(A)
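Equivalently, you can build A from its factors instead of typing in the computed entries. Here is a minimal MATLAB sketch of the construction above; the variable names theta, phi, and S are mine.

% Generate a 2-by-2 matrix from a prescribed SVD.
theta = pi/6;                                          % rotation angle for U
phi   = pi/4;                                          % rotation angle for V
U = [-cos(theta) sin(theta); sin(theta) cos(theta)];   % rotation plus reflection
S = diag([2 1/2]);                                     % singular values sigma1 and sigma2
V = [-cos(phi) sin(phi); sin(phi) cos(phi)];           % rotation plus reflection
A = U*S*V'              % reproduces [1.4015 -1.0480; -0.4009 1.0133]
svd(A)                  % recovers the singular values 2 and 0.5
eigshow(A)              % the interactive figure described above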
Click the SVD button and move the mouse around. You will see Figure 1, but with different labels. The green circle is the unit circle in the plane. The blue ellipse is the image of this circle under transformation by the matrix A. The green vectors, v1 and v2, which are the columns of V, and the blue vectors, u1 and u2, which are the columns of U, are two different orthogonal bases for two-dimensional space. The columns of V are rotated 45° from the axes of the figure, while the columns of U, which are the major and minor axes of the ellipse, are rotated 30°. The matrix A transforms v1 into σ1 u1 and v2 into σ2 u2.

Figure 1. SVD figure produced by eigshow.

Let's move on to m-by-n matrices. One of the most important features of the SVD is its use of orthogonal matrices. A real matrix U is orthogonal, or has orthonormal columns, if

U^T U = I

This says that the columns of U are perpendicular to each other and have unit length. Geometrically, transformations by orthogonal matrices are generalizations of rotations and reflections; they preserve lengths and angles. Computationally, orthogonal matrices are very desirable because they do not magnify roundoff or other kinds of errors.

Any real matrix A, even a nonsquare one, can be written as the product of three matrices,

A = UΣV^T

The matrix U is orthogonal and has as many rows as A. The matrix V is orthogonal and has as many columns as A. The matrix Σ is the same size as A, but its only nonzero elements are on the main diagonal. The diagonal elements of Σ are the singular values, and the columns of U and V are the left and right singular vectors.

In abstract linear algebra terms, a matrix represents a linear transformation from one vector space, the domain, to another, the range. The SVD says that for any linear transformation it is possible to choose an orthonormal basis for the domain and a possibly different orthonormal basis for the range, so that the transformation becomes simply independent scalings, or dilatations, in each coordinate direction.

The rank of a matrix is the number of linearly independent rows, which is the same as the number of linearly independent columns. The rank of a diagonal matrix is clearly the number of nonzero diagonal elements. Orthogonal transforms preserve linear independence. Thus, the rank of any matrix is the number of nonzero singular values. In MATLAB, enter the statement

type rank

to see how we choose a tolerance and count nonnegligible singular values.

Traditional courses in linear algebra make considerable use of the reduced row echelon form (RREF), but the RREF is an unreliable tool for computation in the face of inexact data and arithmetic. The SVD can be regarded as a modern, computationally powerful replacement for the RREF.

A square diagonal matrix is nonsingular if, and only if, its diagonal elements are nonzero. The SVD implies that any square matrix is nonsingular if, and only if, its singular values are nonzero. The most numerically reliable way to determine whether matrices are singular is to test their singular values. This is far better than trying to compute determinants, which have atrocious scaling properties.

With the singular value decomposition, the system of linear equations

Ax = b

becomes

UΣV^T x = b

The solution is

x = V Σ^-1 U^T b

Multiply by an orthogonal matrix, divide by the singular values, then multiply by another orthogonal matrix. This is much more computational work than Gaussian elimination, but it has impeccable numerical properties. You can judge whether the singular values are small enough to be regarded as negligible, and if they are, analyze the relevant singular system.
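The rank computation and the SVD-based solve both take only a few lines. Here is a small MATLAB sketch of mine; the tolerance mimics the built-in rank function shown by "type rank" above, and the right-hand side b is just an arbitrary example.

A = [1.4015 -1.0480; -0.4009 1.0133];   % the 2-by-2 example from above
b = [1; 2];                             % an arbitrary right-hand side

[U,S,V] = svd(A);                       % A = U*S*V'
s = diag(S);                            % the singular values

tol = max(size(A)) * eps(max(s));       % tolerance for negligible singular values
r = sum(s > tol)                        % numerical rank; 2 for this matrix

x = V * ((U'*b) ./ s)                   % x = V * inv(S) * U' * b
norm(A*x - b)                           % residual is near machine precision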
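The remark about determinants deserves a tiny illustration of my own: a perfectly well-scaled, nonsingular matrix whose determinant underflows to zero, while its singular values tell the true story.

A = 1e-20 * eye(20);    % nicely scaled and obviously nonsingular
det(A)                  % (1e-20)^20 = 1e-400 underflows to 0 in double precision
svd(A)                  % every singular value is 1e-20, so A is far from singular
min(svd(A))             % the 2-norm distance from A to the nearest singular matrix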
Let Ek denote the outer product of the k-th left and right singular vectors, that is

Ek = uk vk^T

Then A can be expressed as a sum of rank-1 matrices,

A = σ1 E1 + σ2 E2 + ... + σn En

If you order the singular values in decreasing order, σ1 > σ2 > ... > σn, and truncate the sum after r terms, the result is a rank-r approximation to the original matrix. The error in the approximation depends upon the magnitude of the neglected singular values. When you do this with a matrix of data that has been centered, by subtracting the mean of each column from the entire column, the process is known as principal component analysis (PCA). The right singular vectors, vk, are the components, and the scaled left singular vectors, σk uk, are the scores. PCAs are usually described in terms of the eigenvalues and eigenvectors of the covariance matrix, AA^T, but the SVD approach sometimes has better numerical properties.

SVD and matrix approximation are often illustrated by approximating images. Our example starts with the photo on Gene Golub's Web page (Figure 2). The image is 897-by-598 pixels. We stack the red, green, and blue JPEG components vertically to produce a 2691-by-598 matrix. We then do just one SVD computation. After computing a low-rank approximation, we repartition the matrix into RGB components. With just rank 12, the colors are accurately reproduced and Gene is recognizable, especially if you squint at the picture to allow your eyes to reconstruct the original image. With rank 50, you can begin to read the mathematics on the white board behind Gene. With rank 120, the image is almost indistinguishable from the full rank 598. (This is not a particularly effective image compression technique. In fact, my friends in image processing call it "image degradation.")

Figure 2. Rank 12, 50, and 120 approximations to a rank 598 color photo of Gene Golub.

So far in this column I have hardly mentioned eigenvalues. I wanted to show that it is possible to discuss singular values without discussing eigenvalues, but, of course, the two are closely related. In fact, if A is square, symmetric, and positive definite, its singular values and eigenvalues are equal, and its left and right singular vectors are equal to each other and to its eigenvectors. More generally, the singular values of A are the square roots of the eigenvalues of A^T A or AA^T. Singular values are relevant when the matrix is regarded as a transformation from one space to a different space with possibly different dimensions. Eigenvalues are relevant when the matrix is regarded as a transformation from one space into itself, as, for example, in linear ordinary differential equations.
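To tie the PCA description and the eigenvalue connection together, here is a small MATLAB sketch; the data matrix is random and only stands in for real observations.

X = randn(100, 4);                    % stand-in data: 100 observations of 4 variables
Xc = X - repmat(mean(X), 100, 1);     % center each column by subtracting its mean

[U, S, V] = svd(Xc, 'econ');          % SVD of the centered data matrix
components = V                        % right singular vectors: the principal components
scores = U*S                          % scaled left singular vectors: the scores

lambda = diag(S).^2 / 99              % squared singular values, scaled by (n-1)
sort(eig(cov(Xc)), 'descend')         % eigenvalues of the covariance matrix: the same values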
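The image experiment itself can be sketched in a few lines of MATLAB. The file name golub.jpg is a placeholder (any JPEG will do), and rank 50 is one of the ranks shown in Figure 2.

X = double(imread('golub.jpg'));            % placeholder file name; an m-by-n-by-3 image
[m, n, p] = size(X);
A = [X(:,:,1); X(:,:,2); X(:,:,3)];         % stack R, G, B vertically: a 3m-by-n matrix

[U, S, V] = svd(A, 'econ');                 % just one SVD computation
r = 50;                                     % rank of the approximation
Ar = U(:,1:r) * S(1:r,1:r) * V(:,1:r)';     % rank-r approximation of the stacked matrix

Y = cat(3, Ar(1:m,:), Ar(m+1:2*m,:), Ar(2*m+1:end,:));   % repartition into RGB planes
image(uint8(Y)), axis image off             % uint8 rounds and clips to the range [0, 255]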
Google finds over 3,000,000 Web pages that mention "singular value decomposition" and almost 200,000 pages that mention "SVD MATLAB." I knew about a few of these pages before I started to write this column. I came across some other interesting ones as I surfed around.

Professor SVD made all of this, and much more, possible. Thanks, Gene.

A Few Search Results for "Singular Value Decomposition"

■ The Wikipedia pages on SVD and PCA are quite good and contain a number of useful links, although not to each other.
en.wikipedia.org/wiki/Singular_value_decomposition
en.wikipedia.org/wiki/Principal_component_analysis

■ The first Google hit on "protein svd" is "Protein Substate Modeling and Identification Using the SVD," by Tod Romo at Rice University. The site provides an electronic exposition of the use of SVD in the analysis of the structure and motion of proteins, and includes some gorgeous graphics.
bioc.rice.edu/~tromo/Sprez/toc.html

■ Los Alamos biophysicists Michael Wall, Andreas Rechsteiner, and Luis Rocha provide a good online reference about SVD and PCA, phrased in terms of applications to gene expression analysis.
public.lanl.gov/mewall/kluwer2002.html

■ Rasmus Bro, a professor at the Royal Veterinary and Agricultural University in Denmark, and Barry Wise, head of Eigenvector Research in Wenatchee, Washington, both do chemometrics using SVD and PCA. One example involves the analysis of the absorption spectrum of water samples from a lake to identify upstream sources of pollution.
www.models.kvl.dk/users/rasmus
www.eigenvector.com

■ "Representing cyclic human motion using functional analysis" (2005), by Dirk Ormoneit, Michael Black, Trevor Hastie, and Hedvig Kjellstrom, describes techniques involving Fourier analysis and principal component analysis for analyzing and modeling motion-capture data from activities such as walking.
www.csc.kth.se/~hedvig/publications/ivc_05.pdf

■ A related paper is "Decomposing biological motion: a framework for analysis and synthesis of human gait patterns" (2002), by Nikolaus Troje. Troje's work is the basis for an "eigenwalker" demo.
www.journalofvision.org/2/5/2
www.mathworks.com/moler/ncm/walker.m

■ Tammy Kolda and Brett Bader, at Sandia National Labs in Livermore, CA, developed the Tensor Toolbox for MATLAB, which provides generalizations of PCA to multidimensional data sets.
csmr.ca.sandia.gov/~tgkolda/TensorToolbox

■ In 2003, Lawrence Sirovich of the Mount Sinai School of Medicine published "A pattern analysis of the second Rehnquist U.S. Supreme Court" in the Proceedings of the US National Academy of Sciences. His paper led to articles in the New York Times and the Washington Post because it provides a nonpolitical, phenomenological model of court decisions. Between 1994 and 2002, the court heard 468 cases. Since there are nine justices, each of whom takes a majority or minority position on each case, the data is a 468-by-9 matrix of +1s and -1s. If the judges had made their decisions by flipping coins, this matrix would almost certainly have rank 9. But Sirovich found that the third singular value is an order of magnitude smaller than the first one, so the matrix is well approximated by a matrix of rank 2. In other words, most of the court's decisions are close to being in a two-dimensional subspace of all possible decisions.
www.pnas.org/cgi/reprint/100/13/7432

■ A search at the US Patent and Trademark Office Web page lists 1,197 U.S. patents that mention "singular value decomposition." The oldest, issued in 1987, is for "A fiber optic inspection system for use in the inspection of sandwiched solder bonds in integrated circuit packages". Other titles include "Compression of surface light fields", "Method of seismic surveying", "Semantic querying of a peer-to-peer network", "Biochemical markers of brain function", and "Diabetes management."
www.uspto.gov/patft

■ Latent Semantic Indexing involves the use of SVD with term-document matrices to perform document retrieval. For example, should a search for "singular value" also look for "eigenvalue"? See a 1999 SIAM Review paper by Michael Berry, Zlatko Drmac, and Liz Jessup, "Matrices, Vector Spaces, and Information Retrieval."
epubs.siam.org/SIREV/volume-41/art_34703.html

Resources

■ On the Early History of the Singular Value Decomposition
locus.siam.org/SIREV/volume-35/art_1035134.html

■ Cleve's Corner Collection
www.mathworks.com/res/cleve

©1994-2006 by The MathWorks, Inc. MATLAB, Simulink, Stateflow, Handle Graphics, Real-Time Workshop, and xPC TargetBox are registered trademarks and SimBiology, SimEvents, and SimHydraulics are trademarks of The MathWorks, Inc. Other product or brand names are trademarks or registered trademarks of their respective holders. 91425v00 10/06