Francis's Algorithm

David S. Watkins

Abstract. John Francis's implicitly shifted QR algorithm turned the problem of matrix eigenvalue computation from difficult to routine almost overnight about fifty years ago. It was named one of the top ten algorithms of the twentieth century by Dongarra and Sullivan, and it deserves to be more widely known and understood by the general mathematical community. This article provides an efficient introduction to Francis's algorithm that follows a novel path. Efficiency is gained by omitting the traditional but wholly unnecessary detour through the basic QR algorithm. A brief history of the algorithm is also included. It was not a one-man show; some other important names are Rutishauser, Wilkinson, and Kublanovskaya. Francis was never a specialist in matrix computations. He was employed in the early computer industry, spent some time on the problem of eigenvalue computation and did amazing work, and then moved on to other things. He never looked back, and he remained unaware of the huge impact of his work until many years later.

1. INTRODUCTION. A problem that arises frequently in scientific computing applications is that of computing the eigenvalues and eigenvectors of a real or complex square matrix. Nowadays we take it for granted that we can do this. For example, I was able to compute a complete set of eigenvalues and eigenvectors of a real 1000 × 1000 matrix on my laptop in about 15 seconds using MATLAB®. Not only MATLAB has this capability; it is built into a large number of packages, both proprietary and open source. Not long ago [7] the complete eigensystem of a dense matrix of order $10^5$ was computed on a parallel supercomputer.¹ These feats are made possible by good software and good hardware. Today's computers are fast and have large memories, and they are reliable.

Fifty years ago the situation was very different. Computers were slow, had small memories, and were unreliable. The software situation was equally bad. Nobody knew how to compute the eigenvalues of, say, a 10 × 10 matrix in an efficient, reliable way. It was in this environment that young John Francis found himself writing programs for the Pegasus computer at the National Research Development Corporation in London. A major problem at that time was the flutter of aircraft wings, and for the analysis there was a need to compute eigenvalues. Francis was given the task of writing some matrix computation routines, including some eigenvalue programs. His interest in the problem grew far beyond what he had been assigned to do. He had obtained a copy of Heinz Rutishauser's paper on the LR algorithm [13], he had attended seminar talks by James Wilkinson² in which he learned about the advantages of computing with orthogonal matrices, and he saw the possibility of creating a superior method. He worked on this project on the side, with the tolerance (and perhaps support) of his supervisor, Christopher Strachey. In 1959 he created and tested the procedure that is now commonly known as the implicitly-shifted QR algorithm, which I prefer to call Francis's algorithm.

¹ Most large matrices that arise in applications are sparse, not dense. That is, the vast majority of their entries are zeros. For sparse matrices there are special methods [1] that can compute selected eigenvalues of extremely large matrices (much larger than $10^5$). These methods are important, but they are not the focus of this paper.
² A few years later, Wilkinson wrote the highly influential book The Algebraic Eigenvalue Problem [20].
It became and has continued to be the big workhorse of eigensystem computations. A version of Francis's algorithm was used by MATLAB when I asked it to compute the eigensystem of a 1000 × 1000 matrix on my laptop. Another version was used for the computation on the matrix of order $10^5$ reported in [7].

Once Francis finished his work on the QR project, he moved on to other things and never looked back. In the infant computer industry there were many urgent tasks, for example, compiler development. Within a few years the importance of Francis's algorithm was recognized, but by then Francis had left the world of numerical analysis. Only many years later did he find out what a huge impact his algorithm had had.

2. PRECURSORS OF FRANCIS'S ALGORITHM. Most of us learned in an introductory linear algebra course that the way to compute eigenvalues is to form the characteristic polynomial and factor it to get the zeros. Because of the equivalence of the eigenvalue problem with that of finding the roots of a polynomial equation, it is difficult to put a date on the beginning of the history of eigenvalue computations. The polynomial problem is quite old, and we all know something about its history. In particular, early in the 19th century Abel proved that there is no general formula (using only addition, subtraction, multiplication, division, and the extraction of roots) for the roots of a polynomial of degree five. This result, which is one of the crowning achievements of Galois theory, has the practical consequence for numerical analysts that all methods for computing eigenvalues are iterative. Direct methods, such as Gaussian elimination for solving a linear system $Ax = b$, do not exist for the eigenvalue problem.

The early history of eigenvalue computation is intertwined with that of computing roots of polynomial equations. Eventually it was realized that forming the characteristic polynomial might not be a good idea, as the zeros of a polynomial are often extremely sensitive to small perturbations in the coefficients [19, 21]. All modern methods for computing eigenvalues work directly with the matrix, avoiding the characteristic polynomial altogether. Moreover, the eigenvalue problem has become a much more important part of computational mathematics than the polynomial root finding problem is.

For our story, an important early contribution was the 1892 dissertation of Jacques Hadamard [9], in which he proposed a method for computing the poles of a meromorphic function

\[
f(z) = \sum_{k=0}^{\infty} \frac{s_k}{z^{k+1}}, \tag{1}
\]

given the sequence of coefficients $(s_k)$. Some sixty years later Eduard Stiefel, founder of the Institute for Applied Mathematics at ETH Zürich, set a related problem for his young assistant, Heinz Rutishauser: given a matrix $A$, determine its eigenvalues from the sequence of moments

\[
s_k = y^T A^k x, \qquad k = 0, 1, 2, \ldots,
\]

where $x$ and $y$ are (more or less) arbitrary vectors. If the moments are used to define a function $f$ as in (1), every pole of $f$ is an eigenvalue of $A$. Moreover, for almost any choice of vectors $x$ and $y$, the complete set of poles of $f$ is exactly the complete set of eigenvalues of $A$. Thus the problem posed by Stiefel is essentially equivalent to the problem addressed by Hadamard. Rutishauser came up with a new method, which he called the quotient-difference (qd) algorithm [11, 12], that solves this problem.
Today we know that this is a bad way to compute eigenvalues, as the poles of $f$ (like the zeros of a polynomial) can be extremely sensitive to small changes in the coefficients.

However, Rutishauser saw that there were multiple ways to organize and interpret his qd algorithm. He reformulated it as a process of factorization and recombination of tridiagonal matrices, and from there he was able to generalize it to obtain the LR algorithm [13]. For more information about this seminal work see [8], in addition to the works of Rutishauser cited above.

Francis knew about Rutishauser's work, and so did Vera Kublanovskaya, working in Leningrad. Francis [3] and Kublanovskaya [10] independently proposed the QR algorithm, which promises greater stability by replacing LR decompositions with QR decompositions. The basic QR algorithm is as follows: Factor your matrix $A$ into a product $A = QR$, where $Q$ is unitary and $R$ is upper triangular. Then reverse the factors and multiply them back together to get a new matrix: $RQ = \hat{A}$. Briefly we have

\[
A = QR, \qquad RQ = \hat{A}.
\]

This is one iteration of the basic QR algorithm. It is easy to check that $\hat{A} = Q^{-1}AQ$, so $\hat{A}$ is unitarily similar to $A$ and therefore has the same eigenvalues. If the process is now iterated:

\[
A_{j-1} = Q_j R_j, \qquad R_j Q_j = A_j,
\]

the sequence of unitarily similar matrices so produced will (usually) tend toward (block) triangular form, eventually revealing the eigenvalues on the main diagonal. If $A$ is real, the entire process can be done in real arithmetic.

The basic QR algorithm converges rather slowly. A remedy is to incorporate shifts of origin:

\[
A_{j-1} - \rho_j I = Q_j R_j, \qquad R_j Q_j + \rho_j I = A_j. \tag{2}
\]

Again it is easy to show that the matrices so produced are unitarily similar. Good choices of shifts $\rho_j$ can improve the convergence rate dramatically. A good shift is one that approximates an eigenvalue well.

Many applications feature real matrices that have, however, lots of complex eigenvalues. If one wants rapid convergence of complex eigenvalues, one needs to use complex shifts and carry out (2) in complex arithmetic. This posed a problem for Francis: Working in complex arithmetic on the Pegasus computer would have been really difficult. Moreover, complex matrices require twice as much storage space as real matrices do, and Pegasus didn't have much storage space. He therefore looked for a method that avoids complex arithmetic.

If iterate $A_0$ is real and we do an iteration of (2) with a complex shift $\rho$, the resulting matrix $A_1$ is, of course, complex. However, if we then do another iteration using the complex conjugate shift $\overline{\rho}$, the resulting matrix $A_2$ is again real. Francis knew this, and he looked for a method that would get him directly from $A_0$ to $A_2$. He found such a method, the double-shift QR algorithm [4], which does two iterations of the QR algorithm implicitly, entirely in real arithmetic. This algorithm does not cause convergence to triangular form; it is impossible for complex eigenvalues to emerge on the main diagonal of a real matrix. Instead pairs of complex conjugate eigenvalues emerge in 2 × 2 packets along the main diagonal of a block triangular matrix. Francis coded his new algorithm in assembly language on the Pegasus computer and tried it out. Using this new method, the flying horse was able to compute all of the eigenvalues of a 24 × 24 matrix in just over ten minutes [4].
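To make the shifted iteration (2) concrete, here is a minimal sketch of the explicitly shifted QR algorithm in Python with NumPy. This is the basic algorithm described above, not Francis's implicit version; the shift choice (the current last diagonal entry), the fixed iteration count, and the absence of any deflation logic are simplifying assumptions made only for illustration.

```python
import numpy as np

def shifted_qr(A, iterations=50):
    """Explicitly shifted QR iteration (2): A_{j-1} - rho_j I = Q_j R_j,  A_j = R_j Q_j + rho_j I."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    for _ in range(iterations):
        rho = A[-1, -1]                          # simple real shift: the last diagonal entry
        Q, R = np.linalg.qr(A - rho * np.eye(n))
        A = R @ Q + rho * np.eye(n)              # unitarily similar to the previous iterate
    return A

# On a symmetric example the iterates approach diagonal form, revealing the eigenvalues.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
print(np.sort(np.diag(shifted_qr(A))))
print(np.sort(np.linalg.eigvalsh(A)))            # library answer, for comparison
```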
We will describe Francis's double-shift QR algorithm below, and the reader will see that it is completely different from the shifted QR algorithm (2). Francis had to expend some effort to demonstrate that his new method does indeed effect two steps of (2). When we teach students about Francis's algorithm today, we still follow this path. We introduce the shifted QR algorithm (2) and talk about why it works, then we introduce Francis's algorithm, and then we demonstrate equivalence using the so-called implicit-Q theorem. A primary objective of this paper is to demonstrate that this approach is inefficient. It is simpler to introduce Francis's algorithm straightaway and explain why it works, bypassing (2) altogether.³

3. REFLECTORS AND THE REDUCTION TO HESSENBERG FORM. Before we can describe Francis's algorithm, we need to introduce some basic tools of numerical linear algebra. For simplicity we will restrict our attention to the real case, but everything we do here can easily be extended to the complex setting.

Let $x$ and $y$ be two distinct vectors in $\mathbb{R}^n$ with the same Euclidean norm: $\|x\|_2 = \|y\|_2$, and let $S$ denote the hyperplane (through the origin) orthogonal to $x - y$. Then the linear transformation $Q$ that reflects vectors through $S$ clearly maps $x$ to $y$ and vice versa. Since $Q$ preserves norms, it is unitary: $Q^* = Q^{-1}$. It is also clearly involutory: $Q^{-1} = Q$, so it is also Hermitian: $Q^* = Q$. Viewing $Q$ as a matrix, it is real, orthogonal ($Q^T = Q^{-1}$), and symmetric ($Q^T = Q$). A precise expression for the matrix $Q$ is $Q = I - 2uu^T$, where $u = (x - y)/\|x - y\|_2$.

Now consider an arbitrary nonzero $x$ and a special $y$:

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \qquad \text{where } y_1 = \pm\|x\|_2.
\]

Since $x$ and $y$ have the same norm, we know that there is a reflector $Q$ such that $Qx = y$. Such an operator is clearly a boon to numerical linear algebraists, as it gives us a means of introducing large numbers of zeros. Letting $e_1$ denote the vector with a 1 in the first position and zeros elsewhere, as usual, we can restate our result as follows.

Theorem 1. Let $x \in \mathbb{R}^n$ be any nonzero vector. Then there is a reflector $Q$ such that $Qx = \alpha e_1$, where $\alpha = \pm\|x\|_2$.

In the world of matrix computations, reflectors are also known as Householder transformations. For details on the practical construction and use of reflectors see [17] or [6], for example.

³ It is only fair to admit that this is not my first attempt. The earlier work [16], which I now view as premature, failed to expose fully the role of Krylov subspaces (explained below) for the functioning of Francis's algorithm. It is my fervent hope that I will not view the current work as premature a few years from now.
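A reflector with the property guaranteed by Theorem 1 can be written down directly from the formula $Q = I - 2uu^T$. The following sketch (Python with NumPy, real arithmetic) is one way to do it; choosing the sign of $\alpha$ opposite to that of $x_1$, a practical detail not needed for the theory above, avoids cancellation when forming $x - y$.

```python
import numpy as np

def reflector(x):
    """Return a reflector Q (Householder matrix) with Q @ x = alpha * e1, alpha = +/-||x||_2."""
    x = np.asarray(x, dtype=float)               # x is assumed nonzero, as in Theorem 1
    s = -1.0 if x[0] >= 0 else 1.0               # pick the sign of alpha opposite to x[0]
    y = np.zeros_like(x)
    y[0] = s * np.linalg.norm(x)                 # y = alpha * e1 has the same norm as x
    u = (x - y) / np.linalg.norm(x - y)          # unit normal of the mirror hyperplane S
    return np.eye(x.size) - 2.0 * np.outer(u, u) # Q = I - 2 u u^T

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
Q = reflector(x)
print(np.round(Q @ x, 12))                                    # alpha * e1
print(np.allclose(Q @ Q, np.eye(5)), np.allclose(Q, Q.T))     # involutory and symmetric
```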
With reflectors in hand we can easily transform any matrix nearly all the way to triangular form. A matrix $A$ is called upper Hessenberg if it satisfies $a_{ij} = 0$ whenever $i > j + 1$. For example, a 5 × 5 upper Hessenberg matrix has the form

\[
\begin{bmatrix}
* & * & * & * & * \\
* & * & * & * & * \\
  & * & * & * & * \\
  &   & * & * & * \\
  &   &   & * & *
\end{bmatrix},
\]

where blank spots denote zeros and asterisks denote numbers that can have any real value.

Theorem 2. Every $A \in \mathbb{R}^{n \times n}$ is orthogonally similar to an upper Hessenberg matrix: $H = Q^{-1}AQ$, where $Q$ is a product of $n - 2$ reflectors.

Proof. The first reflector $Q_1$ has the form

\[
Q_1 = \begin{bmatrix} 1 & \\ & \tilde{Q}_1 \end{bmatrix}, \tag{3}
\]

where $\tilde{Q}_1$ is an $(n-1) \times (n-1)$ reflector such that

\[
\tilde{Q}_1 \begin{bmatrix} a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{bmatrix} = \begin{bmatrix} * \\ 0 \\ \vdots \\ 0 \end{bmatrix}.
\]

If we multiply $A$ on the left by $Q_1$, we obtain a matrix $Q_1 A$ that has zeros in the first column from the third entry on. To complete a similarity transformation, we must multiply on the right by $Q_1$ ($= Q_1^{-1}$). Because $Q_1$ has the form (3), the transformation $Q_1 A \mapsto Q_1 A Q_1$ does not alter the first column. Thus $Q_1 A Q_1$ has the desired zeros in the first column.

The second reflector $Q_2$ has the form

\[
Q_2 = \begin{bmatrix} 1 & & \\ & 1 & \\ & & \tilde{Q}_2 \end{bmatrix}
\]

and is constructed so that $Q_2 Q_1 A Q_1 Q_2$ has zeros in the second column from the fourth position on. This transformation does not disturb the zeros that were created in the first column in the first step. The details are left for the reader (or see [17], for example). The third reflector $Q_3$ creates zeros in the third column, and so on. After $n - 2$ reflectors, we will have produced a matrix

\[
H = Q_{n-2} \cdots Q_1 A Q_1 \cdots Q_{n-2}
\]

in upper Hessenberg form. Letting $Q = Q_1 \cdots Q_{n-2}$, we have $Q^{-1} = Q_{n-2} \cdots Q_1$, and $H = Q^{-1}AQ$.
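The proof just given translates almost line by line into a program. The sketch below (Python with NumPy) forms each reflector as a full $n \times n$ matrix purely for readability; an efficient implementation would apply each small reflector only to the rows and columns it touches, as described in [6, 17]. The helper reflector is the one sketched after Theorem 1.

```python
import numpy as np

def reflector(x):
    """Reflector Q with Q @ x = alpha * e1 (see the sketch following Theorem 1)."""
    s = -1.0 if x[0] >= 0 else 1.0
    y = np.zeros_like(x)
    y[0] = s * np.linalg.norm(x)
    u = (x - y) / np.linalg.norm(x - y)
    return np.eye(x.size) - 2.0 * np.outer(u, u)

def hessenberg(A):
    """Reduce A to upper Hessenberg form H = Q^{-1} A Q using n - 2 reflectors."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        col = A[k+1:, k]                     # entries to be zeroed below the subdiagonal
        if np.linalg.norm(col) == 0.0:
            continue                         # column already has the desired zeros
        Qk = np.eye(n)
        Qk[k+1:, k+1:] = reflector(col)      # leaves the first k+1 rows and columns untouched
        A = Qk @ A @ Qk                      # similarity transformation; Qk is its own inverse
        Q = Q @ Qk
    return A, Q

A = np.random.rand(6, 6)
H, Q = hessenberg(A)
print(np.allclose(np.tril(H, -2), 0.0))      # Hessenberg pattern: zeros below the subdiagonal
print(np.allclose(Q.T @ A @ Q, H))           # H is orthogonally similar to A
```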
The proof of Theorem 2 is constructive; that is, it gives a finite algorithm (a direct method) for computing $H$ and $Q$. The operations used are those required for setting up and applying reflectors, namely addition, subtraction, multiplication, division, and the extraction of square roots [6, 17]. What we really want is an algorithm to get us all the way to upper triangular form, from which we can read off the eigenvalues. But Hessenberg form is as far as we can get with a finite algorithm using the specified operations. If we had a general procedure for creating even one more zero below the main diagonal, we would be able to split the eigenvalue problem into two smaller eigenvalue problems. If we could do this, we would be able to split each of the smaller problems into still smaller problems and eventually get to triangular form by a finite algorithm. This would imply the existence of a formula for the zeros of a general nth degree polynomial, in violation of Galois theory.

The ability to reduce a matrix to upper Hessenberg form is quite useful. If the QR algorithm (2) is initiated with a matrix $A_0$ that is upper Hessenberg, then (except in some special cases that can be avoided) all iterates $A_j$ are upper Hessenberg. This results in much more economical iterations, as both the QR decomposition and the subsequent RQ multiplication can be done in many fewer operations in the Hessenberg case. Thus an initial reduction to upper Hessenberg form is well worth the extra expense. In fact, the algorithm would not be competitive without it. For Francis's implicitly-shifted QR algorithm, the Hessenberg form is absolutely essential.

4. FRANCIS'S ALGORITHM. Suppose we want to find the eigenvalues of some matrix $A \in \mathbb{R}^{n \times n}$. We continue to focus on the real case; the extension to complex matrices is straightforward. We know from Theorem 2 that we can reduce $A$ to upper Hessenberg form, so let us assume from the outset that $A$ is upper Hessenberg. We can assume, moreover, that $A$ is properly upper Hessenberg. This means that all of the subdiagonal entries of $A$ are nonzero: $a_{j+1,j} \neq 0$ for $j = 1, 2, \ldots, n-1$. Indeed, if $A$ is not properly upper Hessenberg, say $a_{k+1,k} = 0$, then $A$ has the block-triangular form

\[
A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix},
\]

where $A_{11}$ is $k \times k$. Thus the eigenvalue problem for $A$ reduces to eigenvalue problems for the smaller upper Hessenberg matrices $A_{11}$ and $A_{22}$. If either of these is not properly upper Hessenberg, we can break it apart further until we are finally left with proper upper Hessenberg matrices. We will assume, therefore, that $A$ is properly upper Hessenberg from the outset.

An iteration of Francis's algorithm of degree $m$ begins by picking $m$ shifts $\rho_1, \ldots, \rho_m$. In principle $m$ can be any positive integer, but in practice it should be fairly small. Francis took $m = 2$. The rationale for shift selection will be explained later. There are many reasonable ways to choose shifts, but the simplest is to take $\rho_1, \ldots, \rho_m$ to be the eigenvalues of the $m \times m$ submatrix in the lower right-hand corner of $A$. This is an easy computation if $m = 2$. If $m > 1$, this shifting strategy can produce complex shifts, but they always occur in conjugate pairs. Whether we use this or some other strategy, we will always insist that it have this property: when the matrix is real, complex shifts must be produced in conjugate pairs. Now let

\[
p(A) = (A - \rho_m I) \cdots (A - \rho_1 I).
\]

We do not actually compute $p(A)$, as this would be too expensive. Francis's algorithm just needs the first column

\[
x = p(A)e_1,
\]

which is easily computed by $m$ successive matrix-vector multiplications. The computation is especially cheap because $A$ is upper Hessenberg. It is easy to check that $(A - \rho_1 I)e_1$ has nonzero entries in only its first two positions, $(A - \rho_2 I)(A - \rho_1 I)e_1$ has nonzero entries in only its first three positions, and so on. As a consequence, $x$ has nonzero entries in only its first $m + 1$ positions.
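For Francis's own choice $m = 2$, with the shifts taken as the eigenvalues of the trailing $2 \times 2$ submatrix, the vector $x = p(A)e_1$ can be formed in real arithmetic without ever computing the (possibly complex) shifts individually: only their sum $t$ (the trace of that submatrix) and product $d$ (its determinant) enter, since $x = (A^2 - tA + dI)e_1$. A minimal sketch (Python with NumPy; the function name is an illustrative choice):

```python
import numpy as np

def francis_first_column(H):
    """x = p(H) e1 = (H - rho2 I)(H - rho1 I) e1 for upper Hessenberg H, degree m = 2.

    rho1, rho2 are the eigenvalues of the trailing 2x2 submatrix; only their sum t
    and product d are needed, so all arithmetic is real.  Only the first two columns
    of H (plus the trailing 2x2 block, for the shifts) actually enter.
    """
    n = H.shape[0]
    t = H[n-2, n-2] + H[n-1, n-1]
    d = H[n-2, n-2] * H[n-1, n-1] - H[n-2, n-1] * H[n-1, n-2]
    e1 = np.zeros(n)
    e1[0] = 1.0
    x = H @ (H @ e1) - t * (H @ e1) + d * e1     # (H^2 - t H + d I) e1
    return x                                      # nonzero only in its first three positions

H = np.triu(np.random.rand(7, 7), -1)             # a random upper Hessenberg matrix
print(np.round(francis_first_column(H), 3))       # entries 4 through 7 are exactly zero
```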
The entire computation of $x$ depends on only the first $m$ columns of $A$ (and the shifts) and requires negligible computational effort if $m$ is small.

Since the complex shifts occur in conjugate pairs, $p(A)$ is real. Therefore $x$ is real. Let $\tilde{x} \in \mathbb{R}^{m+1}$ denote the vector consisting of the first $m + 1$ entries of $x$ (the nonzero part). Theorem 1 guarantees that there is an $(m+1) \times (m+1)$ reflector $\tilde{Q}_0$ such that $\tilde{Q}_0 \tilde{x} = \alpha e_1$, where $\alpha = \pm\|\tilde{x}\|_2$. Let $Q_0$ be an $n \times n$ reflector given by

\[
Q_0 = \begin{bmatrix} \tilde{Q}_0 & \\ & I \end{bmatrix}. \tag{4}
\]

Clearly $Q_0 x = \alpha e_1$ and, since $Q_0$ is an involution, $Q_0 e_1 = \alpha^{-1} x$. Thus the first column of $Q_0$ is proportional to $x$.

Now use $Q_0$ to perform a similarity transformation: $A \mapsto Q_0^{-1} A Q_0 = Q_0 A Q_0$. This disturbs the Hessenberg form but only slightly. Because $Q_0$ has the form (4), the transformation $A \mapsto Q_0 A$ affects only the first $m + 1$ rows. Typically the first $m + 1$ rows are completely filled in. The transformation $Q_0 A \mapsto Q_0 A Q_0$ affects only the first $m + 1$ columns. Since $a_{m+2,m+1} \neq 0$, row $m + 2$ gets filled in by this transformation. Below row $m + 2$, we have a big block of zeros, and these remain zero. The total effect is that there is a bulge in the Hessenberg form. In the case $m = 3$ and $n = 7$, the matrix $Q_0 A Q_0$ looks like

\[
\begin{bmatrix}
* & * & * & * & * & * & * \\
* & * & * & * & * & * & * \\
* & * & * & * & * & * & * \\
* & * & * & * & * & * & * \\
* & * & * & * & * & * & * \\
  &   &   &   & * & * & * \\
  &   &   &   &   & * & *
\end{bmatrix}.
\]

In this 7 × 7 matrix the bulge looks huge. If you envision, say, the 100 × 100 case (keeping $m = 3$), you will see that the matrix is almost upper Hessenberg with a tiny bulge at the top.

The rest of the Francis iteration consists of returning this matrix to upper Hessenberg form by the algorithm sketched in the proof of Theorem 2. This begins with a transformation $Q_1$ that acts only on rows 2 through $n$ and creates the desired zeros in the first column. Since the first column already consists of zeros after row $m + 2$, the scope of $Q_1$ can be restricted a lot further in this case. It needs to act on rows 2 through $m + 2$ and create zeros in positions $(3, 1), \ldots, (m + 2, 1)$. Applying $Q_1$ on the left, we get a matrix $Q_1 Q_0 A Q_0$ that has the Hessenberg form restored in the first column. Completing the similarity transformation, we multiply by $Q_1$ on the right. This recombines columns 2 through $m + 2$. Since $a_{m+3,m+2} \neq 0$, additional nonzero entries are created in positions $(m + 3, 2), \ldots, (m + 3, m + 1)$. In the case $m = 3$ and $n = 7$ the matrix $Q_1 Q_0 A Q_0 Q_1$ looks like

\[
\begin{bmatrix}
* & * & * & * & * & * & * \\
* & * & * & * & * & * & * \\
  & * & * & * & * & * & * \\
  & * & * & * & * & * & * \\
  & * & * & * & * & * & * \\
  & * & * & * & * & * & * \\
  &   &   &   &   & * & *
\end{bmatrix}.
\]
The bulge has not gotten any smaller, but it has been pushed one position to the right and downward. This establishes the pattern for the process. The next transformation will push the bulge over and down one more position, and so on. Thinking again of the 100 × 100 case, we see that a long sequence of such transformations will chase the bulge down through the matrix until it is finally pushed off the bottom. At this point, Hessenberg form will have been restored and the Francis iteration will be complete. For obvious reasons, Francis's algorithm is sometimes referred to as a bulge chasing algorithm.

An iteration of Francis's algorithm of degree $m$ can be summarized briefly as follows.

1. Pick some shifts $\rho_1, \ldots, \rho_m$.
2. Compute $x = p(A)e_1 = (A - \rho_m I) \cdots (A - \rho_1 I)e_1$.
3. Compute a reflector $Q_0$ whose first column is proportional to $x$.
4. Do a similarity transformation $A \mapsto Q_0 A Q_0$, creating a bulge.
5. Return the matrix to upper Hessenberg form by chasing the bulge.

Notice that, although this algorithm is commonly known as the implicitly-shifted QR algorithm, its execution does not require any QR decompositions. Why, then, should we call it the QR algorithm? The name is misleading, and for this reason I prefer to call it Francis's algorithm.

Letting $\hat{A}$ denote the final result of the Francis iteration, we have

\[
\hat{A} = Q_{n-2} \cdots Q_1 Q_0 A Q_0 Q_1 \cdots Q_{n-2},
\]

where $Q_0$ is the transformation that creates the bulge, and $Q_1, \ldots, Q_{n-2}$ are the transformations that chase it. Letting

\[
Q = Q_0 Q_1 \cdots Q_{n-2},
\]

we have $\hat{A} = Q^{-1} A Q$.

Recall that $Q_0$ was built in such a way that $Q_0 e_1 = \beta x = \beta p(A)e_1$. Looking back to the proof of Theorem 2, we recall that each of the other $Q_i$ has $e_1$ as its first column. Thus $Q_i e_1 = e_1$, $i = 1, \ldots, n - 2$. We conclude that

\[
Q e_1 = Q_0 Q_1 \cdots Q_{n-2} e_1 = Q_0 e_1 = \beta x.
\]

We summarize these findings in a theorem.

Theorem 3. A Francis iteration of degree $m$ with shifts $\rho_1, \ldots, \rho_m$ effects an orthogonal similarity transformation

\[
\hat{A} = Q^{-1} A Q,
\]

where

\[
Q e_1 = \beta p(A)e_1 = \beta (A - \rho_m I) \cdots (A - \rho_1 I)e_1
\]

for some nonzero $\beta$. In words, the first column of $Q$ is proportional to the first column of $p(A)$.
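The five steps above fit comfortably in a few dozen lines. The sketch below (Python with NumPy) carries out one Francis iteration of degree 2 on a properly upper Hessenberg matrix, forming each reflector as a full matrix and assuming $n \geq 3$, purely for readability. Production codes (the Hessenberg QR routines in LAPACK, for instance) apply the reflectors only to the entries they touch, monitor the subdiagonal for deflation, and handle many details omitted here; the function names and the fixed iteration count in the demo are illustrative.

```python
import numpy as np

def house(x):
    """Reflector P with P @ x = alpha * e1; returns the identity if x is the zero vector."""
    x = np.asarray(x, dtype=float)
    s = -1.0 if x[0] >= 0 else 1.0
    y = np.zeros_like(x)
    y[0] = s * np.linalg.norm(x)
    u = x - y
    nu = np.linalg.norm(u)
    if nu == 0.0:
        return np.eye(x.size)
    u /= nu
    return np.eye(x.size) - 2.0 * np.outer(u, u)

def francis_step(H):
    """One Francis iteration of degree m = 2 on a properly upper Hessenberg H (n >= 3)."""
    H = np.array(H, dtype=float)
    n = H.shape[0]
    # 1. Shifts: eigenvalues of the trailing 2x2 block, used only through t and d.
    t = H[n-2, n-2] + H[n-1, n-1]
    d = H[n-2, n-2] * H[n-1, n-1] - H[n-2, n-1] * H[n-1, n-2]
    # 2. First column x = p(H) e1; only its first three entries are nonzero.
    x = np.array([H[0, 0]**2 + H[0, 1]*H[1, 0] - t*H[0, 0] + d,
                  H[1, 0]*(H[0, 0] + H[1, 1] - t),
                  H[1, 0]*H[2, 1]])
    # 3.-4. Reflector Q0 with first column proportional to x; the similarity creates the bulge.
    Q0 = np.eye(n)
    Q0[:3, :3] = house(x)
    H = Q0 @ H @ Q0
    # 5. Chase the bulge, restoring Hessenberg form column by column (proof of Theorem 2).
    for k in range(n - 2):
        r = min(k + 4, n)                         # the bulge never extends past row k+4
        Qk = np.eye(n)
        Qk[k+1:r, k+1:r] = house(H[k+1:r, k])
        H = Qk @ H @ Qk
    return np.triu(H, -1)                         # discard roundoff below the subdiagonal

# Demo: repeated Francis steps make trailing subdiagonal entries negligible (deflation).
A = np.triu(np.random.rand(6, 6), -1)             # random properly upper Hessenberg matrix
H = A.copy()
for _ in range(15):
    H = francis_step(H)
print(abs(H[4, 3]), abs(H[5, 4]))                  # typically one of these is tiny: deflate a 1x1 or 2x2 block
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(H)))       # the spectrum is preserved by the similarity
```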
In the next section we will see why this procedure is so powerful. Repeated Francis iterations with well-chosen shifts typically result in rapid convergence, in the sense that $a_{n-m+1,n-m} \to 0$ quadratically.⁴ In a few iterations, $a_{n-m+1,n-m}$ will be small enough to be considered zero. We can then deflate the problem:

\[
A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}.
\]

The (small) $m \times m$ matrix $A_{22}$ can be resolved into $m$ eigenvalues with negligible work. (Think of the case $m = 2$, for example.) We can then focus on the remaining $(n - m) \times (n - m)$ submatrix $A_{11}$ and go after another set of $m$ eigenvalues.

5. WHY FRANCIS'S ALGORITHM WORKS.

Subspace Iteration. At the core of Francis's algorithm lies the humble power method. In the $k$-dimensional version, which is known as subspace iteration, we pick a $k$-dimensional subspace $S$ of $\mathbb{R}^n$ (or perhaps $\mathbb{C}^n$) and, through repeated multiplication by $A$, build the sequence of subspaces

\[
S, \ AS, \ A^2 S, \ A^3 S, \ \ldots. \tag{5}
\]

Here by $AS$ we mean $\{Ax \mid x \in S\}$. To avoid complications in the discussion, we will make some simplifying assumptions. We will suppose that all of the spaces in the sequence have the same dimension $k$.⁵ We also assume that $A$ is a diagonalizable matrix with $n$ linearly independent eigenvectors $v_1, \ldots, v_n$ and associated eigenvalues $\lambda_1, \ldots, \lambda_n$.⁶ Sort the eigenvalues and eigenvectors so that $|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|$. Any vector $x$ can be expressed as a linear combination $x = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$ for some (unknown) $c_1, \ldots, c_n$. Thus clearly

\[
A^j x = c_1 \lambda_1^j v_1 + \cdots + c_k \lambda_k^j v_k + c_{k+1} \lambda_{k+1}^j v_{k+1} + \cdots + c_n \lambda_n^j v_n, \qquad j = 1, 2, 3, \ldots.
\]

If $|\lambda_k| > |\lambda_{k+1}|$, the components in the directions $v_1, \ldots, v_k$ will grow relative to the components in the directions $v_{k+1}, \ldots, v_n$ as $j$ increases. As a consequence, unless our choice of $S$ was very unlucky, the sequence (5) will converge to the $k$-dimensional invariant subspace spanned by $v_1, \ldots, v_k$. The convergence is linear with ratio $|\lambda_{k+1}/\lambda_k|$, which means that the error is reduced by a factor of approximately $|\lambda_{k+1}/\lambda_k|$ on each iteration [18, 15].

Often the ratio $|\lambda_{k+1}/\lambda_k|$ will be close to 1, so convergence will be slow. In an effort to speed up convergence, we could consider replacing $A$ by some $p(A)$ in (5), giving the iteration

\[
S, \ p(A)S, \ p(A)^2 S, \ p(A)^3 S, \ \ldots. \tag{6}
\]

⁴ This is a simplification. It often happens that some shifts are much better than others, resulting in $a_{n-k+1,n-k} \to 0$ for some $k < m$.

⁵ Of course it can happen that the dimension decreases in the course of the iterations. This does not cause any problems for us. In fact, it is good news [15], but we will not discuss that case here.

⁶ The non-diagonalizable case is more complicated but leads to similar conclusions. See [18] or [15], for example.
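Subspace iteration (5) is easy to experiment with if the subspace is represented by a matrix whose columns form a basis for it. The sketch below (Python with NumPy) re-orthonormalizes the basis at every step, which changes the basis but not the subspace, and then measures how close the final subspace is to being invariant under $A$; the starting subspace, the test matrix, and the convergence measure are illustrative choices.

```python
import numpy as np

def subspace_iteration(A, k, steps=100, seed=0):
    """Iterate S, AS, A^2 S, ... as in (5); return an orthonormal basis of the final subspace."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    V = np.linalg.qr(rng.standard_normal((n, k)))[0]   # columns span a random starting subspace S
    for _ in range(steps):
        V = np.linalg.qr(A @ V)[0]                     # multiply by A, then re-orthonormalize
    return V

# Eigenvalues roughly 5, 4, 1, 0.5: the ratio |lambda_3 / lambda_2| is about 1/4, so k = 2 converges quickly.
A = np.diag([5.0, 4.0, 1.0, 0.5]) + 0.05 * np.random.rand(4, 4)
V = subspace_iteration(A, k=2)
# span(V) is (nearly) invariant under A when the residual A V - V (V^T A V) is small.
print(np.linalg.norm(A @ V - V @ (V.T @ A @ V)))
```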
If $p(z) = (z - \rho_m) \cdots (z - \rho_1)$ is a polynomial of degree $m$, each step of (6) amounts to $m$ steps of (5) with shifts $\rho_1, \ldots, \rho_m$. The eigenvalues of $p(A)$ are $p(\lambda_1), \ldots, p(\lambda_n)$. If we renumber them so that $|p(\lambda_1)| \geq \cdots \geq |p(\lambda_n)|$, the rate of convergence of (6) will be $|p(\lambda_{k+1})/p(\lambda_k)|$. Now we have a bit more flexibility. Perhaps by a wise choice of shifts $\rho_1, \ldots, \rho_m$, we can make this ratio small, at least for some values of $k$.

Subspace Iteration with Changes of Coordinate System. As we all know, a similarity transformation $\hat{A} = Q^{-1}AQ$ is just a change of coordinate system. $A$ and $\hat{A}$ are two matrices that represent the same linear operator with respect to two different bases. Each vector in $\mathbb{R}^n$ is the coordinate vector of some vector $v$ in the vector space on which the operator acts. If the vector $v$ has coordinate vector $x$ before the change of coordinate system, it will have coordinate vector $Q^{-1}x$ afterwards.

Now consider a step of subspace iteration applied to the special subspace

\[
\mathcal{E}_k = \operatorname{span}\{e_1, \ldots, e_k\},
\]

where $e_i$ is the standard basis vector with a one in the $i$th position and zeros elsewhere. The vectors $p(A)e_1, \ldots, p(A)e_k$ are a basis for the space $p(A)\mathcal{E}_k$. Let $q_1, \ldots, q_k$ be an orthonormal basis for $p(A)\mathcal{E}_k$, which could be obtained by some variant of the Gram-Schmidt process, for example. Let $q_{k+1}, \ldots, q_n$ be additional orthonormal vectors such that $q_1, \ldots, q_n$ together form an orthonormal basis of $\mathbb{R}^n$, and let

\[
Q = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix} \in \mathbb{R}^{n \times n}.
\]

Since $q_1, \ldots, q_n$ are orthonormal, $Q$ is an orthogonal matrix. Now use $Q$ to make a change of coordinate system: $\hat{A} = Q^{-1}AQ = Q^T AQ$.

Let us see what this change of basis does to the space $p(A)\mathcal{E}_k$. To this end we check the basis vectors $q_1, \ldots, q_k$. Under the change of coordinate system, these get mapped to $Q^T q_1, \ldots, Q^T q_k$. Because the columns of $Q$ are the orthonormal vectors $q_1, \ldots, q_n$, the vectors $Q^T q_1, \ldots, Q^T q_k$ are exactly $e_1, \ldots, e_k$. Thus the change of coordinate system maps $p(A)\mathcal{E}_k$ back to $\mathcal{E}_k$.

Now, if we want to do another step of subspace iteration, we can work in the new coordinate system, applying $p(\hat{A})$ to $\mathcal{E}_k$. Then we can do another change of coordinate system and map $p(\hat{A})\mathcal{E}_k$ back to $\mathcal{E}_k$. If we continue to iterate in this manner, we produce a sequence of orthogonally similar matrices $(A_j)$ through successive changes of coordinate system, and we are always dealing with the same subspace $\mathcal{E}_k$. This is a version of subspace iteration for which the subspace stays fixed and the matrix changes.

What does convergence mean in this case? As $j$ increases, $\mathcal{E}_k$ comes closer and closer to being invariant under $A_j$. The special space $\mathcal{E}_k$ is invariant under $A_j$ if and only if $A_j$ has the block-triangular form

\[
A_j = \begin{bmatrix} A_{11}^{(j)} & A_{12}^{(j)} \\ 0 & A_{22}^{(j)} \end{bmatrix},
\]

where $A_{11}^{(j)}$ is $k \times k$. Of course, we just approach invariance; we never attain it exactly. In practice we will have

\[
A_j = \begin{bmatrix} A_{11}^{(j)} & A_{12}^{(j)} \\ A_{21}^{(j)} & A_{22}^{(j)} \end{bmatrix},
\]