18.338J/16.394J: The Mathematics of Infinite Random Matrices

Essentials of Finite Random Matrix Theory

Alan Edelman

Handout #6, Tuesday, September 28, 2004

This handout provides the essential elements needed to understand finite random matrix theory. A cursory observation should reveal that the tools for infinite random matrix theory are quite different from the tools for finite random matrix theory. Nonetheless, there are significantly more published applications that use finite random matrix theory as opposed to infinite random matrix theory. Our belief is that many of the results that have been historically derived using finite random matrix theory can be reformulated and answered using infinite random matrix theory. In this sense, it is worth recognizing that in many applications it is an integral of a function of the eigenvalues that is more important than the mere distribution of the eigenvalues. For finite random matrix theory, the tools that often come into play when setting up such integrals are the Matrix Jacobians, the Joint Eigenvalue Densities and the Cauchy-Binet theorem. We describe these in subsequent sections.

1 Matrix and Vector Differentiation

In this section, we concern ourselves with the differentiation of matrices. Differentiating matrix and vector functions is not significantly harder than differentiating scalar functions, except that we need notation to keep track of the variables. We titled this section "matrix and vector" differentiation, but of course it is the function that we differentiate. The matrix or vector is just a notational package for the scalar functions involved. In the end, a derivative is nothing more than the "linearization" of all the involved functions. We find it useful to think of this linearization both symbolically (for manipulative purposes) as well as numerically (in the sense of small numerical perturbations). The differential notation idea captures these viewpoints very well.

We begin with the familiar product rule for scalars,

    d(uv) = u(dv) + v(du),

from which we can derive that d(x^3) = 3x^2 dx. We refer to dx as a differential. We all unconsciously interpret the "dx" symbolically as well as numerically. Sometimes it is nice to confirm on a computer that

    ((x + ε)^3 − x^3)/ε ≈ 3x^2.                                    (1)

I do this by taking x to be 1 or 2 or randn(1) and ε to be .001 or .0001.
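The confirmation suggested above takes only a couple of lines in MATLAB. This is just a sketch of the experiment; the particular x and ε below are arbitrary choices.

% Numerical confirmation of (1); x and epsilon are arbitrary choices.
x = randn(1); epsilon = 1e-4;
((x + epsilon)^3 - x^3)/epsilon     % compare with the next line
3*x^2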
The product rule holds for matrices as well:

    d(UV) = U(dV) + (dU)V.

In the examples we will see some symbolic and numerical interpretations.

Example 1: Y = X^3

We use the product rule to differentiate Y(X) = X^3 to obtain that

    d(X^3) = X^2(dX) + X(dX)X + (dX)X^2.

When I introduce undergraduate students to matrix multiplication, I tell them that matrices are like scalars, except that they do not commute. The numerical (or first order perturbation theory) interpretation applies, but it may seem less familiar at first. Numerically take X=randn(n) and E=randn(n), with ε = .001 say, and then compute

    ((X + εE)^3 − X^3)/ε ≈ X^2 E + X E X + E X^2.                  (2)

This is the matrix version of (1). Holding X fixed and allowing E to vary, the right-hand side is a linear function of E. There is no simpler form possible.

Symbolically (or numerically) one can take dX = E_kl, the matrix that has a one in element (k, l) and 0 elsewhere. Then we can write down the matrix of partial derivatives

    ∂X^3/∂x_kl = X^2 E_kl + X E_kl X + E_kl X^2.

As we let k, l vary over all possible indices, we obtain all the information we need to compute the derivative in any general direction E.

In general, the directional derivative of Y_ij(X) in the direction dX is given by (dY)_ij. For a particular matrix X, dY(X) is a matrix of directional derivatives corresponding to a first order perturbation in the direction E = dX. It is a matrix of linear functions corresponding to the linearization of Y(X) about X.

Structured Perturbations

We sometimes restrict our E to be a structured perturbation. For example, if X is triangular, symmetric, antisymmetric, or even sparse, then often we wish to restrict E so that the pattern is maintained in the perturbed matrix as well. An important case occurs when X is orthogonal. We will see in an example below that we will want to restrict E so that X^T E is antisymmetric when X is orthogonal.

Example 2: y = x^T x

Here y is a scalar and dot products commute, so that dy = 2x^T dx. When y = 1, x is on the unit sphere. To stay on the sphere we need dy = 0, so that x^T dx = 0, i.e., the tangent directions to the sphere are perpendicular to x. Note the two uses of dy. In the first case it is the change to the squared length of x. In the second case, by setting dy = 0, we find perturbations to x which to first order do not change the length at all. Indeed, if one computes (x + dx)^T(x + dx) for a small finite dx, one sees that if x^T dx = 0 then the length changes only to second order. Geometrically, one can draw a tangent to a circle. The distance to the circle is second order in the distance along the tangent.

Example 3: y = x^T A x

Again y is a scalar. We have dy = dx^T A x + x^T A dx. If A is symmetric then dy = 2x^T A dx.

Example 4: Y = X^{-1}

We have that XY = I, so that X(dY) + (dX)Y = 0, and therefore dY = −X^{-1} dX X^{-1}. We recommend that the reader compute ε^{-1}((X + εE)^{-1} − X^{-1}) numerically and verify that it is approximately −X^{-1} E X^{-1}. In other words,

    (X + εE)^{-1} = X^{-1} − ε X^{-1} E X^{-1} + O(ε^2).

Example 5: I = Q^T Q

If Q is orthogonal we have that Q^T dQ + dQ^T Q = 0, so that Q^T dQ is antisymmetric. In general, d(Q^T Q) = Q^T dQ + dQ^T Q, but with no orthogonality condition on Q there is no antisymmetry condition on Q^T dQ.
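The numerical check recommended in Example 4, and the structured perturbation of Example 5, can both be tried in a few lines of MATLAB. This is only a sketch: the size n, the value of ε, and the use of qr to manufacture an orthogonal Q are arbitrary choices.

% Example 4: first order perturbation of the inverse.
n = 4; epsilon = 1e-4;
X = randn(n); E = randn(n);
lhs = (inv(X + epsilon*E) - inv(X))/epsilon;
rhs = -inv(X)*E*inv(X);
norm(lhs - rhs)/norm(rhs)                            % small, of order epsilon

% Example 5: if S is antisymmetric, dQ = Q*S keeps Q'*Q = I to first order.
[Q,~] = qr(randn(n));                                % a random orthogonal matrix
S = randn(n); S = (S - S')/2;                        % a random antisymmetric matrix
dQ = Q*S;                                            % note Q'*dQ = S is antisymmetric
norm((Q + epsilon*dQ)'*(Q + epsilon*dQ) - eye(n))    % of order epsilon^2, not epsilon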
If y is a scalar function of x_1, x_2, ..., x_n, then we have the "chain rule"

    dy = (∂y/∂x_1) dx_1 + (∂y/∂x_2) dx_2 + ... + (∂y/∂x_n) dx_n.

Often we wish to apply the chain rule to every element of a vector or matrix. Let

    X = [ p  q ]     and     Y = X^2 = [ p^2 + qr    pq + qs  ]
        [ r  s ]                       [ pr + rs     qr + s^2 ],

so that

    dY = X dX + dX X.                                              (3)

2 Matrix Jacobians (getting started)

2.1 Definition

Let x ∈ R^n and y = y(x) ∈ R^n be a differentiable function of x. It is well known that the Jacobian matrix

    J = ( ∂y_i/∂x_j )_{i,j = 1,2,...,n} = [ ∂y_1/∂x_1  ...  ∂y_1/∂x_n ]
                                          [    ...     ...     ...    ]
                                          [ ∂y_n/∂x_1  ...  ∂y_n/∂x_n ]

evaluated at a point x approximates y(x) by a linear function. Intuitively y(x + δx) ≈ y(x) + Jδx, i.e., J is the matrix that allows us to invoke perturbation theory. The function y may be viewed as performing a change of variables.

Furthermore (intuitively), if a little box of n-dimensional volume ε surrounds x, then it is transformed by y into a parallelepiped of volume |det J| ε around y(x). Therefore the Jacobian |det J| is the magnification factor for volumes.

If we are integrating some function of y ∈ R^n as in ∫ p(y) dy (where dy = dy_1 ... dy_n), then when we change variables from y to x, where y = y(x), the integral becomes ∫ p(y(x)) |det(∂y_i/∂x_j)| dx. For many people this becomes a matter of notation, but one should understand intuitively that the Jacobian tells you how volume elements scale.

The determinant is 0 exactly where the change of variables breaks down. It is always instructive to see when this happens. Either there is no "x" locally for each "y" or there are many, as in the example of polar coordinates at the origin.

Notation: throughout this book, J denotes the Jacobian matrix. (It is sometimes called the derivative or simply the Jacobian in the literature.) We will consistently write det J for the Jacobian determinant (unfortunately also called the Jacobian in the literature). When we say Jacobian, we will be talking about both.

2.2 Simple Examples (n = 2)

We get our feet wet with some simple 2 × 2 examples. Every reader is familiar with changing scalar variables as in

    ∫ f(x) dx = ∫ f(y^2)(2y) dy.

We want the reader to be just as comfortable when f is a scalar function of a matrix and we change X = Y^2:

    ∫ f(X)(dX) = ∫ f(Y^2)(Jacobian)(dY).

One can compute all of the 2 by 2 Jacobians that follow by hand, but in some cases it can be tedious and hard to get right on the first try. Code 8.1 in MATLAB takes away the drudgery and gives the right answer. Later we will learn fancy ways to get the answer without too much drudgery and also without the aid of a computer.
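Before the matrix examples, the scalar change of variables above can be confirmed numerically. This is only a sketch; the integrand f and the limits below are arbitrary choices.

% Scalar change of variables x = y^2, dx = 2y dy; f and the limits are arbitrary.
f = @(x) exp(-x).*cos(x);
a = 1; b = 4;
integral(f, a, b)                                   % integrate in x
integral(@(y) f(y.^2).*2.*y, sqrt(a), sqrt(b))      % same value after substitution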
We now consider the following examples:

2 × 2 Example 1: Matrix Square (Y = X^2)
2 × 2 Example 2: Matrix Cube (Y = X^3)
2 × 2 Example 3: Matrix Inverse (Y = X^{-1})
2 × 2 Example 4: Linear Transformation (Y = AX + B)
2 × 2 Example 5: The LU Decomposition (X = LU)
2 × 2 Example 6: A Symmetric Decomposition (S = DMD)
2 × 2 Example 7: Traceless Symmetric = Polar Decomposition (S = QΛQ^T, tr S = 0)
2 × 2 Example 8: The Symmetric Eigenvalue Problem (S = QΛQ^T)
2 × 2 Example 9: Symmetric Congruence (Y = A^T S A)

Discussion:

Example 1: Matrix Square (Y = X^2)

With X = [ p  q ; r  s ] and Y = X^2, the Jacobian matrix of interest is

            ∂p     ∂r     ∂q     ∂s
    J = [   2p      q      r      0  ]   ∂Y_11
        [    r    p+s      0      r  ]   ∂Y_21
        [    q      0    p+s      q  ]   ∂Y_12
        [    0      q      r     2s  ]   ∂Y_22

On this first example we label the columns and rows so that the elements correspond to the definition J = (∂Y_ij/∂X_kl). Later we will omit the labels. We invite readers to compare with Equation (3). We see that the Jacobian matrix and the differential contain the same information. We can then compute

    det J = 4(p + s)^2 (sp − qr) = 4 (tr X)^2 det(X).

Notice that breakdown occurs if X is singular or has trace zero.

Example 2: Matrix Cube (Y = X^3)

With X = [ p  q ; r  s ] and Y = X^3,

    J = [ 3p^2 + 2qr     2pq + qs             2rp + sr             qr         ]
        [ 2rp + sr       p^2 + 2qr + (p+s)s   r^2                  rp + 2sr   ]
        [ 2pq + qs       q^2                  p(p+s) + 2qr + s^2   pq + 2qs   ]
        [ qr             pq + 2qs             r(p+s) + sr          2qr + 3s^2 ],

so that

    det J = 9(sp − qr)^2 (qr + p^2 + s^2 + sp)^2 = (9/4) (det X)^2 (tr X^2 + (tr X)^2)^2.

Breakdown occurs if X is singular or if the eigenvalue ratio is a complex cube root of unity.
Example 3: Matrix Inverse (Y = X^{-1})

With X = [ p  q ; r  s ] and Y = X^{-1},

    J = 1/(det X)^2 × [ −s^2    qs     sr    −qr  ]
                      [  sr    −ps    −r^2    rp  ]
                      [  qs    −q^2   −ps     pq  ]
                      [ −qr     pq     rp    −p^2 ],

so that

    det J = (det X)^{-4}.

Breakdown occurs if X is singular.

Example 4: Linear Transformation (Y = AX + B)

    J = [ a  b  0  0 ]
        [ c  d  0  0 ]
        [ 0  0  a  b ]
        [ 0  0  c  d ]

The Jacobian matrix has two copies of the constant matrix A, so that det J = (det A)^2. Breakdown occurs if A is singular.

Example 5: The LU Decomposition (X = LU)

The LU decomposition computes a lower triangular L with ones on the diagonal and an upper triangular U such that X = LU. For a general 2 × 2 matrix it takes the form

    [ p  q ]   [  1    0 ] [ p      q       ]
    [ r  s ] = [ r/p   1 ] [ 0  (ps − qr)/p ],

which exists when p ≠ 0. Notice the function of four variables might be written:

    y_1 = p
    y_2 = r/p
    y_3 = q
    y_4 = (ps − qr)/p = det(X)/p

The Jacobian matrix is itself lower triangular,

    J = [    1        0      0    0 ]
        [ −r/p^2     1/p     0    0 ]
        [    0        0      1    0 ]
        [  qr/p^2   −q/p   −r/p   1 ],

so that det J = 1/p. Breakdown occurs when p = 0.
Example 6: A Symmetric Decomposition (S = DMD)

Any symmetric matrix X = [ p  r ; r  s ] may be written as

    X = DMD,   where   D = [ √p   0 ]    and    M = [    1      r/√(ps) ]
                           [  0  √s ]               [ r/√(ps)      1    ].

Since X is symmetric, there are three independent variables, and thus the Jacobian matrix is 3 × 3. The three independent elements in D and M may be thought of as functions of p, r, and s: namely

    y_1 = √p,   y_2 = √s,   and   y_3 = r/√(ps).

The Jacobian matrix is

    J = [      1/(2√p)             0           0      ]
        [         0             1/(2√s)        0      ]
        [ −(r/p)/(2√(ps))   −(r/s)/(2√(ps))  1/√(ps)  ],

so that

    det J = 1/(4ps).

Breakdown occurs if p or s is 0.

Example 7: Traceless Symmetric = Polar Decomposition (S = QΛQ^T, tr S = 0)

The reader will recall the usual definition of polar coordinates. If (p, s) are Cartesian coordinates, then the angle is θ = arctan(s/p) and the radius is r = √(p^2 + s^2). If we take a symmetric traceless 2 × 2 matrix

    S = [ p   s ]
        [ s  −p ]

and compute its eigenvalue and eigenvector decomposition, we find that the eigendecomposition is mathematically equivalent to the familiar transformation between Cartesian and polar coordinates. Indeed, one of the eigenvectors of S is (cos φ, sin φ), where φ = θ/2. The Jacobian matrix is

    J = [  p/√(p^2 + s^2)    s/√(p^2 + s^2) ]
        [ −s/(p^2 + s^2)     p/(p^2 + s^2)  ].

The Jacobian is the inverse of the radius. This corresponds to the familiar formula, in the more usual notation, dx dy / r = dr dθ, so that det J = 1/r. Breakdown occurs when r = 0.

Example 8: The Symmetric Eigenvalue Problem (S = QΛQ^T)

We compute the Jacobian for the general symmetric eigenvalue and eigenvector decomposition. Let

    S = [ p  s ]  =  [ cos θ  −sin θ ] [ λ_1    0  ] [ cos θ  −sin θ ]^T
        [ s  r ]     [ sin θ   cos θ ] [  0    λ_2 ] [ sin θ   cos θ ]   .

We can compute the eigenvectors and eigenvalues of S directly in MATLAB and compute the Jacobian of the two eigenvalues and the eigenvector angle, but when we tried with the Maple toolbox we found that the symbolic toolbox did not handle "arctan" very well. Instead we found it easy to compute the Jacobian in the other direction. We write S = QΛQ^T, where Q is 2 × 2 orthogonal and Λ is diagonal. The Jacobian is

    J = [ −2 sin θ cos θ (λ_1 − λ_2)           cos^2 θ       sin^2 θ    ]
        [  2 sin θ cos θ (λ_1 − λ_2)           sin^2 θ       cos^2 θ    ]
        [ (cos^2 θ − sin^2 θ)(λ_1 − λ_2)    sin θ cos θ   −sin θ cos θ  ],

so that

    det J = λ_1 − λ_2.

Breakdown occurs if S is a multiple of the identity.
Example 9: Symmetric Congruence (Y = A^T S A)

Let Y = A^T S A, where Y and S are symmetric, but A = [ a  b ; c  d ] is arbitrary. The Jacobian matrix is

    J = [ a^2   c^2    2ca    ]
        [ b^2   d^2    2db    ]
        [ ab    cd   cb + ad  ]

and

    det J = (det A)^3.

The cube on the determinant tends to surprise many people. Can you guess what it is for an n × n symmetric matrix (Y = A^T S A)? The answer (det J = (det A)^{n+1}) is in Example 3 of Section 3.

%jacobian2by2.m
%Code 8.1 of Random Eigenvalues by Alan Edelman
%Experiment: Compute the Jacobian of a 2x2 matrix function
%Comment:    Symbolic tools are not perfect. The author
%            exercised care in choosing the variables.

syms p q r s a b c d t e1 e2
X=[p q ; r s]; A=[a b; c d];

%% Compute Jacobians
Y=X^2;    J=jacobian(Y(:),X(:)), JAC_square = factor(det(J))
Y=X^3;    J=jacobian(Y(:),X(:)), JAC_cube   = factor(det(J))
Y=inv(X); J=jacobian(Y(:),X(:)), JAC_inv    = factor(det(J))
Y=A*X;    J=jacobian(Y(:),X(:)), JAC_linear = factor(det(J))
Y=[p q; r/p det(X)/p]; J=jacobian(Y(:),X(:)), JAC_lu = factor(det(J))
x=[p s r]; y=[sqrt(p) sqrt(s) r/(sqrt(p)*sqrt(s))];
J=jacobian(y,x), JAC_DMD = factor(det(J))
x=[p s]; y=[sqrt(p^2+s^2) atan(s/p)];
J=jacobian(y,x), JAC_notrace = factor(det(J))
Q=[cos(t) -sin(t); sin(t) cos(t)]; D=[e1 0; 0 e2]; Y=Q*D*Q.';
y=[Y(1,1) Y(2,2) Y(1,2)]; x=[t e1 e2];
J=jacobian(y,x), JAC_symeig = simplify(det(J))
X=[p s; s r]; Y=A.'*X*A;
y=[Y(1,1) Y(2,2) Y(1,2)]; x=[p r s];
J=jacobian(y,x), JAC_symcong = factor(det(J))

Code 1
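The guess for general n can also be tested numerically, without symbolic tools. The sketch below builds the matrix of the map S ↦ A^T S A on symmetric matrices for n = 3; the choice n = 3, the random A, and the particular coordinates (the independent lower-triangular entries) are arbitrary.

% Numerical test of det J = (det A)^(n+1) for Y = A'*S*A on symmetric S.
n = 3;
A = randn(n);
idx = find(tril(ones(n)));                % positions of the n(n+1)/2 independent entries
m = numel(idx);
J = zeros(m);
for k = 1:m
    E = zeros(n); E(idx(k)) = 1;
    E = E + E' - diag(diag(E));           % symmetric basis matrix
    Y = A'*E*A;                           % apply the (linear) map to the basis element
    J(:,k) = Y(idx);                      % coordinates of the image in the same basis
end
det(J)                                    % the two numbers should agree
det(A)^(n+1)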
3 Y = BXA^T and the Kronecker Product

3.1 Jacobian of Y = BXA^T (Kronecker Product Approach)

There is a "nuts and bolts" approach to calculating some Jacobian determinants. A good example function is the matrix inverse Y = X^{-1}. We recall from Example 4 of Section 1 that

    dY = −X^{-1} dX X^{-1}.

In words, the perturbation dX is multiplied on the left and on the right by a fixed matrix. When this happens we are in a "Kronecker Product" situation, and can instantly write down the Jacobian.

We provide two definitions of the Kronecker product for square matrices A ∈ R^{n,n} and B ∈ R^{m,m}. See [1] for a nice discussion of Kronecker products.

Operator Definition. A ⊗ B is the operator from X ∈ R^{m,n} to Y ∈ R^{m,n} where Y = BXA^T. We write (A ⊗ B)X = BXA^T.

Matrix Definition (Tensor Product). A ⊗ B is the matrix

    A ⊗ B = [ a_11 B  ...  a_1n B ]
            [  ...    ...    ...  ]
            [ a_n1 B  ...  a_nn B ].                               (4)

The following theorem is important for applications.

Theorem 1. det(A ⊗ B) = (det A)^m (det B)^n.

Application: If Y = X^{-1} then dY = −(X^{-T} ⊗ X^{-1}) dX, so that |det J| = |det X|^{-2n}.

Notational Note: The correspondence between the operator definition and the matrix definition is worth spelling out. It corresponds to the following identity in MATLAB:

    Y = B * X * A'
    Y(:) = kron(A,B) * X(:)   % The second line does not change Y

Here kron(A,B) is exactly the matrix in Equation (4), and X(:) is the column vector consisting of the columns of X stacked on top of each other. (In computer science this is known as storing an array in "column major" order.) Many authors write vec(X) where we use X(:). Concretely, we have that

    vec(BXA^T) = (A ⊗ B) vec(X),

where A ⊗ B is as in (4).
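Both the identity and Theorem 1 are easy to check numerically; this is only a sketch, the sizes m and n below are arbitrary choices, and the determinants are compared as a ratio to avoid worrying about scale.

% Numerical check of vec(B*X*A') = kron(A,B)*vec(X) and of Theorem 1.
m = 3; n = 4;
A = randn(n); B = randn(m); X = randn(m,n);
Y = B*X*A';
norm(Y(:) - kron(A,B)*X(:))                       % zero up to roundoff
det(kron(A,B)) / (det(A)^m * det(B)^n)            % 1 up to roundoff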
Proofs of Theorem 1: Assume A and B are diagonalizable, with Au_i = λ_i u_i (i = 1, ..., n) and Bv_i = μ_i v_i (i = 1, ..., m). Let M_ij = v_i u_j^T. The mn matrices M_ij form a basis for R^{m,n}, and they are eigenmatrices of our map since BM_ij A^T = μ_i λ_j M_ij. The determinant is

    Π_{1≤i≤m, 1≤j≤n} μ_i λ_j = (det A)^m (det B)^n.                (5)

The assumption of diagonalizability is not important.

Also one can directly manipulate the matrix, since obviously A ⊗ B = (A ⊗ I)(I ⊗ B) as operators, and det(I ⊗ B) = (det B)^n, while A ⊗ I can be reshuffled into I ⊗ A, so det(A ⊗ I) = det(I ⊗ A) = (det A)^m.

Other proofs may be obtained by working with the "LU" decomposition of A and B, the SVD of A and B, the QR decomposition, and many others.

Mathematical Note: That the operator Y = BXA^T can be expressed as a matrix is a consequence of linearity:

    B(c_1 X_1 + c_2 X_2)A^T = c_1 BX_1A^T + c_2 BX_2A^T,
    i.e.  (A ⊗ B)(c_1 X_1 + c_2 X_2) = c_1 (A ⊗ B)X_1 + c_2 (A ⊗ B)X_2.

It is important to realize that a general linear transformation from R^{m,n} to R^{m,n} is defined by an element of R^{mn,mn}, i.e., by the m^2 n^2 entries of an mn × mn matrix. The transformations defined by Kronecker products form only an (m^2 + n^2)-parameter family inside this m^2 n^2-dimensional space.

Some Kronecker product properties:

1. (A ⊗ B)^T = A^T ⊗ B^T
2. (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
3. det(A ⊗ B) = det(A)^m det(B)^n, for A ∈ R^{n,n}, B ∈ R^{m,m}
4. tr(A ⊗ B) = tr(A) tr(B)
5. A ⊗ B is orthogonal if A and B are orthogonal
6. (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD)
7. If Au = λu and Bv = μv, then with X = vu^T we have BXA^T = λμX and also AX^TB^T = λμX^T. Therefore A ⊗ B and B ⊗ A have the same eigenvalues, and transposed eigenvectors.

It is easy to see that property 5 holds, since if A and B are orthogonal,

    (A ⊗ B)^T = A^T ⊗ B^T = A^{-1} ⊗ B^{-1} = (A ⊗ B)^{-1}.
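A few of these properties are easy to spot check numerically with MATLAB's kron; the sizes and the random matrices below are arbitrary choices.

% Spot checks of properties 4, 5, and 6; all displayed quantities are ~0.
m = 3; n = 2;
A = randn(n); B = randn(m); C = randn(n); D = randn(m);
norm(kron(A,B)*kron(C,D) - kron(A*C, B*D))        % property 6
abs(trace(kron(A,B)) - trace(A)*trace(B))         % property 4
[Qa,~] = qr(randn(n)); [Qb,~] = qr(randn(m));
norm(kron(Qa,Qb)'*kron(Qa,Qb) - eye(m*n))         % property 5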
Linear Subspace Kronecker Products

Some researchers consider the symmetric Kronecker product ⊗_sym. In fact it has become clear that one might consider an anti-symmetric, upper-triangular or even a Toeplitz Kronecker product. We formulate a general notion:

Definition: Let S denote a linear subspace of R^{m,n} and π_S a projection onto S. If A ∈ R^{n,n} and B ∈ R^{m,m}, then we define (A ⊗ B)_S X = π_S(BXA^T) for X ∈ S.

Comments

1. If S is the set of symmetric matrices, then m = n and

    (A ⊗ B)_sym X = (BXA^T + AXB^T)/2.

2. If S is the set of antisymmetric matrices, then m = n and

    (A ⊗ B)_anti X = (BXA^T + AXB^T)/2

as well, but this matrix is antisymmetric.

3. If S is the set of upper triangular matrices, then m = n and (A ⊗ B)_upper X = upper(BXA^T).

4. We recall that π_S is a projection if π_S is linear and π_S X ∈ S for all X ∈ R^{m,n}.

5. Jacobian of ⊗_sym:

    (A ⊗ B)_sym X = (BXA^T + AXB^T)/2.

If B = A then

    det J = Π_{i≤j} λ_i λ_j = (det A)^{n+1},

by considering eigenmatrices E = u_i u_j^T for i ≤ j.

6. Jacobian of ⊗_upper:

Special case: A lower triangular, B upper triangular, so that (A ⊗ B)_upper X = BXA^T since BXA^T is already upper triangular. The eigenvalues of A and B are λ_i = A_ii and μ_i = B_ii respectively, where Au_i = λ_i u_i and Bv_i = μ_i v_i (i = 1, ..., n). The matrix M_ij = v_i u_j^T for i ≤ j is upper triangular, since v_i and u_j are zero below the ith and above the jth component respectively. (The eigenvectors of a triangular matrix are triangular.) Since the M_ij are a basis for upper triangular matrices and BM_ij A^T = μ_i λ_j M_ij, we then have

    det J = Π_{i≤j} μ_i λ_j = (λ_1 λ_2^2 λ_3^3 ... λ_n^n)(μ_1^n μ_2^{n−1} μ_3^{n−2} ... μ_n)
          = (A_11 A_22^2 A_33^3 ... A_nn^n)(B_11^n B_22^{n−1} B_33^{n−2} ... B_nn).

Note that J is singular if and only if A or B is.

7. Jacobian of ⊗_Toeplitz:

Let X be a Toeplitz matrix. We can define (A ⊗_Toeplitz B)X = Toeplitz(BXA^T), where Toeplitz averages every diagonal.

4 Jacobians of Linear Functions, Powers and Inverses

The Jacobian of a linear map is just the determinant. This determinant is not always easily computed. The dimension of the underlying space of matrices plays a role. For example, the Jacobian of Y = 2X is 2^{n^2} for X ∈ R^{n×n}, 2^{n(n+1)/2} for upper triangular or symmetric X, 2^{n(n−1)/2} for antisymmetric X, and 2^n for diagonal X. We will concentrate on general real matrices X and explore the symmetric case and triangular case as well when appropriate.
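As a sketch of the dimension count behind these numbers, the map Y = 2X is 2 times the identity on a space with d independent real parameters, so det J = 2^d; the value of n below is an arbitrary choice.

% Jacobian of Y = 2X on various spaces of n x n matrices: det J = 2^d.
n = 5;
d = [n^2, n*(n+1)/2, n*(n-1)/2, n];   % general, symmetric (or upper triangular), antisymmetric, diagonal
jacobians = 2.^d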