The Matrix Cookbook

Kaare Brandt Petersen
Michael Syskind Pedersen

Version: January 5, 2005

What is this? These pages are a collection of facts (identities, approximations, inequalities, relations, ...) about matrices and matters relating to them. They are collected in this form for the convenience of anyone who wants a quick desktop reference.

Disclaimer: The identities, approximations and relations presented here were obviously not invented but collected, borrowed and copied from a large number of sources. These sources include similar but shorter notes found on the internet and appendices in books; see the references for a full list.

Errors: Very likely there are errors, typos, and mistakes, for which we apologize. We would be grateful to receive corrections at kbp@imm.dtu.dk.

It's ongoing: The project of keeping a large repository of relations involving matrices is naturally ongoing, and the version will be apparent from the date in the header.

Suggestions: Your suggestions for additional content or elaboration of some topics are most welcome at kbp@imm.dtu.dk.

Acknowledgements: We would like to thank the following for discussions, proofreading, extensive corrections and suggestions: Esben Hoegh-Rasmussen and Vasile Sima.

Keywords: Matrix algebra, matrix relations, matrix identities, derivative of determinant, derivative of inverse matrix, differentiate a matrix.
Contents

1 Basics
1.1 Trace and Determinants
1.2 The Special Case 2x2
2 Derivatives
2.1 Derivatives of a Determinant
2.2 Derivatives of an Inverse
2.3 Derivatives of Matrices, Vectors and Scalar Forms
2.4 Derivatives of Traces
2.5 Derivatives of Structured Matrices
3 Inverses
3.1 Exact Relations
3.2 Implication on Inverses
3.3 Approximations
3.4 Generalized Inverse
3.5 Pseudo Inverse
4 Complex Matrices
4.1 Complex Derivatives
5 Decompositions
5.1 Eigenvalues and Eigenvectors
5.2 Singular Value Decomposition
5.3 Triangular Decomposition
6 General Statistics and Probability
6.1 Moments of any distribution
6.2 Expectations
7 Gaussians
7.1 One Dimensional
7.2 Basics
7.3 Moments
7.4 Miscellaneous
7.5 One Dimensional Mixture of Gaussians
7.6 Mixture of Gaussians
8 Miscellaneous
8.1 Functions and Series
8.2 Indices, Entries and Vectors
8.3 Solutions to Systems of Equations
8.4 Block matrices
8.5 Matrix Norms
8.6 Positive Definite and Semi-definite Matrices
8.7 Integral Involving Dirac Delta Functions
8.8 Miscellaneous
A Proofs and Details

Petersen & Pedersen, The Matrix Cookbook (Version: January 5, 2005), Page 2
Notation and Nomenclature

$\mathbf{A}$ : Matrix
$\mathbf{A}_{ij}$ : Matrix indexed for some purpose
$\mathbf{A}_i$ : Matrix indexed for some purpose
$\mathbf{A}^{ij}$ : Matrix indexed for some purpose
$\mathbf{A}^n$ : Matrix indexed for some purpose, or the $n$th power of a square matrix
$\mathbf{A}^{-1}$ : The inverse of the matrix $\mathbf{A}$
$\mathbf{A}^+$ : The pseudo-inverse of the matrix $\mathbf{A}$
$\mathbf{A}^{1/2}$ : The square root of a matrix (if unique), not elementwise
$(\mathbf{A})_{ij}$ : The $(i,j)$th entry of the matrix $\mathbf{A}$
$A_{ij}$ : The $(i,j)$th entry of the matrix $\mathbf{A}$
$\mathbf{a}$ : Vector
$\mathbf{a}_i$ : Vector indexed for some purpose
$a_i$ : The $i$th element of the vector $\mathbf{a}$
$a$ : Scalar
$\Re z$ : Real part of a scalar
$\Re \mathbf{z}$ : Real part of a vector
$\Re \mathbf{Z}$ : Real part of a matrix
$\Im z$ : Imaginary part of a scalar
$\Im \mathbf{z}$ : Imaginary part of a vector
$\Im \mathbf{Z}$ : Imaginary part of a matrix
$\det(\mathbf{A})$ : Determinant of $\mathbf{A}$
$\|\mathbf{A}\|$ : Matrix norm (a subscript, if any, denotes which norm)
$\mathbf{A}^T$ : Transposed matrix
$\mathbf{A}^*$ : Complex conjugated matrix
$\mathbf{A}^H$ : Transposed and complex conjugated matrix
$\mathbf{A} \circ \mathbf{B}$ : Hadamard (elementwise) product
$\mathbf{A} \otimes \mathbf{B}$ : Kronecker product
$\mathbf{0}$ : The null matrix, zero in all entries
$\mathbf{I}$ : The identity matrix
$\mathbf{J}^{ij}$ : The single-entry matrix, 1 at $(i,j)$ and zero elsewhere
$\boldsymbol{\Sigma}$ : A positive definite matrix
$\boldsymbol{\Lambda}$ : A diagonal matrix
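The sketch below makes the two product notations concrete. It computes $\mathbf{A}\circ\mathbf{B}$, $\mathbf{A}\otimes\mathbf{B}$ and a single-entry matrix $\mathbf{J}^{ij}$ for small matrices. It is illustrative only and rests on a few assumptions:

- Matrices are plain Python lists of rows.
- The helper names (`hadamard`, `kron`, `single_entry`) are hypothetical.
- Indices are 0-based, whereas the notation above counts from 1.

```python
# Assumption: matrices are lists of rows; indices are 0-based here,
# while the notation table counts rows and columns from 1.

def hadamard(A, B):
    """Elementwise (Hadamard) product A o B of two equally sized matrices."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def kron(A, B):
    """Kronecker product A (x) B: the (i, j) block of the result is A[i][j] * B."""
    p, q = len(B), len(B[0])
    return [
        [A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
        for i in range(len(A) * p)
    ]


def single_entry(n, m, i, j):
    """Single-entry matrix J^{ij} of size n x m: 1 at (i, j), zero elsewhere."""
    return [[1.0 if (r, c) == (i, j) else 0.0 for c in range(m)] for r in range(n)]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 5.0], [6.0, 7.0]]
print(hadamard(A, B))            # [[0.0, 10.0], [18.0, 28.0]]
print(kron(A, B)[0])             # [0.0, 5.0, 0.0, 10.0]
print(single_entry(2, 3, 0, 2))  # [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```

The Hadamard product requires equal shapes. The Kronecker product of an $m\times n$ and a $p\times q$ matrix is $mp\times nq$.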
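1 Basics

$$
\begin{aligned}
(\mathbf{AB})^{-1} &= \mathbf{B}^{-1}\mathbf{A}^{-1}\\
(\mathbf{ABC}\cdots)^{-1} &= \cdots\mathbf{C}^{-1}\mathbf{B}^{-1}\mathbf{A}^{-1}\\
(\mathbf{A}^T)^{-1} &= (\mathbf{A}^{-1})^T\\
(\mathbf{A}+\mathbf{B})^T &= \mathbf{A}^T+\mathbf{B}^T\\
(\mathbf{AB})^T &= \mathbf{B}^T\mathbf{A}^T\\
(\mathbf{ABC}\cdots)^T &= \cdots\mathbf{C}^T\mathbf{B}^T\mathbf{A}^T\\
(\mathbf{A}^H)^{-1} &= (\mathbf{A}^{-1})^H\\
(\mathbf{A}+\mathbf{B})^H &= \mathbf{A}^H+\mathbf{B}^H\\
(\mathbf{AB})^H &= \mathbf{B}^H\mathbf{A}^H\\
(\mathbf{ABC}\cdots)^H &= \cdots\mathbf{C}^H\mathbf{B}^H\mathbf{A}^H
\end{aligned}
$$

1.1 Trace and Determinants

$$
\begin{aligned}
\operatorname{Tr}(\mathbf{A}) &= \sum_i A_{ii} = \sum_i \lambda_i, \qquad \lambda_i = \operatorname{eig}(\mathbf{A})\\
\operatorname{Tr}(\mathbf{A}) &= \operatorname{Tr}(\mathbf{A}^T)\\
\operatorname{Tr}(\mathbf{AB}) &= \operatorname{Tr}(\mathbf{BA})\\
\operatorname{Tr}(\mathbf{A}+\mathbf{B}) &= \operatorname{Tr}(\mathbf{A})+\operatorname{Tr}(\mathbf{B})\\
\operatorname{Tr}(\mathbf{ABC}) &= \operatorname{Tr}(\mathbf{BCA}) = \operatorname{Tr}(\mathbf{CAB})\\
\det(\mathbf{A}) &= \prod_i \lambda_i, \qquad \lambda_i = \operatorname{eig}(\mathbf{A})\\
\det(\mathbf{AB}) &= \det(\mathbf{A})\det(\mathbf{B}), \qquad \text{for square } \mathbf{A}, \mathbf{B} \text{ of the same size}\\
\det(\mathbf{A}^{-1}) &= \frac{1}{\det(\mathbf{A})}
\end{aligned}
$$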
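These identities are easy to spot-check numerically. The following sketch evaluates both sides on small, arbitrarily chosen matrices. It assumes plain Python with no linear algebra library, and all helper names are hypothetical.

Every entry is an integer-valued float, so the floating-point arithmetic is exact and plain `==` comparisons are safe. The last check uses a singular factor $\mathbf{S}$, which shows that the determinant product rule does not require invertibility.

```python
def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def det(A):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(A) == 1:
        return A[0][0]
    return sum(
        (-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
        for j in range(len(A))
    )


A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
B = [[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]
C = [[0.0, 1.0, 1.0], [1.0, 0.0, 2.0], [1.0, 1.0, 0.0]]

# Cyclic invariance: Tr(ABC) = Tr(BCA) = Tr(CAB)
t = trace(matmul(matmul(A, B), C))
assert t == trace(matmul(matmul(B, C), A)) == trace(matmul(matmul(C, A), B))

# Transposing a product reverses the order: (AB)^T = B^T A^T
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))

# det(AB) = det(A) det(B), also when one factor is singular
S = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 1.0]]  # row 2 = 2 * row 1
assert det(matmul(A, B)) == det(A) * det(B)
assert det(matmul(A, S)) == det(A) * det(S) == 0.0
print("all identities hold for these matrices")
```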
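1.2 The Special Case 2x2

Consider the matrix $\mathbf{A}$:

$$\mathbf{A} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

Determinant and trace

$$\det(\mathbf{A}) = A_{11}A_{22} - A_{12}A_{21}, \qquad \operatorname{Tr}(\mathbf{A}) = A_{11} + A_{22}$$

Eigenvalues

$$\lambda^2 - \lambda\operatorname{Tr}(\mathbf{A}) + \det(\mathbf{A}) = 0$$

$$\lambda_1 = \frac{\operatorname{Tr}(\mathbf{A}) + \sqrt{\operatorname{Tr}(\mathbf{A})^2 - 4\det(\mathbf{A})}}{2}, \qquad \lambda_2 = \frac{\operatorname{Tr}(\mathbf{A}) - \sqrt{\operatorname{Tr}(\mathbf{A})^2 - 4\det(\mathbf{A})}}{2}$$

$$\lambda_1 + \lambda_2 = \operatorname{Tr}(\mathbf{A}), \qquad \lambda_1\lambda_2 = \det(\mathbf{A})$$

Eigenvectors

$$\mathbf{v}_1 \propto \begin{bmatrix} A_{12} \\ \lambda_1 - A_{11} \end{bmatrix}, \qquad \mathbf{v}_2 \propto \begin{bmatrix} A_{12} \\ \lambda_2 - A_{11} \end{bmatrix}$$

Inverse

$$\mathbf{A}^{-1} = \frac{1}{\det(\mathbf{A})}\begin{bmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{bmatrix}$$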
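The closed-form 2x2 results translate directly into code. The sketch below uses hypothetical helper names and represents matrices as nested lists. It uses `cmath.sqrt`, so a negative discriminant $\operatorname{Tr}(\mathbf{A})^2 - 4\det(\mathbf{A})$ yields the complex-conjugate eigenvalue pair instead of raising an error.

One caveat: the eigenvector formula gives the zero vector when $A_{12} = 0$ and $\lambda_k = A_{11}$. In that case the equally valid form $[\lambda_k - A_{22},\ A_{21}]^T$ can be used instead.

```python
import cmath


def eig2x2(A):
    """Eigenvalues and unnormalized eigenvectors of a 2x2 matrix (closed form)."""
    (a11, a12), (a21, a22) = A
    tr = a11 + a22
    dt = a11 * a22 - a12 * a21
    root = cmath.sqrt(tr * tr - 4 * dt)  # complex when Tr(A)^2 < 4 det(A)
    lam1, lam2 = (tr + root) / 2, (tr - root) / 2
    # v_k proportional to [A12, lambda_k - A11]; this degenerates to the zero
    # vector when A12 = 0 and lambda_k = A11.
    return (lam1, lam2), ([a12, lam1 - a11], [a12, lam2 - a11])


def inv2x2(A):
    """Closed-form inverse: swap the diagonal, negate the off-diagonal, divide by det."""
    (a11, a12), (a21, a22) = A
    dt = a11 * a22 - a12 * a21
    if dt == 0:
        raise ValueError("matrix is singular")
    return [[a22 / dt, -a12 / dt], [-a21 / dt, a11 / dt]]


A = [[2.0, 1.0], [1.0, 2.0]]
(l1, l2), (v1, v2) = eig2x2(A)
print(l1, l2)
print(inv2x2(A))
```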
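2 Derivatives

This section covers differentiation of a number of expressions with respect to a matrix $\mathbf{X}$. It is always assumed that $\mathbf{X}$ has no special structure, i.e. that the elements of $\mathbf{X}$ are independent (e.g. not symmetric, Toeplitz or positive definite). See Section 2.5 for differentiation of structured matrices. The basic assumption can be written as

$$\frac{\partial X_{kl}}{\partial X_{ij}} = \delta_{ik}\delta_{lj},$$

that is, for vector forms,

$$\left[\frac{\partial \mathbf{x}}{\partial y}\right]_i = \frac{\partial x_i}{\partial y}, \qquad \left[\frac{\partial x}{\partial \mathbf{y}}\right]_i = \frac{\partial x}{\partial y_i}, \qquad \left[\frac{\partial \mathbf{x}}{\partial \mathbf{y}}\right]_{ij} = \frac{\partial x_i}{\partial y_j}.$$

The following rules are general and very useful when deriving the differential of an expression ([10]):

$$
\begin{aligned}
\partial\mathbf{A} &= \mathbf{0} \quad (\mathbf{A}\text{ is a constant}) && (1)\\
\partial(\alpha\mathbf{X}) &= \alpha\,\partial\mathbf{X} && (2)\\
\partial(\mathbf{X}+\mathbf{Y}) &= \partial\mathbf{X}+\partial\mathbf{Y} && (3)\\
\partial(\operatorname{Tr}(\mathbf{X})) &= \operatorname{Tr}(\partial\mathbf{X}) && (4)\\
\partial(\mathbf{XY}) &= (\partial\mathbf{X})\mathbf{Y}+\mathbf{X}(\partial\mathbf{Y}) && (5)\\
\partial(\mathbf{X}\circ\mathbf{Y}) &= (\partial\mathbf{X})\circ\mathbf{Y}+\mathbf{X}\circ(\partial\mathbf{Y}) && (6)\\
\partial(\mathbf{X}\otimes\mathbf{Y}) &= (\partial\mathbf{X})\otimes\mathbf{Y}+\mathbf{X}\otimes(\partial\mathbf{Y}) && (7)\\
\partial(\mathbf{X}^{-1}) &= -\mathbf{X}^{-1}(\partial\mathbf{X})\mathbf{X}^{-1} && (8)\\
\partial(\det(\mathbf{X})) &= \det(\mathbf{X})\operatorname{Tr}(\mathbf{X}^{-1}\partial\mathbf{X}) && (9)\\
\partial(\ln(\det(\mathbf{X}))) &= \operatorname{Tr}(\mathbf{X}^{-1}\partial\mathbf{X}) && (10)\\
\partial\mathbf{X}^T &= (\partial\mathbf{X})^T && (11)\\
\partial\mathbf{X}^H &= (\partial\mathbf{X})^H && (12)
\end{aligned}
$$

2.1 Derivatives of a Determinant

2.1.1 General form

$$\frac{\partial\det(\mathbf{Y})}{\partial x} = \det(\mathbf{Y})\operatorname{Tr}\left[\mathbf{Y}^{-1}\frac{\partial\mathbf{Y}}{\partial x}\right]$$

2.1.2 Linear forms

$$\frac{\partial\det(\mathbf{X})}{\partial\mathbf{X}} = \det(\mathbf{X})(\mathbf{X}^{-1})^T$$

$$\frac{\partial\det(\mathbf{AXB})}{\partial\mathbf{X}} = \det(\mathbf{AXB})(\mathbf{X}^{-1})^T = \det(\mathbf{AXB})(\mathbf{X}^T)^{-1}$$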
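A finite-difference check is a cheap way to gain confidence in a result like $\partial\det(\mathbf{X})/\partial\mathbf{X} = \det(\mathbf{X})(\mathbf{X}^{-1})^T$. Since $\mathbf{X}^{-1} = \operatorname{adj}(\mathbf{X})/\det(\mathbf{X})$ and the adjugate is the transpose of the cofactor matrix, the right-hand side is simply the cofactor matrix of $\mathbf{X}$. The sketch therefore computes it without forming an inverse.

The helpers are hypothetical plain-Python functions. The test matrix is deliberately non-symmetric, so a missing transpose would show up as a large error.

```python
def det(A):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(A) == 1:
        return A[0][0]
    return sum(
        (-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
        for j in range(len(A))
    )


def cofactor_matrix(X):
    """C[i][j] = (-1)**(i + j) * det(X with row i and column j deleted)."""
    n = len(X)
    return [
        [
            (-1) ** (i + j)
            * det([row[:j] + row[j + 1:] for k, row in enumerate(X) if k != i])
            for j in range(n)
        ]
        for i in range(n)
    ]


def numerical_gradient(f, X, h=1e-6):
    """Central finite differences: G[i][j] approximates df / dX[i][j]."""
    G = []
    for i in range(len(X)):
        G.append([])
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i].append((f(Xp) - f(Xm)) / (2 * h))
    return G


X = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]  # deliberately not symmetric
G_num = numerical_gradient(det, X)
G_formula = cofactor_matrix(X)  # equals det(X) * (X^{-1})^T
max_err = max(abs(g - c) for rg, rc in zip(G_num, G_formula) for g, c in zip(rg, rc))
print(max_err)
```

The determinant is affine in each single entry. The central difference is therefore exact apart from rounding, and `max_err` stays tiny.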
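2.1.3 Square forms

If $\mathbf{X}$ is square and invertible, then

$$\frac{\partial\det(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial\mathbf{X}} = 2\det(\mathbf{X}^T\mathbf{A}\mathbf{X})\,\mathbf{X}^{-T}$$

If $\mathbf{X}$ is not square but $\mathbf{A}$ is symmetric, then

$$\frac{\partial\det(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial\mathbf{X}} = 2\det(\mathbf{X}^T\mathbf{A}\mathbf{X})\,\mathbf{A}\mathbf{X}(\mathbf{X}^T\mathbf{A}\mathbf{X})^{-1}$$

If $\mathbf{X}$ is not square and $\mathbf{A}$ is not symmetric, then

$$\frac{\partial\det(\mathbf{X}^T\mathbf{A}\mathbf{X})}{\partial\mathbf{X}} = \det(\mathbf{X}^T\mathbf{A}\mathbf{X})\left(\mathbf{A}\mathbf{X}(\mathbf{X}^T\mathbf{A}\mathbf{X})^{-1} + \mathbf{A}^T\mathbf{X}(\mathbf{X}^T\mathbf{A}^T\mathbf{X})^{-1}\right) \qquad (13)$$

2.1.4 Other nonlinear forms

Some special cases are (see [8]):

$$\frac{\partial\ln\det(\mathbf{X}^T\mathbf{X})}{\partial\mathbf{X}} = 2(\mathbf{X}^+)^T$$

$$\frac{\partial\ln\det(\mathbf{X}^T\mathbf{X})}{\partial\mathbf{X}^+} = -2\mathbf{X}^T$$

$$\frac{\partial\ln|\det(\mathbf{X})|}{\partial\mathbf{X}} = (\mathbf{X}^{-1})^T = (\mathbf{X}^T)^{-1}$$

$$\frac{\partial\det(\mathbf{X}^k)}{\partial\mathbf{X}} = k\det(\mathbf{X}^k)\,\mathbf{X}^{-T}$$

See [7].

2.2 Derivatives of an Inverse

From [15] we have the basic identity

$$\frac{\partial\mathbf{Y}^{-1}}{\partial x} = -\mathbf{Y}^{-1}\frac{\partial\mathbf{Y}}{\partial x}\mathbf{Y}^{-1},$$

from which it follows that

$$\frac{\partial(\mathbf{X}^{-1})_{kl}}{\partial X_{ij}} = -(\mathbf{X}^{-1})_{ki}(\mathbf{X}^{-1})_{jl}$$

$$\frac{\partial\mathbf{a}^T\mathbf{X}^{-1}\mathbf{b}}{\partial\mathbf{X}} = -\mathbf{X}^{-T}\mathbf{a}\mathbf{b}^T\mathbf{X}^{-T}$$

$$\frac{\partial\det(\mathbf{X}^{-1})}{\partial\mathbf{X}} = -\det(\mathbf{X}^{-1})(\mathbf{X}^{-1})^T$$

$$\frac{\partial\operatorname{Tr}(\mathbf{A}\mathbf{X}^{-1}\mathbf{B})}{\partial\mathbf{X}} = -(\mathbf{X}^{-1}\mathbf{B}\mathbf{A}\mathbf{X}^{-1})^T$$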
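The inverse identities can be checked the same way. This sketch compares a central finite-difference gradient of $\mathbf{a}^T\mathbf{X}^{-1}\mathbf{b}$ with $-\mathbf{X}^{-T}\mathbf{a}\mathbf{b}^T\mathbf{X}^{-T}$. The formula is evaluated as the outer product $-\mathbf{u}\mathbf{w}^T$, with $\mathbf{u} = \mathbf{X}^{-T}\mathbf{a}$ and $\mathbf{w} = \mathbf{X}^{-1}\mathbf{b}$, using $\mathbf{b}^T\mathbf{X}^{-T} = (\mathbf{X}^{-1}\mathbf{b})^T$.

It assumes a 2x2 matrix so that the closed-form inverse of Section 1.2 can be used. Helper names and test values are hypothetical.

```python
def inv2x2(X):
    """Closed-form 2x2 inverse (Section 1.2)."""
    (x11, x12), (x21, x22) = X
    dt = x11 * x22 - x12 * x21
    return [[x22 / dt, -x12 / dt], [-x21 / dt, x11 / dt]]


def matvec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def form(X, a, b):
    """Scalar form a^T X^{-1} b."""
    return sum(ai * wi for ai, wi in zip(a, matvec(inv2x2(X), b)))


def grad_formula(X, a, b):
    """-X^{-T} a b^T X^{-T}, computed as -u w^T with u = X^{-T} a, w = X^{-1} b."""
    Xi = inv2x2(X)
    u = matvec(transpose(Xi), a)
    w = matvec(Xi, b)
    return [[-ui * wj for wj in w] for ui in u]


def numerical_gradient(f, X, h=1e-6):
    """Central finite differences: G[i][j] approximates df / dX[i][j]."""
    G = []
    for i in range(len(X)):
        G.append([])
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i].append((f(Xp) - f(Xm)) / (2 * h))
    return G


X = [[3.0, 1.0], [2.0, 4.0]]
a = [1.0, 2.0]
b = [3.0, -1.0]
G_num = numerical_gradient(lambda Y: form(Y, a, b), X)
G = grad_formula(X, a, b)
max_err = max(abs(p - q) for rp, rq in zip(G_num, G) for p, q in zip(rp, rq))
print(max_err)
```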
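2.3 Derivatives of Matrices, Vectors and Scalar Forms

2.3.1 First Order

$$
\begin{aligned}
\frac{\partial\mathbf{x}^T\mathbf{b}}{\partial\mathbf{x}} &= \frac{\partial\mathbf{b}^T\mathbf{x}}{\partial\mathbf{x}} = \mathbf{b}\\
\frac{\partial\mathbf{a}^T\mathbf{X}\mathbf{b}}{\partial\mathbf{X}} &= \mathbf{a}\mathbf{b}^T\\
\frac{\partial\mathbf{a}^T\mathbf{X}^T\mathbf{b}}{\partial\mathbf{X}} &= \mathbf{b}\mathbf{a}^T\\
\frac{\partial\mathbf{a}^T\mathbf{X}\mathbf{a}}{\partial\mathbf{X}} &= \frac{\partial\mathbf{a}^T\mathbf{X}^T\mathbf{a}}{\partial\mathbf{X}} = \mathbf{a}\mathbf{a}^T\\
\frac{\partial\mathbf{X}}{\partial X_{ij}} &= \mathbf{J}^{ij}\\
\frac{\partial(\mathbf{XA})_{ij}}{\partial X_{mn}} &= \delta_{im}(\mathbf{A})_{nj} = (\mathbf{J}^{mn}\mathbf{A})_{ij}\\
\frac{\partial(\mathbf{X}^T\mathbf{A})_{ij}}{\partial X_{mn}} &= \delta_{in}(\mathbf{A})_{mj} = (\mathbf{J}^{nm}\mathbf{A})_{ij}
\end{aligned}
$$

2.3.2 Second Order

$$
\begin{aligned}
\frac{\partial}{\partial X_{ij}}\sum_{klmn}X_{kl}X_{mn} &= 2\sum_{kl}X_{kl}\\
\frac{\partial\mathbf{b}^T\mathbf{X}^T\mathbf{X}\mathbf{c}}{\partial\mathbf{X}} &= \mathbf{X}(\mathbf{b}\mathbf{c}^T + \mathbf{c}\mathbf{b}^T)\\
\frac{\partial(\mathbf{Bx}+\mathbf{b})^T\mathbf{C}(\mathbf{Dx}+\mathbf{d})}{\partial\mathbf{x}} &= \mathbf{B}^T\mathbf{C}(\mathbf{Dx}+\mathbf{d}) + \mathbf{D}^T\mathbf{C}^T(\mathbf{Bx}+\mathbf{b})\\
\frac{\partial(\mathbf{X}^T\mathbf{B}\mathbf{X})_{kl}}{\partial X_{ij}} &= \delta_{lj}(\mathbf{X}^T\mathbf{B})_{ki} + \delta_{kj}(\mathbf{BX})_{il}\\
\frac{\partial(\mathbf{X}^T\mathbf{B}\mathbf{X})}{\partial X_{ij}} &= \mathbf{X}^T\mathbf{B}\mathbf{J}^{ij} + \mathbf{J}^{ji}\mathbf{B}\mathbf{X}, \qquad (\mathbf{J}^{ij})_{kl} = \delta_{ik}\delta_{jl}
\end{aligned}
$$

See Section 8.2 for useful properties of the single-entry matrix $\mathbf{J}^{ij}$.

$$
\begin{aligned}
\frac{\partial\mathbf{x}^T\mathbf{B}\mathbf{x}}{\partial\mathbf{x}} &= (\mathbf{B}+\mathbf{B}^T)\mathbf{x}\\
\frac{\partial\mathbf{b}^T\mathbf{X}^T\mathbf{D}\mathbf{X}\mathbf{c}}{\partial\mathbf{X}} &= \mathbf{D}^T\mathbf{X}\mathbf{b}\mathbf{c}^T + \mathbf{D}\mathbf{X}\mathbf{c}\mathbf{b}^T\\
\frac{\partial}{\partial\mathbf{X}}(\mathbf{Xb}+\mathbf{c})^T\mathbf{D}(\mathbf{Xb}+\mathbf{c}) &= (\mathbf{D}+\mathbf{D}^T)(\mathbf{Xb}+\mathbf{c})\mathbf{b}^T
\end{aligned}
$$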
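As a check on the second-order results, the sketch below verifies $\partial\,\mathbf{b}^T\mathbf{X}^T\mathbf{X}\mathbf{c}/\partial\mathbf{X} = \mathbf{X}(\mathbf{b}\mathbf{c}^T + \mathbf{c}\mathbf{b}^T)$ for a non-square $\mathbf{X}$. It uses the expansion $(\mathbf{Xb})\mathbf{c}^T + (\mathbf{Xc})\mathbf{b}^T$. Helper names and numeric values are illustrative assumptions.

```python
def matvec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def form(X, b, c):
    """b^T X^T X c = (X b) . (X c) for an n x m matrix X and length-m vectors b, c."""
    return sum(p * q for p, q in zip(matvec(X, b), matvec(X, c)))


def grad_formula(X, b, c):
    """X (b c^T + c b^T), expanded as (X b) c^T + (X c) b^T."""
    Xb, Xc = matvec(X, b), matvec(X, c)
    return [[Xb[i] * c[j] + Xc[i] * b[j] for j in range(len(b))] for i in range(len(X))]


def numerical_gradient(f, X, h=1e-6):
    """Central finite differences: G[i][j] approximates df / dX[i][j]."""
    G = []
    for i in range(len(X)):
        G.append([])
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i].append((f(Xp) - f(Xm)) / (2 * h))
    return G


X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # 3 x 2: X need not be square here
b = [1.0, -2.0]
c = [0.5, 4.0]
G_num = numerical_gradient(lambda Y: form(Y, b, c), X)
G = grad_formula(X, b, c)
max_err = max(abs(p - q) for rp, rq in zip(G_num, G) for p, q in zip(rp, rq))
print(max_err)
```

The form is quadratic in $\mathbf{X}$, so the central difference is exact apart from rounding. The gradient has the same 3 x 2 shape as $\mathbf{X}$.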
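Assume $\mathbf{W}$ is symmetric; then

$$
\begin{aligned}
\frac{\partial}{\partial\mathbf{s}}(\mathbf{x}-\mathbf{As})^T\mathbf{W}(\mathbf{x}-\mathbf{As}) &= -2\mathbf{A}^T\mathbf{W}(\mathbf{x}-\mathbf{As})\\
\frac{\partial}{\partial\mathbf{x}}(\mathbf{x}-\mathbf{As})^T\mathbf{W}(\mathbf{x}-\mathbf{As}) &= 2\mathbf{W}(\mathbf{x}-\mathbf{As})\\
\frac{\partial}{\partial\mathbf{A}}(\mathbf{x}-\mathbf{As})^T\mathbf{W}(\mathbf{x}-\mathbf{As}) &= -2\mathbf{W}(\mathbf{x}-\mathbf{As})\mathbf{s}^T
\end{aligned}
$$

2.3.3 Higher order and non-linear

$$\frac{\partial}{\partial\mathbf{X}}\mathbf{a}^T\mathbf{X}^n\mathbf{b} = \sum_{r=0}^{n-1}(\mathbf{X}^r)^T\mathbf{a}\mathbf{b}^T(\mathbf{X}^{n-1-r})^T \qquad (14)$$

$$\frac{\partial}{\partial\mathbf{X}}\mathbf{a}^T(\mathbf{X}^n)^T\mathbf{X}^n\mathbf{b} = \sum_{r=0}^{n-1}\left[(\mathbf{X}^r)^T\mathbf{X}^n\mathbf{b}\mathbf{a}^T(\mathbf{X}^{n-1-r})^T + (\mathbf{X}^r)^T\mathbf{X}^n\mathbf{a}\mathbf{b}^T(\mathbf{X}^{n-1-r})^T\right] \qquad (15)$$

For $n = 1$, (15) reduces to $\mathbf{X}(\mathbf{b}\mathbf{a}^T + \mathbf{a}\mathbf{b}^T)$, in agreement with Section 2.3.2. See A.0.1 for a proof.

Assume $\mathbf{s}$ and $\mathbf{r}$ are functions of $\mathbf{x}$, i.e. $\mathbf{s} = \mathbf{s}(\mathbf{x})$ and $\mathbf{r} = \mathbf{r}(\mathbf{x})$, and that $\mathbf{A}$ is a constant; then

$$\frac{\partial}{\partial\mathbf{x}}\mathbf{s}^T\mathbf{A}\mathbf{s} = \left[\frac{\partial\mathbf{s}}{\partial\mathbf{x}}\right]^T(\mathbf{A}+\mathbf{A}^T)\mathbf{s}$$

$$\frac{\partial}{\partial\mathbf{x}}\mathbf{s}^T\mathbf{A}\mathbf{r} = \left[\frac{\partial\mathbf{s}}{\partial\mathbf{x}}\right]^T\mathbf{A}\mathbf{r} + \left[\frac{\partial\mathbf{r}}{\partial\mathbf{x}}\right]^T\mathbf{A}^T\mathbf{s}$$

2.3.4 Gradient and Hessian

Using the above, we have for the gradient and the Hessian of

$$f = \mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{b}^T\mathbf{x}$$

the expressions

$$\nabla_{\mathbf{x}}f = \frac{\partial f}{\partial\mathbf{x}} = (\mathbf{A}+\mathbf{A}^T)\mathbf{x} + \mathbf{b}, \qquad \frac{\partial^2 f}{\partial\mathbf{x}\,\partial\mathbf{x}^T} = \mathbf{A}+\mathbf{A}^T$$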
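Writing $\mathbf{a}^T(\mathbf{X}^n)^T\mathbf{X}^n\mathbf{b} = (\mathbf{X}^n\mathbf{a})^T(\mathbf{X}^n\mathbf{b})$ and applying the product rule to $\mathbf{X}^n = \mathbf{X}\cdots\mathbf{X}$ gives the gradient

$$\sum_{r=0}^{n-1}(\mathbf{X}^r)^T\mathbf{X}^n(\mathbf{b}\mathbf{a}^T + \mathbf{a}\mathbf{b}^T)(\mathbf{X}^{n-1-r})^T.$$

The sketch below makes two finite-difference checks:

- It checks that expression for $n = 3$.
- It checks that the gradient of $(\mathbf{x}-\mathbf{As})^T\mathbf{W}(\mathbf{x}-\mathbf{As})$ with respect to $\mathbf{x}$ is $+2\mathbf{W}(\mathbf{x}-\mathbf{As})$ for a symmetric $\mathbf{W}$.

All helpers (`grad15`, `wls`, etc.) and numeric values are hypothetical, written in plain Python.

```python
def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matvec(A, v):
    return [sum(x * y for x, y in zip(row, v)) for row in A]


def matpow(X, k):
    """X^k by repeated multiplication, with X^0 = I."""
    P = [[1.0 if i == j else 0.0 for j in range(len(X))] for i in range(len(X))]
    for _ in range(k):
        P = matmul(P, X)
    return P


def num_grad(f, X, h=1e-5):
    """Central finite differences for a scalar function of a matrix (list of rows)."""
    G = []
    for i in range(len(X)):
        G.append([])
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G[i].append((f(Xp) - f(Xm)) / (2 * h))
    return G


def form15(X, a, b, n):
    """a^T (X^n)^T X^n b = (X^n a) . (X^n b)."""
    P = matpow(X, n)
    return sum(p * q for p, q in zip(matvec(P, a), matvec(P, b)))


def grad15(X, a, b, n):
    """Sum over r of (X^r)^T X^n (b a^T + a b^T) (X^{n-1-r})^T."""
    m = len(X)
    M = [[b[i] * a[j] + a[i] * b[j] for j in range(m)] for i in range(m)]
    Xn = matpow(X, n)
    G = [[0.0] * m for _ in range(m)]
    for r in range(n):
        T = matmul(matmul(matmul(transpose(matpow(X, r)), Xn), M),
                   transpose(matpow(X, n - 1 - r)))
        G = [[g + t for g, t in zip(rg, rt)] for rg, rt in zip(G, T)]
    return G


X = [[0.5, 1.0], [-0.3, 0.8]]
a, b = [1.0, 2.0], [0.5, -1.0]
G_num = num_grad(lambda Y: form15(Y, a, b, 3), X)
err15 = max(abs(p - q) for rp, rq in zip(G_num, grad15(X, a, b, 3)) for p, q in zip(rp, rq))

# Weighted least-squares form with symmetric W: the gradient in x is +2 W (x - A s)
W = [[2.0, 0.5], [0.5, 1.0]]
A = [[1.0, 2.0], [0.0, 1.0]]
s = [1.0, -1.0]
x = [0.3, 0.7]


def wls(x_vec):
    r = [xi - yi for xi, yi in zip(x_vec, matvec(A, s))]
    return sum(ri * wi for ri, wi in zip(r, matvec(W, r)))


g_num = num_grad(lambda Y: wls(Y[0]), [x])[0]  # treat x as a 1 x m matrix
resid = [xi - yi for xi, yi in zip(x, matvec(A, s))]
g_formula = [2 * v for v in matvec(W, resid)]
err_wls = max(abs(p - q) for p, q in zip(g_num, g_formula))
print(err15, err_wls)
```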