*Machine Learning and Knowledge Discovery* (《机器学习与知识发现》) course resource.

# Linear Algebra Review and Reference

Zico Kolter (updated by Chuong Do)

October 7, 2008

Contents:

1. Basic Concepts and Notation
   - 1.1 Basic Notation
2. Matrix Multiplication
   - 2.1 Vector-Vector Products
   - 2.2 Matrix-Vector Products
   - 2.3 Matrix-Matrix Products
3. Operations and Properties
   - 3.1 The Identity Matrix and Diagonal Matrices
   - 3.2 The Transpose
   - 3.3 Symmetric Matrices
   - 3.4 The Trace
   - 3.5 Norms
   - 3.6 Linear Independence and Rank
   - 3.7 The Inverse
   - 3.8 Orthogonal Matrices
   - 3.9 Range and Nullspace of a Matrix
   - 3.10 The Determinant
   - 3.11 Quadratic Forms and Positive Semidefinite Matrices
   - 3.12 Eigenvalues and Eigenvectors
   - 3.13 Eigenvalues and Eigenvectors of Symmetric Matrices
4. Matrix Calculus
   - 4.1 The Gradient
   - 4.2 The Hessian
   - 4.3 Gradients and Hessians of Quadratic and Linear Functions
   - 4.4 Least Squares
   - 4.5 Gradients of the Determinant
   - 4.6 Eigenvalues as Optimization

## 1 Basic Concepts and Notation

Linear algebra provides a way of compactly representing and operating on sets of linear equations. For example, consider the following system of equations:

$$4x_1 - 5x_2 = -13$$
$$-2x_1 + 3x_2 = 9.$$

This is two equations and two variables, so as you know from high school algebra, you can find a unique solution for $x_1$ and $x_2$ (unless the equations are somehow degenerate, for example if the second equation is simply a multiple of the first, but in the case above there is in fact a unique solution). In matrix notation, we can write the system more compactly as $Ax = b$ with

$$A = \begin{bmatrix} 4 & -5 \\ -2 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} -13 \\ 9 \end{bmatrix}.$$

As we will see shortly, there are many advantages (including the obvious space savings) to analyzing linear equations in this form.

### 1.1 Basic Notation

We use the following notation:

- By $A \in \mathbb{R}^{m \times n}$ we denote a matrix with $m$ rows and $n$ columns, where the entries of $A$ are real numbers.
- By $x \in \mathbb{R}^n$, we denote a vector with $n$ entries. By convention, an $n$-dimensional vector is often thought of as a matrix with $n$ rows and 1 column, known as a **column vector**. If we want to explicitly represent a **row vector** (a matrix with 1 row and $n$ columns) we typically write $x^T$ (here $x^T$ denotes the transpose of $x$, which we will define shortly).
- The $i$th element of a vector $x$ is denoted $x_i$:
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

- We use the notation $a_{ij}$ (or $A_{ij}$, $A_{i,j}$, etc.) to denote the entry of $A$ in the $i$th row and $j$th column:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
- We denote the $j$th column of $A$ by $a_j$ or $A_{:,j}$:
$$A = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix}.$$
- We denote the $i$th row of $A$ by $a_i^T$ or $A_{i,:}$:
$$A = \begin{bmatrix} \text{---} & a_1^T & \text{---} \\ \text{---} & a_2^T & \text{---} \\ & \vdots & \\ \text{---} & a_m^T & \text{---} \end{bmatrix}.$$
- Note that these definitions are ambiguous (for example, the $a_1$ and $a_1^T$ in the previous two definitions are not the same vector). Usually the meaning of the notation should be obvious from its use.

## 2 Matrix Multiplication

The product of two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$ is the matrix

$$C = AB \in \mathbb{R}^{m \times p},$$

where

$$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}.$$

Note that in order for the matrix product to exist, the number of columns in $A$ must equal the number of rows in $B$. There are many ways of looking at matrix multiplication, and we'll start by examining a few special cases.
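To make the definition concrete, here is a minimal plain-Python sketch that computes $C = AB$ directly from $C_{ij} = \sum_k A_{ik} B_{kj}$. The helper name `matmul` is just for illustration; matrices are nested lists of rows.

```python
# Matrix product from the definition C_ij = sum_k A_ik * B_kj.
def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 x 2

C = matmul(A, B)         # 2 x 2
print(C)                 # -> [[58, 64], [139, 154]]
```

Note that the inner `assert` enforces the compatibility condition stated above: the number of columns of $A$ must equal the number of rows of $B$.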

### 2.1 Vector-Vector Products

Given two vectors $x, y \in \mathbb{R}^n$, the quantity $x^T y$, sometimes called the **inner product** or **dot product** of the vectors, is a real number given by

$$x^T y \in \mathbb{R} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i=1}^{n} x_i y_i.$$

Observe that inner products are really just a special case of matrix multiplication. Note that it is always the case that $x^T y = y^T x$.

Given vectors $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$ (not necessarily of the same size), $xy^T \in \mathbb{R}^{m \times n}$ is called the **outer product** of the vectors. It is a matrix whose entries are given by $(xy^T)_{ij} = x_i y_j$, i.e.,

$$xy^T \in \mathbb{R}^{m \times n} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{bmatrix}.$$

As an example of how the outer product can be useful, let $\mathbf{1} \in \mathbb{R}^n$ denote an $n$-dimensional vector whose entries are all equal to 1. Furthermore, consider the matrix $A \in \mathbb{R}^{m \times n}$ whose columns are all equal to some vector $x \in \mathbb{R}^m$. Using outer products, we can represent $A$ compactly as

$$A = \begin{bmatrix} | & | & & | \\ x & x & \cdots & x \\ | & | & & | \end{bmatrix} = \begin{bmatrix} x_1 & x_1 & \cdots & x_1 \\ x_2 & x_2 & \cdots & x_2 \\ \vdots & \vdots & \ddots & \vdots \\ x_m & x_m & \cdots & x_m \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix} = x \mathbf{1}^T.$$

### 2.2 Matrix-Vector Products

Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $x \in \mathbb{R}^n$, their product is a vector $y = Ax \in \mathbb{R}^m$. There are a couple of ways of looking at matrix-vector multiplication, and we will look at each of them in turn.

If we write $A$ by rows, then we can express $Ax$ as

$$y = Ax = \begin{bmatrix} \text{---} & a_1^T & \text{---} \\ \text{---} & a_2^T & \text{---} \\ & \vdots & \\ \text{---} & a_m^T & \text{---} \end{bmatrix} x = \begin{bmatrix} a_1^T x \\ a_2^T x \\ \vdots \\ a_m^T x \end{bmatrix}.$$

In other words, the $i$th entry of $y$ is equal to the inner product of the $i$th row of $A$ and $x$: $y_i = a_i^T x$.

Alternatively, let's write $A$ in column form. In this case we see that

$$y = Ax = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n.$$

In other words, $y$ is a **linear combination** of the columns of $A$, where the coefficients of the linear combination are given by the entries of $x$.

So far we have been multiplying on the right by a column vector, but it is also possible to multiply on the left by a row vector. This is written $y^T = x^T A$ for $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^m$, and $y \in \mathbb{R}^n$. As before, we can express $y^T$ in two obvious ways, depending on whether we express $A$ in terms of its rows or columns. In the first case we express $A$ in terms of its columns, which gives

$$y^T = x^T A = x^T \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} x^T a_1 & x^T a_2 & \cdots & x^T a_n \end{bmatrix},$$

which demonstrates that the $i$th entry of $y^T$ is equal to the inner product of $x$ and the $i$th column of $A$.

Finally, expressing $A$ in terms of rows we get the final representation of the vector-matrix product,

$$y^T = x^T A = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \begin{bmatrix} \text{---} & a_1^T & \text{---} \\ \text{---} & a_2^T & \text{---} \\ & \vdots & \\ \text{---} & a_m^T & \text{---} \end{bmatrix} = x_1 \begin{bmatrix} \text{---} \; a_1^T \; \text{---} \end{bmatrix} + x_2 \begin{bmatrix} \text{---} \; a_2^T \; \text{---} \end{bmatrix} + \cdots + x_m \begin{bmatrix} \text{---} \; a_m^T \; \text{---} \end{bmatrix},$$

so we see that $y^T$ is a linear combination of the rows of $A$, where the coefficients for the linear combination are given by the entries of $x$.

### 2.3 Matrix-Matrix Products

Armed with this knowledge, we can now look at four different (but, of course, equivalent) ways of viewing the matrix-matrix multiplication $C = AB$ as defined at the beginning of this section.

First, we can view matrix-matrix multiplication as a set of vector-vector products.
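As a quick aside, the two matrix-vector viewpoints from the previous subsection can be verified numerically. The sketch below is plain Python with $x$ chosen as $(3, 5)$, the solution of the system from Section 1, so both views of $Ax$ should recover $b = (-13, 9)$.

```python
# Two views of y = Ax for the matrix A of Section 1.
A = [[4, -5],
     [-2, 3]]
x = [3, 5]

# Row view: y_i = a_i^T x (inner product with each row of A).
y_rows = [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]

# Column view: y = x_1 * a_1 + x_2 * a_2 (linear combination of columns).
y_cols = [0, 0]
for j, x_j in enumerate(x):
    for i in range(len(A)):
        y_cols[i] += x_j * A[i][j]

print(y_rows, y_cols)  # both -> [-13, 9]
```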
The most obvious viewpoint, which follows immediately from the definition, is that the $(i,j)$th

entry of $C$ is equal to the inner product of the $i$th row of $A$ and the $j$th column of $B$. Symbolically, this looks like the following:

$$C = AB = \begin{bmatrix} \text{---} & a_1^T & \text{---} \\ \text{---} & a_2^T & \text{---} \\ & \vdots & \\ \text{---} & a_m^T & \text{---} \end{bmatrix} \begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_p \\ | & | & & | \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & \cdots & a_1^T b_p \\ a_2^T b_1 & a_2^T b_2 & \cdots & a_2^T b_p \\ \vdots & \vdots & \ddots & \vdots \\ a_m^T b_1 & a_m^T b_2 & \cdots & a_m^T b_p \end{bmatrix}.$$

Remember that since $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$, $a_i \in \mathbb{R}^n$ and $b_j \in \mathbb{R}^n$, so these inner products all make sense. This is the most "natural" representation when we represent $A$ by rows and $B$ by columns. Alternatively, we can represent $A$ by columns and $B$ by rows. This representation leads to a much trickier interpretation of $AB$ as a sum of outer products. Symbolically,

$$C = AB = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} \text{---} & b_1^T & \text{---} \\ \text{---} & b_2^T & \text{---} \\ & \vdots & \\ \text{---} & b_n^T & \text{---} \end{bmatrix} = \sum_{i=1}^{n} a_i b_i^T.$$

Put another way, $AB$ is equal to the sum, over all $i$, of the outer product of the $i$th column of $A$ and the $i$th row of $B$. Since, in this case, $a_i \in \mathbb{R}^m$ and $b_i \in \mathbb{R}^p$, the dimension of the outer product $a_i b_i^T$ is $m \times p$, which coincides with the dimension of $C$. Chances are, the last equality above may appear confusing to you. If so, take the time to check it for yourself!

Second, we can also view matrix-matrix multiplication as a set of matrix-vector products. Specifically, if we represent $B$ by columns, we can view the columns of $C$ as matrix-vector products between $A$ and the columns of $B$. Symbolically,

$$C = AB = A \begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_p \\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ Ab_1 & Ab_2 & \cdots & Ab_p \\ | & | & & | \end{bmatrix}.$$

Here the $i$th column of $C$ is given by the matrix-vector product with the vector on the right, $c_i = Ab_i$. These matrix-vector products can in turn be interpreted using both viewpoints given in the previous subsection. Finally, we have the analogous viewpoint, where we represent $A$ by rows, and view the rows of $C$ as the matrix-vector product between the rows of $A$ and $B$. Symbolically,

$$C = AB = \begin{bmatrix} \text{---} & a_1^T & \text{---} \\ \text{---} & a_2^T & \text{---} \\ & \vdots & \\ \text{---} & a_m^T & \text{---} \end{bmatrix} B = \begin{bmatrix} \text{---} & a_1^T B & \text{---} \\ \text{---} & a_2^T B & \text{---} \\ & \vdots & \\ \text{---} & a_m^T B & \text{---} \end{bmatrix}.$$

Here the $i$th row of $C$ is given by the matrix-vector product with the vector on the left, $c_i^T = a_i^T B$.
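The sum-of-outer-products viewpoint is easy to check numerically. The sketch below (plain Python, no libraries assumed) builds $C$ once from the entrywise definition and once as $\sum_{k} a_k b_k^T$, accumulating one rank-one outer product per column of $A$:

```python
# Check AB == sum over k of (column k of A) * (row k of B)^T.
A = [[1, 2],
     [3, 4],
     [5, 6]]            # 3 x 2, columns a_1, a_2
B = [[7, 8, 9, 10],
     [11, 12, 13, 14]]  # 2 x 4, rows b_1^T, b_2^T

m, n, p = len(A), len(B), len(B[0])

# Direct definition: C_ij = sum_k A_ik * B_kj.
C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
     for i in range(m)]

# Accumulate the outer product of column k of A with row k of B.
S = [[0] * p for _ in range(m)]
for k in range(n):
    for i in range(m):
        for j in range(p):
            S[i][j] += A[i][k] * B[k][j]

print(C == S)  # -> True
```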

It may seem like overkill to dissect matrix multiplication to such a large degree, especially when all these viewpoints follow immediately from the initial definition we gave (in about a line of math) at the beginning of this section. However, virtually all of linear algebra deals with matrix multiplications of some kind, and it is worthwhile to spend some time trying to develop an intuitive understanding of the viewpoints presented here.

In addition to this, it is useful to know a few basic properties of matrix multiplication at a higher level:

- Matrix multiplication is associative: $(AB)C = A(BC)$.
- Matrix multiplication is distributive: $A(B + C) = AB + AC$.
- Matrix multiplication is, in general, *not* commutative; that is, it can be the case that $AB \neq BA$. (For example, if $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times q}$, the matrix product $BA$ does not even exist if $m$ and $q$ are not equal!)

If you are not familiar with these properties, take the time to verify them for yourself. For example, to check the associativity of matrix multiplication, suppose that $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, and $C \in \mathbb{R}^{p \times q}$. Note that $AB \in \mathbb{R}^{m \times p}$, so $(AB)C \in \mathbb{R}^{m \times q}$. Similarly, $BC \in \mathbb{R}^{n \times q}$, so $A(BC) \in \mathbb{R}^{m \times q}$. Thus, the dimensions of the resulting matrices agree. To show that matrix multiplication is associative, it suffices to check that the $(i,j)$th entry of $(AB)C$ is equal to the $(i,j)$th entry of $A(BC)$. We can verify this directly using the definition of matrix multiplication:

$$((AB)C)_{ij} = \sum_{k=1}^{p} (AB)_{ik} C_{kj} = \sum_{k=1}^{p} \left( \sum_{l=1}^{n} A_{il} B_{lk} \right) C_{kj} = \sum_{k=1}^{p} \sum_{l=1}^{n} A_{il} B_{lk} C_{kj} = \sum_{l=1}^{n} \sum_{k=1}^{p} A_{il} B_{lk} C_{kj} = \sum_{l=1}^{n} A_{il} \left( \sum_{k=1}^{p} B_{lk} C_{kj} \right) = \sum_{l=1}^{n} A_{il} (BC)_{lj} = (A(BC))_{ij}.$$

Here, the first and last two equalities simply use the definition of matrix multiplication, the third and fifth equalities use the distributive property for scalar multiplication over addition, and the fourth equality uses the commutativity and associativity of scalar addition.
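Alongside the algebraic proof, these properties can be spot-checked on small examples. The sketch below (with an illustrative `matmul` helper in plain Python) verifies associativity on one triple of matrices and exhibits a pair with $AB \neq BA$:

```python
# Spot-check (AB)C == A(BC), and that AB != BA in general.
def matmul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]   # swaps columns when multiplied on the right
C = [[2, 0], [0, 3]]   # diagonal scaling

assoc_left = matmul(matmul(A, B), C)
assoc_right = matmul(A, matmul(B, C))
print(assoc_left == assoc_right)      # -> True
print(matmul(A, B) == matmul(B, A))   # -> False: AB and BA differ
```

Of course, a numeric check on one example is not a proof; the scalar argument above is what establishes associativity in general.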
This technique for proving matrix properties by reduction to simple scalar properties will come up often, so make sure you're familiar with it.

## 3 Operations and Properties

In this section we present several operations and properties of matrices and vectors. Hopefully a great deal of this will be review for you, so the notes can just serve as a reference for these topics.

### 3.1 The Identity Matrix and Diagonal Matrices

The **identity matrix**, denoted $I \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else. That is,

$$I_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j. \end{cases}$$

It has the property that for all $A \in \mathbb{R}^{m \times n}$,

$$AI = A = IA.$$

Note that in some sense, the notation for the identity matrix is ambiguous, since it does not specify the dimension of $I$. Generally, the dimensions of $I$ are inferred from context so as to make matrix multiplication possible. For example, in the equation above, the $I$ in $AI = A$ is an $n \times n$ matrix, whereas the $I$ in $A = IA$ is an $m \times m$ matrix.

A **diagonal matrix** is a matrix where all non-diagonal elements are 0. This is typically denoted $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, with

$$D_{ij} = \begin{cases} d_i & i = j \\ 0 & i \neq j. \end{cases}$$

Clearly, $I = \mathrm{diag}(1, 1, \ldots, 1)$.

### 3.2 The Transpose

The **transpose** of a matrix results from "flipping" the rows and columns. Given a matrix $A \in \mathbb{R}^{m \times n}$, its transpose, written $A^T \in \mathbb{R}^{n \times m}$, is the $n \times m$ matrix whose entries are given by

$$(A^T)_{ij} = A_{ji}.$$

We have in fact already been using the transpose when describing row vectors, since the transpose of a column vector is naturally a row vector.

The following properties of transposes are easily verified:

- $(A^T)^T = A$
- $(AB)^T = B^T A^T$
- $(A + B)^T = A^T + B^T$

### 3.3 Symmetric Matrices

A square matrix $A \in \mathbb{R}^{n \times n}$ is **symmetric** if $A = A^T$. It is **anti-symmetric** if $A = -A^T$.
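The transpose identities above, in particular $(AB)^T = B^T A^T$, and the symmetry of $A + A^T$ can be spot-checked numerically; `transpose` and `matmul` below are illustrative plain-Python helpers, not library calls:

```python
# Check (A^T)^T = A, (AB)^T = B^T A^T, and that S + S^T is symmetric.
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]             # 3 x 2

lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
print(transpose(transpose(A)) == A)   # -> True
print(lhs == rhs)                     # -> True

S = [[7, 8], [9, 10]]
sym = [[S[i][j] + S[j][i] for j in range(2)] for i in range(2)]
print(sym == transpose(sym))          # -> True: S + S^T is symmetric
```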
It is easy to show that for any matrix $A \in \mathbb{R}^{n \times n}$, the matrix $A + A^T$ is symmetric and the

matrix $A - A^T$ is anti-symmetric. From this it follows that any square matrix $A \in \mathbb{R}^{n \times n}$ can be represented as a sum of a symmetric matrix and an anti-symmetric matrix, since

$$A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T),$$

and the first matrix on the right is symmetric, while the second is anti-symmetric. It turns out that symmetric matrices occur a great deal in practice, and they have many nice properties which we will look at shortly. It is common to denote the set of all symmetric matrices of size $n$ as $\mathbb{S}^n$, so that $A \in \mathbb{S}^n$ means that $A$ is a symmetric $n \times n$ matrix.

### 3.4 The Trace

The **trace** of a square matrix $A \in \mathbb{R}^{n \times n}$, denoted $\mathrm{tr}(A)$ (or just $\mathrm{tr}A$ if the parentheses are obviously implied), is the sum of diagonal elements in the matrix:

$$\mathrm{tr}A = \sum_{i=1}^{n} A_{ii}.$$

As described in the CS229 lecture notes, the trace has the following properties (included here for the sake of completeness):

- For $A \in \mathbb{R}^{n \times n}$, $\mathrm{tr}A = \mathrm{tr}A^T$.
- For $A, B \in \mathbb{R}^{n \times n}$, $\mathrm{tr}(A + B) = \mathrm{tr}A + \mathrm{tr}B$.
- For $A \in \mathbb{R}^{n \times n}$, $t \in \mathbb{R}$, $\mathrm{tr}(tA) = t \, \mathrm{tr}A$.
- For $A, B$ such that $AB$ is square, $\mathrm{tr}AB = \mathrm{tr}BA$.
- For $A, B, C$ such that $ABC$ is square, $\mathrm{tr}ABC = \mathrm{tr}BCA = \mathrm{tr}CAB$, and so on for the product of more matrices.

As an example of how these properties can be proven, we'll consider the fourth property given above. Suppose that $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$ (so that $AB \in \mathbb{R}^{m \times m}$ is a square matrix). Observe that $BA \in \mathbb{R}^{n \times n}$ is also a square matrix, so it makes sense to apply the trace operator to it. To verify that $\mathrm{tr}AB = \mathrm{tr}BA$, note that

$$\mathrm{tr}AB = \sum_{i=1}^{m} (AB)_{ii} = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} A_{ij} B_{ji} \right) = \sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij} B_{ji} = \sum_{j=1}^{n} \sum_{i=1}^{m} B_{ji} A_{ij} = \sum_{j=1}^{n} \left( \sum_{i=1}^{m} B_{ji} A_{ij} \right) = \sum_{j=1}^{n} (BA)_{jj} = \mathrm{tr}BA.$$

Here, the first and last two equalities use the definition of the trace operator and matrix multiplication. The fourth equality, where the main work occurs, uses the commutativity of scalar multiplication in order to reverse the order of the terms in each product, and the commutativity and associativity of scalar addition in order to rearrange the order of the summation.

### 3.5 Norms

A **norm** of a vector $\|x\|$ is informally a measure of the "length" of the vector. For example, we have the commonly-used Euclidean or $\ell_2$ norm,

$$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}.$$

Note that $\|x\|_2^2 = x^T x$.

More formally, a norm is any function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies 4 properties:

1. For all $x \in \mathbb{R}^n$, $f(x) \geq 0$ (non-negativity).
2. $f(x) = 0$ if and only if $x = 0$ (definiteness).
3. For all $x \in \mathbb{R}^n$, $t \in \mathbb{R}$, $f(tx) = |t| f(x)$ (homogeneity).
4. For all $x, y \in \mathbb{R}^n$, $f(x + y) \leq f(x) + f(y)$ (triangle inequality).

Other examples of norms are the $\ell_1$ norm,

$$\|x\|_1 = \sum_{i=1}^{n} |x_i|,$$

and the $\ell_\infty$ norm,

$$\|x\|_\infty = \max_i |x_i|.$$

In fact, all three norms presented so far are examples of the family of $\ell_p$ norms, which are parameterized by a real number $p \geq 1$ and defined as

$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.$$

Norms can also be defined for matrices, such as the **Frobenius norm**,

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}.$$

Many other norms exist, but they are beyond the scope of this review.
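Each of these norms follows directly from its definition. The sketch below computes them for a small vector and matrix in plain Python (the helper name `p_norm` is just for illustration); note that the $\ell_2$ norm agrees with the $\ell_p$ formula at $p = 2$:

```python
# l1, l2, l_infinity, general l_p, and Frobenius norms from the definitions.
def p_norm(x, p):
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0]
l1 = sum(abs(xi) for xi in x)            # -> 7.0
l2 = sum(xi * xi for xi in x) ** 0.5     # -> 5.0 (the 3-4-5 triangle)
linf = max(abs(xi) for xi in x)          # -> 4.0
print(l1, l2, linf, p_norm(x, 2))        # l2 == p_norm(x, 2)

A = [[1.0, 2.0],
     [3.0, 4.0]]
frob = sum(a * a for row in A for a in row) ** 0.5
print(frob)                              # sqrt(30), about 5.477
```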