The variance of A and B are individually defined as, σ 2 A = 1 n ∑ i a 2 i , σ 2 B = 1 n ∑ i b 2 i The covariance between A and B is a straight-forward gener￾alization. covariance o f A and B ≡ σ 2 AB = 1 n ∑ i aibi The covariance measures the degree of the linear relationship between two variables. A large positive value indicates pos￾itively correlated data. Likewise, a large negative value de￾notes negatively correlated data. The absolute magnitude of the covariance measures the degree of redundancy. Some ad￾ditional facts about the covariance. • σAB is zero if and only if A and B are uncorrelated (e.g. Figure 2, left panel). • σ 2 AB = σ 2 A if A = B. We can equivalently convert A and B into corresponding row vectors. a = [a1 a2 ... an] b = [b1 b2 ... bn] so that we may express the covariance as a dot product matrix computation.2 σ 2 ab ≡ 1 n abT (2) Finally, we can generalize from two vectors to an arbitrary number. Rename the row vectors a and b as x1 and x2, respec￾tively, and consider additional indexed row vectors x3,...,xm. Define a new m×n matrix X. X =    x1 . . . xm    One interpretation of X is the following. Each row of X corre￾sponds to all measurements of a particular type. Each column of X corresponds to a set of measurements from one particular trial (this is ~X from section 3.1). We now arrive at a definition for the covariance matrix CX. CX ≡ 1 n XXT . 2 Note that in practice, the covariance σ 2 AB is calculated as 1 n−1 ∑i aibi . The slight change in normalization constant arises from estimation theory, but that is beyond the scope of this tutorial. Consider the matrix CX = 1 nXXT . The i jth element of CX is the dot product between the vector of the i th measurement type with the vector of the j th measurement type. We can summarize several properties of CX: • CX is a square symmetric m×m matrix (Theorem 2 of Appendix A) • The diagonal terms of CX are the variance of particular measurement types. • The off-diagonal terms of CX are the covariance be￾tween measurement types. CX captures the covariance between all possible pairs of mea￾surements. The covariance values reflect the noise and redun￾dancy in our measurements. • In the diagonal terms, by assumption, large values cor￾respond to interesting structure. • In the off-diagonal terms large magnitudes correspond to high redundancy. Pretend we have the option of manipulating CX. We will sug￾gestively define our manipulated covariance matrix CY. What features do we want to optimize in CY? D. Diagonalize the Covariance Matrix We can summarize the last two sections by stating that our goals are (1) to minimize redundancy, measured by the mag￾nitude of the covariance, and (2) maximize the signal, mea￾sured by the variance. What would the optimized covariance matrix CY look like? • All off-diagonal terms in CY should be zero. Thus, CY must be a diagonal matrix. Or, said another way, Y is decorrelated. • Each successive dimension in Y should be rank-ordered according to variance. There are many methods for diagonalizing CY. It is curious to note that PCA arguably selects the easiest method: PCA as￾sumes that all basis vectors {p1,...,pm} are orthonormal, i.e. P is an orthonormal matrix. Why is this assumption easiest? Envision how PCA works. In our simple example in Figure 2, P acts as a generalized rotation to align a basis with the axis of maximal variance. In multiple dimensions this could be performed by a simple algorithm: 1. 
Envision how PCA works. In our simple example in Figure 2, $P$ acts as a generalized rotation to align a basis with the axis of maximal variance. In multiple dimensions this could be performed by a simple algorithm:

1. Select a normalized direction in $m$-dimensional space along which the variance in $X$ is maximized. Save this vector as $\mathbf{p}_1$.