2. Find another direction along which the variance is maximized; however, because of the orthonormality condition, restrict the search to all directions orthogonal to all previously selected directions. Save this vector as p_i.

3. Repeat this procedure until m vectors are selected.

The resulting ordered set of p's are the principal components.

In principle this simple algorithm works; however, that would belie the true reason why the orthonormality assumption is judicious. The true benefit of this assumption is that there exists an efficient, analytical solution to the problem. We will discuss two solutions in the following sections; a brute-force sketch of the greedy search described above appears immediately below.
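To make the greedy procedure concrete, the following Python/NumPy sketch implements it directly. It is not the analytical solution developed in the next sections: the function name greedy_pca, the use of power iteration to find each maximal-variance direction, and the fixed iteration count are illustrative choices of our own, and the data matrix X is assumed to already have zero-mean rows and a full-rank covariance.

    import numpy as np

    def greedy_pca(X, n_iter=500, seed=0):
        # X: m x n data matrix (m measurement types, n samples),
        # assumed to already have zero-mean rows.
        m, n = X.shape
        C = (X @ X.T) / n                     # covariance matrix of X
        rng = np.random.default_rng(seed)
        directions, variances = [], []

        for _ in range(m):
            # Projector onto the subspace orthogonal to all previously
            # selected directions (the orthonormality condition of step 2).
            if directions:
                B = np.column_stack(directions)
                proj = np.eye(m) - B @ B.T
            else:
                proj = np.eye(m)

            # Power iteration on the restricted covariance finds the
            # direction of maximal variance within the allowed subspace.
            p = proj @ rng.standard_normal(m)
            p /= np.linalg.norm(p)
            for _ in range(n_iter):
                p = proj @ (C @ p)
                p /= np.linalg.norm(p)

            directions.append(p)
            variances.append(float(p @ C @ p))   # variance along this direction

        P = np.vstack(directions)             # rows are the principal components
        return P, np.array(variances)

Each pass restricts the search to the orthogonal complement of the directions already found, exactly as step 2 requires; the analytical solutions below replace this loop with a single matrix decomposition.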
Notice what we gained with the stipulation of rank-ordered variance. We have a method for judging the importance of each principal direction: the variance associated with each direction p_i quantifies how "principal" that direction is, and we rank-order the basis vectors p_i according to their corresponding variances. We will now pause to review the implications of all the assumptions made to arrive at this mathematical goal.

E. Summary of Assumptions

This section provides a summary of the assumptions behind PCA and hints at when these assumptions might perform poorly.

I. Linearity
   Linearity frames the problem as a change of basis. Several areas of research have explored how to extend these notions to nonlinear regimes (see Discussion).

II. Large variances have important structure.
   This assumption also encompasses the belief that the data has a high SNR. Hence, principal components with larger associated variances represent interesting structure, while those with lower variances represent noise. Note that this is a strong, and sometimes incorrect, assumption (see Discussion).

III. The principal components are orthogonal.
   This assumption provides an intuitive simplification that makes PCA soluble with linear algebra decomposition techniques. These techniques are highlighted in the two following sections.

We have discussed all aspects of deriving PCA - what remain are the linear algebra solutions. The first solution is somewhat straightforward, while the second solution involves understanding an important algebraic decomposition.

V. SOLVING PCA USING EIGENVECTOR DECOMPOSITION

We derive our first algebraic solution to PCA based on an important property of eigenvector decomposition. Once again, the data set is X, an m x n matrix, where m is the number of measurement types and n is the number of samples. The goal is summarized as follows.

    Find some orthonormal matrix P in Y = PX such that C_Y ≡ (1/n) Y Y^T is a diagonal matrix. The rows of P are the principal components of X.

We begin by rewriting C_Y in terms of the unknown variable:

    C_Y = (1/n) Y Y^T
        = (1/n) (PX)(PX)^T
        = (1/n) P X X^T P^T
        = P ((1/n) X X^T) P^T
    C_Y = P C_X P^T

Note that we have identified the covariance matrix of X in the last line.
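As a quick sanity check of the identity C_Y = P C_X P^T, the short Python/NumPy snippet below builds a matrix P with orthonormal rows (via a QR factorization) and confirms the relation numerically; the dimensions and random data are placeholders chosen only for illustration.

    import numpy as np

    # Numerical check of C_Y = P C_X P^T. P is built with orthonormal rows
    # to match the setup in the text, although this particular identity
    # holds for any matrix P.
    rng = np.random.default_rng(1)
    m, n = 4, 200
    X = rng.standard_normal((m, n))
    X -= X.mean(axis=1, keepdims=True)            # zero-mean rows

    Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
    P = Q.T                                        # rows of P are orthonormal

    Y = P @ X
    C_X = (X @ X.T) / n
    C_Y = (Y @ Y.T) / n

    assert np.allclose(C_Y, P @ C_X @ P.T)         # the identity derived above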
Our plan is to recognize that any symmetric matrix A is diagonalized by an orthogonal matrix of its eigenvectors (by Theorems 3 and 4 from Appendix A). For a symmetric matrix A, Theorem 4 provides A = E D E^T, where D is a diagonal matrix and E is a matrix of the eigenvectors of A arranged as columns.^3

    ^3 The matrix A might have r ≤ m orthonormal eigenvectors, where r is the rank of the matrix. When the rank of A is less than m, A is degenerate, or all data occupy a subspace of dimension r ≤ m. Maintaining the constraint of orthogonality, we can remedy this situation by selecting (m - r) additional orthonormal vectors to "fill up" the matrix E. These additional vectors do not affect the final solution because the variances associated with these directions are zero.

Now comes the trick. We select the matrix P to be a matrix where each row p_i is an eigenvector of (1/n) X X^T. By this selection, P ≡ E^T. With this relation and Theorem 1 of Appendix A (P^-1 = P^T) we can finish evaluating C_Y:

    C_Y = P C_X P^T
        = P (E D E^T) P^T
        = P (P^T D P) P^T
        = (P P^T) D (P P^T)
        = (P P^-1) D (P P^-1)
    C_Y = D

It is evident that the choice of P diagonalizes C_Y. This was the goal for PCA. We can summarize the results of PCA in the matrices P and C_Y.
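The entire solution of this section collapses into a few lines of code. The following Python/NumPy sketch is one way to implement it: the function name pca_eig and the synthetic test data are our own, and the rows of X are assumed to be zero-mean (the mean subtraction is shown explicitly in the example).

    import numpy as np

    def pca_eig(X):
        # X: m x n data matrix with zero-mean rows. Returns P, whose rows
        # are the principal components (eigenvectors of (1/n) X X^T,
        # rank-ordered by decreasing variance), and the variances D.
        m, n = X.shape
        C_X = (X @ X.T) / n
        evals, E = np.linalg.eigh(C_X)        # eigh handles symmetric matrices
        order = np.argsort(evals)[::-1]       # rank-order by variance
        D = evals[order]
        P = E[:, order].T                     # P = E^T: rows are eigenvectors
        return P, D

    # Example usage on synthetic data (illustration only).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((3, 500))
    X -= X.mean(axis=1, keepdims=True)        # subtract the mean of each row

    P, D = pca_eig(X)
    Y = P @ X                                  # the transformed data
    C_Y = (Y @ Y.T) / Y.shape[1]
    assert np.allclose(C_Y, np.diag(D))        # C_Y is diagonal, as derived above

Sorting the eigenvalues in decreasing order implements the rank-ordering of variances discussed above, so the first row of P is the most "principal" direction.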