Overview

Principal Component Analysis

Hervé Abdi¹* and Lynne J. Williams²

Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using cross-validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen-decomposition of positive semi-definite matrices and upon the singular value decomposition (SVD) of rectangular matrices. © 2010 John Wiley & Sons, Inc. WIREs Comp Stat 2010 2 433–459

*Correspondence to: herve@utdallas.edu
¹School of Behavioral and Brain Sciences, The University of Texas at Dallas, MS: GR4.1, Richardson, TX 75080-3021, USA
²Department of Psychology, University of Toronto Scarborough, Ontario, Canada
DOI: 10.1002/wics.101

Principal component analysis (PCA) is probably the most popular multivariate statistical technique and it is used by almost all scientific disciplines. It is also likely to be the oldest multivariate technique. In fact, its origin can be traced back to Pearson¹ or even Cauchy² [see Ref 3, p. 416], or Jordan⁴ and also Cayley, Sylvester, and Hamilton [see Refs 5,6 for more details], but its modern instantiation was formalized by Hotelling⁷, who also coined the term principal component. PCA analyzes a data table representing observations described by several dependent variables, which are, in general, inter-correlated. Its goal is to extract the important information from the data table and to express this information as a set of new orthogonal variables called principal components. PCA also represents the pattern of similarity of the observations and the variables by displaying them as points in maps [see Refs 8–10 for more details].

PREREQUISITE NOTIONS AND NOTATIONS

Matrices are denoted in upper case bold, vectors are denoted in lower case bold, and elements are denoted in lower case italic. Matrices, vectors, and elements from the same matrix all use the same letter (e.g., A, a, a). The transpose operation is denoted by the superscript ᵀ. The identity matrix is denoted I.

The data table to be analyzed by PCA comprises I observations described by J variables and it is represented by the I × J matrix X, whose generic element is xᵢ,ⱼ. The matrix X has rank L where L ≤ min{I, J}.

In general, the data table will be preprocessed before the analysis. Almost always, the columns of X will be centered so that the mean of each column is equal to 0 (i.e., Xᵀ1 = 0, where 0 is a J by 1 vector of zeros and 1 is an I by 1 vector of ones). If, in addition, each element of X is divided by √I (or √(I − 1)), the analysis is referred to as a covariance PCA because, in this case, the matrix XᵀX is a covariance matrix.
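The centering and scaling just described can be sketched numerically. The following is a minimal NumPy illustration (the data matrix and variable names are made up for the example, not taken from the article): it centers the columns of X so that Xᵀ1 = 0, then divides by √(I − 1) and checks that XᵀX is indeed the sample covariance matrix.

```python
import numpy as np

# A small made-up data table: I = 5 observations, J = 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
I = X.shape[0]

# Center each column so its mean is 0 (equivalently, X^T 1 = 0).
Xc = X - X.mean(axis=0)
assert np.allclose(Xc.T @ np.ones(I), 0.0)

# After dividing by sqrt(I - 1), X^T X is the sample covariance matrix.
Xcov = Xc / np.sqrt(I - 1)
assert np.allclose(Xcov.T @ Xcov, np.cov(X, rowvar=False))
```

Dividing by √I instead of √(I − 1) would give the population (maximum-likelihood) covariance; both conventions appear in practice.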
In addition to centering, when the variables are measured with different units, it is customary to standardize each variable to unit norm. This is obtained by dividing each variable by its norm (i.e., the square root of the sum of all the squared elements of this variable). In this case, the analysis is referred to as a correlation PCA because, then, the matrix XᵀX is a correlation matrix (most statistical packages use correlation preprocessing as a default).

The matrix X has the following singular value decomposition [SVD, see Refs 11–13 and Appendix B for an introduction to the SVD]:

X = PΔQᵀ    (1)

where P is the I × L matrix of left singular vectors, Q is the J × L matrix of right singular vectors, and Δ is the diagonal matrix of singular values.
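Equation (1) can be verified numerically. Below is a small NumPy sketch (the data and variable names are illustrative, not from the article): it computes the thin SVD of a centered matrix, reconstructs X from PΔQᵀ, checks the orthonormality of the singular vectors, and confirms that the number of nonzero singular values equals the rank L ≤ min{I, J}.

```python
import numpy as np

# A small made-up centered data matrix: I = 6 observations, J = 4 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
X -= X.mean(axis=0)

# Thin SVD: P is I x L, delta holds the singular values, Qt is Q^T (L x J).
P, delta, Qt = np.linalg.svd(X, full_matrices=False)

# Reconstruction: X = P Delta Q^T up to floating-point error.
assert np.allclose(X, P @ np.diag(delta) @ Qt)

# The singular vectors are orthonormal: P^T P = Q^T Q = identity.
assert np.allclose(P.T @ P, np.eye(P.shape[1]))
assert np.allclose(Qt @ Qt.T, np.eye(Qt.shape[0]))

# The number of nonzero singular values is the rank L <= min(I, J).
L = int(np.sum(delta > 1e-10))
assert L == np.linalg.matrix_rank(X)
assert L <= min(X.shape)
```

Note that centering removes one degree of freedom, so for a generic I × J table the rank of the centered matrix is at most min{I − 1, J}.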