
A Tutorial on Principal Component Analysis (Jonathon Shlens), from the teaching resources (books and literature) for the course "Pattern Recognition"


where each row is an orthonormal basis vector $\mathbf{b}_i$ with $m$ components. We can consider our naive basis as the effective starting point. All of our data has been recorded in this basis and thus it can be trivially expressed as a linear combination of $\{\mathbf{b}_i\}$.

B. Change of Basis

With this rigor we may now state more precisely what PCA asks: Is there another basis, which is a linear combination of the original basis, that best re-expresses our data set?

A close reader might have noticed the conspicuous addition of the word linear. Indeed, PCA makes one stringent but powerful assumption: linearity. Linearity vastly simplifies the problem by restricting the set of potential bases. With this assumption PCA is now limited to re-expressing the data as a linear combination of its basis vectors.
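The statement that data recorded in the naive basis is trivially a linear combination of the basis vectors can be checked with a small sketch. It assumes, as the term "naive basis" suggests, that the basis matrix is the identity; the names `B`, `x`, and `reconstructed` and the sample values are illustrative only and do not come from the tutorial.

```python
# Assumption: the naive basis B is the m x m identity matrix, so each row
# b_i is an orthonormal basis vector with m components.
m = 3
B = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

# A hypothetical recorded sample with m components.
x = [2.0, -1.0, 0.5]

# Rebuild the sample as the linear combination sum_i x[i] * b_i.
# In the naive basis the coefficients are simply the recorded values.
reconstructed = [sum(x[i] * B[i][k] for i in range(m)) for k in range(m)]

print(reconstructed == x)  # True
```

The coefficients of the combination are the recorded measurements themselves, which is why the naive basis serves as the starting point for any change of basis.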
Let $\mathbf{X}$ be the original data set, where each column is a single sample (or moment in time) of our data set (i.e. $\vec{X}$). In the toy example $\mathbf{X}$ is an $m \times n$ matrix where $m = 6$ and $n = 72000$. Let $\mathbf{Y}$ be another $m \times n$ matrix related by a linear transformation $\mathbf{P}$. $\mathbf{X}$ is the original recorded data set and $\mathbf{Y}$ is a new representation of that data set.

\[
\mathbf{P}\mathbf{X} = \mathbf{Y} \tag{1}
\]

Also let us define the following quantities.[1]

• $\mathbf{p}_i$ are the rows of $\mathbf{P}$.
• $\mathbf{x}_i$ are the columns of $\mathbf{X}$ (or individual $\vec{X}$).
• $\mathbf{y}_i$ are the columns of $\mathbf{Y}$.

Equation 1 represents a change of basis and thus can have many interpretations.

1. $\mathbf{P}$ is a matrix that transforms $\mathbf{X}$ into $\mathbf{Y}$.
2. Geometrically, $\mathbf{P}$ is a rotation and a stretch which again transforms $\mathbf{X}$ into $\mathbf{Y}$.
3. The rows of $\mathbf{P}$, $\{\mathbf{p}_1, \ldots, \mathbf{p}_m\}$, are a set of new basis vectors for expressing the columns of $\mathbf{X}$.

The last interpretation is not obvious but can be seen by writing out the explicit dot products of $\mathbf{P}\mathbf{X}$.

\[
\mathbf{P}\mathbf{X} =
\begin{bmatrix} \mathbf{p}_1 \\ \vdots \\ \mathbf{p}_m \end{bmatrix}
\begin{bmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_n \end{bmatrix}
\]

\[
\mathbf{Y} =
\begin{bmatrix}
\mathbf{p}_1 \cdot \mathbf{x}_1 & \cdots & \mathbf{p}_1 \cdot \mathbf{x}_n \\
\vdots & \ddots & \vdots \\
\mathbf{p}_m \cdot \mathbf{x}_1 & \cdots & \mathbf{p}_m \cdot \mathbf{x}_n
\end{bmatrix}
\]

We can note the form of each column of $\mathbf{Y}$.

\[
\mathbf{y}_i =
\begin{bmatrix} \mathbf{p}_1 \cdot \mathbf{x}_i \\ \vdots \\ \mathbf{p}_m \cdot \mathbf{x}_i \end{bmatrix}
\]

We recognize that each coefficient of $\mathbf{y}_i$ is a dot product of $\mathbf{x}_i$ with the corresponding row in $\mathbf{P}$. In other words, the $j$-th coefficient of $\mathbf{y}_i$ is a projection onto the $j$-th row of $\mathbf{P}$. This is in fact the very form of an equation where $\mathbf{y}_i$ is a projection onto the basis of $\{\mathbf{p}_1, \ldots, \mathbf{p}_m\}$. Therefore, the rows of $\mathbf{P}$ are a new set of basis vectors for representing the columns of $\mathbf{X}$.

[1] In this section $\mathbf{x}_i$ and $\mathbf{y}_i$ are column vectors, but be forewarned. In all other sections $\mathbf{x}_i$ and $\mathbf{y}_i$ are row vectors.

C. Questions Remaining

By assuming linearity the problem reduces to finding the appropriate change of basis. The row vectors $\{\mathbf{p}_1, \ldots, \mathbf{p}_m\}$ in this transformation will become the principal components of $\mathbf{X}$. Several questions now arise.

• What is the best way to re-express $\mathbf{X}$?
• What is a good choice of basis $\mathbf{P}$?

These questions must be answered by next asking ourselves what features we would like $\mathbf{Y}$ to exhibit.
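Returning to Equation 1, the change of basis PX = Y from Section B can be sketched numerically. The helper `change_of_basis`, the 2 × 3 data matrix, and the 90-degree rotation used for P are hypothetical choices made for illustration; they are not taken from the tutorial, whose toy example has m = 6 and n = 72000. Plain Python lists are used so that no third-party package is assumed.

```python
def change_of_basis(P, X):
    """Return Y = P X, computing each entry as an explicit dot product.

    P is a list of rows p_1..p_m; X is a list of rows whose columns
    x_1..x_n are the samples. Entry Y[j][i] is the dot product p_j . x_i.
    """
    n = len(X[0])
    # Extract the samples x_1..x_n, i.e. the columns of X.
    columns = [[row[i] for row in X] for i in range(n)]
    return [[sum(p * x for p, x in zip(p_row, col)) for col in columns]
            for p_row in P]


# Hypothetical data: m = 2 measurement types, n = 3 samples (columns).
X = [[1, 2, 3],
     [4, 5, 6]]

# Hypothetical P: an orthonormal set of rows (a rotation by 90 degrees).
P = [[0, 1],
     [-1, 0]]

Y = change_of_basis(P, X)
print(Y)  # [[4, 5, 6], [-1, -2, -3]]

# The first column of Y holds the projections of x_1 = (1, 4) onto the
# rows of P: p_1 . x_1 = 4 and p_2 . x_1 = -1.
y_1 = [row[0] for row in Y]
print(y_1)  # [4, -1]
```

Each column of Y lists the coordinates of the corresponding sample with respect to the rows of P, which is the third interpretation of Equation 1.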
Evidently, additional assumptions beyond linearity are required to arrive at a reasonable result. The selection of these assumptions is the subject of the next section.

IV. VARIANCE AND THE GOAL

Now comes the most important question: what does "best express" the data mean? This section will build up an intuitive answer to this question and along the way tack on additional assumptions.

A. Noise and Rotation

Measurement noise in any data set must be low or else, no matter the analysis technique, no information about a signal
