
Peking University, "Pattern Recognition" course teaching resources (reference material): A tutorial on Principal Components Analysis


[Figure 2.1: A plot of the covariance data showing the positive relationship between the number of hours studied and the mark received.]

It is exactly the same except that in the second set of brackets, the $X$'s are replaced by $Y$'s. This says, in English, "For each data item, multiply the difference between the $x$ value and the mean of $x$, by the difference between the $y$ value and the mean of $y$. Add all these up, and divide by $(n-1)$".

How does this work? Let's use some example data. Imagine we have gone into the world and collected some 2-dimensional data, say, we have asked a bunch of students how many hours in total they spent studying COSC241, and the mark that they received. So we have two dimensions: the first is the $H$ dimension, the hours studied, and the second is the $M$ dimension, the mark received. Figure 2.2 holds my imaginary data, and the calculation of $cov(H,M)$, the covariance between the Hours of study done and the Mark received.

So what does it tell us? The exact value is not as important as its sign (i.e. positive or negative). If the value is positive, as it is here, then that indicates that both dimensions increase together, meaning that, in general, as the number of hours of study increased, so did the final mark.

If the value is negative, then as one dimension increases, the other decreases. If we had ended up with a negative covariance here, then that would have said the opposite: that as the number of hours of study increased, the final mark decreased.

In the last case, if the covariance is zero, it indicates that there is no linear relationship between the two dimensions (they are uncorrelated, which is a weaker condition than being independent of each other).
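The recipe quoted above (subtract each mean, multiply the paired differences, add them up, divide by $(n-1)$) can be sketched in a few lines of Python. This is only an illustration: the `hours` and `marks` values below are made up for the example and are not the data from Figure 2.2, and the function name `covariance` is likewise an assumption.

```python
def covariance(xs, ys):
    """Sample covariance: sum of paired deviations from the means, over (n - 1)."""
    if len(xs) != len(ys):
        raise ValueError("both dimensions need the same number of data items")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two data items are required")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data (NOT the values from Figure 2.2).
hours = [2, 4, 6, 8, 10]
marks = [50, 55, 65, 70, 85]

cov_hm = covariance(hours, marks)
print(cov_hm)  # positive: marks rise as hours rise

# Reversing the marks pairs long study hours with low marks,
# so the sign of the covariance flips.
cov_reversed = covariance(hours, list(reversed(marks)))
print(cov_reversed)  # negative
```

As in the text, the sign is the part to read: the first result is positive because the two lists increase together, and the second is negative because one increases while the other decreases.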
The result that the mark increases as the number of hours studied increases can be easily seen by drawing a graph of the data, as in Figure 2.1. However, the luxury of being able to visualize data is only available at 2 and 3 dimensions. Since the covariance value can be calculated between any 2 dimensions in a data set, this technique is often used to find relationships between dimensions in high-dimensional data sets where visualisation is difficult.

You might ask "is $cov(X,Y)$ equal to $cov(Y,X)$"? Well, a quick look at the formula for covariance tells us that yes, they are exactly the same, since the only difference between $cov(X,Y)$ and $cov(Y,X)$ is that $(X_i - \bar{X})(Y_i - \bar{Y})$ is replaced by $(Y_i - \bar{Y})(X_i - \bar{X})$. Since multiplication is commutative (it doesn't matter which way around I multiply two numbers, I always get the same number), these two equations give the same answer.

2.1.4 The covariance Matrix

Recall that covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated. For example, from a 3-dimensional data set (dimensions $x$, $y$, $z$) you could calculate $cov(x,y)$, $cov(x,z)$, and $cov(y,z)$. In fact, for an $n$-dimensional data set, you can calculate $\frac{n!}{(n-2)! \cdot 2}$ different covariance values.
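Both claims, that swapping the two dimensions leaves the covariance unchanged and that a data set with $n$ dimensions has $n!/((n-2)! \cdot 2)$ distinct pairs of dimensions, can be checked with a small Python sketch. The data set and the helper names `covariance` and `distinct_pair_count` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations
from math import factorial, isclose


def covariance(xs, ys):
    """Sample covariance between two equally long lists of numbers."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / (n - 1)


def distinct_pair_count(n):
    """Number of distinct dimension pairs: n! / ((n - 2)! * 2), for n >= 2."""
    return factorial(n) // (factorial(n - 2) * 2)


# Hypothetical 3-dimensional data set, one list per dimension.
data = {
    "x": [1.0, 2.0, 4.0, 7.0],
    "y": [3.0, 1.0, 5.0, 9.0],
    "z": [8.0, 6.0, 5.0, 1.0],
}

# Every unordered pair of dimensions: (x, y), (x, z), (y, z).
pairs = list(combinations(data, 2))
print(pairs)
print(len(pairs), distinct_pair_count(len(data)))

# Swapping the arguments only swaps the factors of each product,
# so the covariance is the same either way around.
for a, b in pairs:
    assert isclose(covariance(data[a], data[b]), covariance(data[b], data[a]))
```

Because of this symmetry, only the unordered pairs need to be computed, which is why the count is $n(n-1)/2$ rather than $n(n-1)$.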

