636 Chapter 14. Statistical Description of Data

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software.
14.5 Linear Correlation

We next turn to measures of association between variables that are ordinal or continuous, rather than nominal. Most widely used is the linear correlation coefficient. For pairs of quantities (x_i, y_i), i = 1,...,N, the linear correlation coefficient r (also called the product-moment correlation coefficient, or Pearson's r) is given by the formula

r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}    (14.5.1)

where, as usual, \bar{x} is the mean of the x_i's and \bar{y} is the mean of the y_i's.

The value of r lies between −1 and 1, inclusive. It takes on a value of 1, termed "complete positive correlation," when the data points lie on a perfect straight line with positive slope, with x and y increasing together. The value 1 holds independent of the magnitude of the slope. If the data points lie on a perfect straight line with negative slope, y decreasing as x increases, then r has the value −1; this is called "complete negative correlation." A value of r near zero indicates that the variables x and y are uncorrelated.

When a correlation is known to be significant, r is one conventional way of summarizing its strength. In fact, the value of r can be translated into a statement about what residuals (root mean square deviations) are to be expected if the data are fitted to a straight line by the least-squares method (see §15.2, especially equations 15.2.13 – 15.2.14).
Unfortunately, r is a rather poor statistic for deciding whether an observed correlation is statistically significant, and/or whether one observed correlation is significantly stronger than another. The reason is that r is ignorant of the individual distributions of x and y, so there is no universal way to compute its distribution in the case of the null hypothesis.

About the only general statement that can be made is this: If the null hypothesis is that x and y are uncorrelated, and if the distributions for x and y each have enough convergent moments ("tails" die off sufficiently rapidly), and if N is large (typically > 500), then r is distributed approximately normally, with a mean of zero and a standard deviation of 1/\sqrt{N}. In that case, the (double-sided) significance of the correlation, that is, the probability that |r| should be larger than its observed value in the null hypothesis, is

\operatorname{erfc}\!\left(\frac{|r|\sqrt{N}}{\sqrt{2}}\right)    (14.5.2)

where erfc(x) is the complementary error function, equation (6.2.8), computed by the routines erfc or erfcc of §6.2. A small value of (14.5.2) indicates that the two distributions are significantly correlated.