first variable x taking on its ith value, and the second variable y taking on its jth value. Let N denote the total number of events, the sum of all the $N_{ij}$'s.
Let $N_{i\cdot}$ denote the number of events for which the first variable x takes on its ith value regardless of the value of y; $N_{\cdot j}$ is the number of events with the jth value of y regardless of x. So we have

$$
N_{i\cdot} = \sum_j N_{ij} \qquad\quad N_{\cdot j} = \sum_i N_{ij} \qquad\quad N = \sum_i N_{i\cdot} = \sum_j N_{\cdot j}
\tag{14.4.1}
$$

$N_{\cdot j}$ and $N_{i\cdot}$ are sometimes called the row and column totals or marginals, but we will use these terms cautiously since we can never keep straight which are the rows and which are the columns!

The null hypothesis is that the two variables x and y have no association. In this case, the probability of a particular value of x given a particular value of y should be the same as the probability of that value of x regardless of y. Therefore, in the null hypothesis, the expected number for any $N_{ij}$, which we will denote $n_{ij}$, can be calculated from only the row and column totals,

$$
\frac{n_{ij}}{N_{\cdot j}} = \frac{N_{i\cdot}}{N}
\qquad\text{which implies}\qquad
n_{ij} = \frac{N_{i\cdot}\,N_{\cdot j}}{N}
\tag{14.4.2}
$$

Notice that if a column or row total is zero, then the expected number for all the entries in that column or row is also zero; in that case, the never-occurring bin of x or y should simply be removed from the analysis.

The chi-square statistic is now given by equation (14.3.1), which, in the present case, is summed over all entries in the table,

$$
\chi^2 = \sum_{i,j} \frac{(N_{ij} - n_{ij})^2}{n_{ij}}
\tag{14.4.3}
$$

The number of degrees of freedom is equal to the number of entries in the table (product of its row size and column size) minus the number of constraints that have arisen from our use of the data themselves to determine the $n_{ij}$. Each row total and column total is a constraint, except that this overcounts by one, since the total of the column totals and the total of the row totals both equal N, the total number of data points. Therefore, if the table is of size I by J, the number of degrees of freedom is $IJ - I - J + 1$. Equation (14.4.3), along with the chi-square probability function (§6.2), now gives the significance of an association between the variables x and y.

Suppose there is a significant association. How do we quantify its strength, so that (e.g.) we can compare the strength of one association with another? The idea here is to find some reparametrization of $\chi^2$ which maps it into some convenient interval, like 0 to 1, where the result is not dependent on the quantity of data that we happen to sample, but rather depends only on the underlying population from which the data were drawn. There are several different ways of doing this. Two of the more common are called Cramer's V and the contingency coefficient C.
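As a concrete illustration of equations (14.4.1) through (14.4.3), here is a minimal C sketch that forms the row totals, column totals, expected numbers $n_{ij}$, the $\chi^2$ sum, and the degrees of freedom $IJ - I - J + 1$ for a small table of counts. It is not the book's own routine for contingency tables; the macro names I and J, the array Nij, and the table entries are purely illustrative choices made for this sketch, and the final conversion of $\chi^2$ into a significance via the chi-square probability function of §6.2 is only indicated in a comment.

```c
#include <stdio.h>

#define I 3   /* number of rows (distinct values of x); illustrative only */
#define J 4   /* number of columns (distinct values of y); illustrative only */

int main(void)
{
    /* Contingency table of event counts N_ij.
       These are made-up numbers, not data from the book. */
    double Nij[I][J] = {
        { 12.0,  7.0,  5.0,  3.0 },
        {  8.0, 11.0,  9.0,  4.0 },
        {  5.0,  6.0, 10.0, 12.0 }
    };
    double Ni[I] = {0.0}, Nj[J] = {0.0}, Ntot = 0.0;
    double chisq = 0.0;
    int i, j, df;

    /* Equation (14.4.1): row totals, column totals, and grand total. */
    for (i = 0; i < I; i++)
        for (j = 0; j < J; j++) {
            Ni[i] += Nij[i][j];
            Nj[j] += Nij[i][j];
            Ntot  += Nij[i][j];
        }

    /* Equations (14.4.2) and (14.4.3): expected numbers n_ij under the
       null hypothesis of no association, and the chi-square sum.
       (Any row or column with a zero total should be removed beforehand,
       as the text notes, to avoid dividing by a zero expected number.) */
    for (i = 0; i < I; i++)
        for (j = 0; j < J; j++) {
            double nij = Ni[i]*Nj[j]/Ntot;
            double d   = Nij[i][j] - nij;
            chisq += d*d/nij;
        }

    /* Degrees of freedom IJ - I - J + 1, i.e. (I-1)(J-1). */
    df = I*J - I - J + 1;

    printf("chi-square = %g with %d degrees of freedom\n", chisq, df);
    /* The significance would then follow from the chi-square probability
       function of section 6.2, which is not reproduced in this sketch. */
    return 0;
}
```

The counts are held in doubles only so that the totals and the $\chi^2$ accumulation read cleanly; integer counts work just as well.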
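The excerpt breaks off just after naming Cramer's V and the contingency coefficient C, without giving their formulas. For reference only, the two small helpers below use the standard textbook definitions of these measures (they are not quoted from the excerpt): V divides $\chi^2$ by N times one less than the smaller table dimension and takes a square root, while C is $\sqrt{\chi^2/(\chi^2+N)}$. They would be applied to the chisq, grand total, and table dimensions computed as in the previous sketch.

```c
#include <math.h>

/* Cramer's V: standard definition sqrt(chisq / (N * (min(I,J) - 1))).
   Quoted from common statistical usage, since the excerpt ends before
   the book states its formula. Lies in the interval [0,1]. */
double cramers_v(double chisq, double ntot, int irows, int jcols)
{
    int m = (irows < jcols ? irows : jcols) - 1;
    return sqrt(chisq/(ntot*m));
}

/* Contingency coefficient C: standard definition sqrt(chisq/(chisq + N)).
   Also bounded by 1, though it cannot actually reach 1 even for a
   perfectly associated table. */
double contingency_c(double chisq, double ntot)
{
    return sqrt(chisq/(chisq + ntot));
}
```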