622 Chapter 14. Statistical Description of Data

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

Next we consider the case of comparing two binned data sets. Let $R_i$ be the number of events in bin $i$ for the first data set, $S_i$ the number of events in the same bin $i$ for the second data set.
Then the chi-square statistic is

$$\chi^2 = \sum_i \frac{(R_i - S_i)^2}{R_i + S_i} \tag{14.3.2}$$

Comparing (14.3.2) to (14.3.1), you should note that the denominator of (14.3.2) is not just the average of $R_i$ and $S_i$ (which would be an estimator of $n_i$ in 14.3.1). Rather, it is twice the average, the sum. The reason is that each term in a chi-square sum is supposed to approximate the square of a normally distributed quantity with unit variance. The variance of the difference of two normal quantities is the sum of their individual variances, not the average.

If the data were collected in such a way that the sum of the $R_i$'s is necessarily equal to the sum of $S_i$'s, then the number of degrees of freedom is equal to one less than the number of bins, $N_B - 1$ (that is, knstrn = 1), the usual case. If this requirement were absent, then the number of degrees of freedom would be $N_B$.

Example: A birdwatcher wants to know whether the distribution of sighted birds as a function of species is the same this year as last. Each bin corresponds to one species. If the birdwatcher takes his data to be the first 1000 birds that he saw in each year, then the number of degrees of freedom is $N_B - 1$. If he takes his data to be all the birds he saw on a random sample of days, the same days in each year, then the number of degrees of freedom is $N_B$ (knstrn = 0). In this latter case, note that he is also testing whether the birds were more numerous overall in one year or the other: That is the extra degree of freedom. Of course, any additional constraints on the data set lower the number of degrees of freedom (i.e., increase knstrn to more positive values) in accordance with their number.
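The argument for the denominator of (14.3.2) can be written out as a short derivation. This is only a sketch, under the assumption (not spelled out in the text above) that the two counts in bin $i$ are independent and approximately Poisson with a common expected value $n_i$ under the null hypothesis, so that each count has variance close to $n_i$:

$$
\begin{aligned}
\mathrm{Var}(R_i - S_i) &= \mathrm{Var}(R_i) + \mathrm{Var}(S_i) \approx n_i + n_i = 2 n_i, \\
2 n_i &\approx 2 \cdot \frac{R_i + S_i}{2} = R_i + S_i, \\
\frac{(R_i - S_i)^2}{R_i + S_i} &\approx \left( \frac{R_i - S_i}{\sqrt{\mathrm{Var}(R_i - S_i)}} \right)^{2}.
\end{aligned}
$$

Under these assumptions each term is the square of a quantity with roughly zero mean and unit variance, which is what a chi-square sum requires. Dividing by the average $(R_i + S_i)/2$ instead would make every term too large by a factor of two.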
The program is

    void chstwo(float bins1[], float bins2[], int nbins, int knstrn, float *df,
        float *chsq, float *prob)
    /* Given the arrays bins1[1..nbins] and bins2[1..nbins], containing two sets
       of binned data, and given the number of constraints knstrn (normally 1 or 0),
       this routine returns the number of degrees of freedom df, the chi-square
       chsq, and the significance prob. A small value of prob indicates a
       significant difference between the distributions bins1 and bins2. Note that
       bins1 and bins2 are both float arrays, although they will normally contain
       integer values. */
    {
        float gammq(float a, float x);
        int j;
        float temp;

        *df=nbins-knstrn;
        *chsq=0.0;
        for (j=1;j<=nbins;j++)
            if (bins1[j] == 0.0 && bins2[j] == 0.0)
                --(*df);        /* No data means one less degree of freedom. */
            else {
                temp=bins1[j]-bins2[j];
                *chsq += temp*temp/(bins1[j]+bins2[j]);
            }
        *prob=gammq(0.5*(*df),0.5*(*chsq));   /* Chi-square probability function. See §6.2. */
    }