14.3 Are Two Distributions Different?

Teaching reference material for the course "Digital Signal Processing": Numerical Recipes in C, The Art of Scientific Computing, Second Edition, Chapter 14.3


Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

integers, while the $n_i$'s may not be. Then the chi-square statistic is

$$\chi^2 = \sum_i \frac{(N_i - n_i)^2}{n_i} \qquad (14.3.1)$$

where the sum is over all bins.
A large value of $\chi^2$ indicates that the null hypothesis (that the $N_i$'s are drawn from the population represented by the $n_i$'s) is rather unlikely.

Any term $j$ in (14.3.1) with $0 = n_j = N_j$ should be omitted from the sum. A term with $n_j = 0$, $N_j \neq 0$ gives an infinite $\chi^2$, as it should, since in this case the $N_i$'s cannot possibly be drawn from the $n_i$'s!

The chi-square probability function $Q(\chi^2|\nu)$ is an incomplete gamma function, and was already discussed in §6.2 (see equation 6.2.18). Strictly speaking, $Q(\chi^2|\nu)$ is the probability that the sum of the squares of $\nu$ random normal variables of unit variance (and zero mean) will be greater than $\chi^2$. The terms in the sum (14.3.1) are not individually normal. However, if either the number of bins is large ($\gg 1$), or the number of events in each bin is large ($\gg 1$), then the chi-square probability function is a good approximation to the distribution of (14.3.1) in the case of the null hypothesis. Its use to estimate the significance of the chi-square test is standard.

The appropriate value of $\nu$, the number of degrees of freedom, bears some additional discussion. If the data are collected with the model $n_i$'s fixed (that is, not later renormalized to fit the total observed number of events $\sum_i N_i$), then $\nu$ equals the number of bins $N_B$. (Note that this is not the total number of events!) Much more commonly, the $n_i$'s are normalized after the fact so that their sum equals the sum of the $N_i$'s. In this case the correct value for $\nu$ is $N_B - 1$, and the model is said to have one constraint (knstrn=1 in the program below). If the model that gives the $n_i$'s has additional free parameters that were adjusted after the fact to agree with the data, then each of these additional "fitted" parameters decreases $\nu$ (and increases knstrn) by one additional unit.
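A small worked case may make the bookkeeping concrete. The counts below are invented purely for illustration and do not come from the text. Take $N_B = 3$ bins with observed counts $N = (30, 14, 16)$ and a model normalized after the fact to the same total of 60, $n = (20, 20, 20)$. Equation (14.3.1) then gives

$$\chi^2 = \frac{(30-20)^2}{20} + \frac{(14-20)^2}{20} + \frac{(16-20)^2}{20} = 5.0 + 1.8 + 0.8 = 7.6$$

The normalization counts as one constraint, so $\nu = N_B - 1 = 2$. For $\nu = 2$ the chi-square probability function reduces to the closed form $Q(\chi^2|2) = e^{-\chi^2/2}$, so

$$Q(7.6\,|\,2) = e^{-3.8} \approx 0.022$$

Under the null hypothesis a $\chi^2$ this large would occur only about 2 percent of the time. With just three bins the "large number of bins" condition is not met, so this relies on the counts per bin being reasonably large.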
We have, then, the following program:

    void chsone(float bins[], float ebins[], int nbins, int knstrn, float *df,
        float *chsq, float *prob)
    /* Given the array bins[1..nbins] containing the observed numbers of events, and an array
       ebins[1..nbins] containing the expected numbers of events, and given the number of
       constraints knstrn (normally one), this routine returns (trivially) the number of degrees
       of freedom df, and (nontrivially) the chi-square chsq and the significance prob. A small
       value of prob indicates a significant difference between the distributions bins and ebins.
       Note that bins and ebins are both float arrays, although bins will normally contain
       integer values. */
    {
        float gammq(float a, float x);
        void nrerror(char error_text[]);
        int j;
        float temp;

        *df=nbins-knstrn;
        *chsq=0.0;
        for (j=1;j<=nbins;j++) {
            if (ebins[j] <= 0.0) nrerror("Bad expected number in chsone");
            temp=bins[j]-ebins[j];
            *chsq += temp*temp/ebins[j];
        }
        *prob=gammq(0.5*(*df),0.5*(*chsq));    /* Chi-square probability function. See §6.2. */
    }

