632    Chapter 14. Statistical Description of Data

Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books or CDROMs, visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to directcustserv@cambridge.org (outside North America).

            }
            if (sumi[i] == 0.0) --nni;      /* Eliminate any zero rows by reducing the number. */
        }
        for (j=1;j<=nj;j++) {               /* Get the column totals. */
            sumj[j]=0.0;
            for (i=1;i<=ni;i++) sumj[j] += nn[i][j];
            if (sumj[j] == 0.0) --nnj;      /* Eliminate any zero columns. */
        }
        *df=nni*nnj-nni-nnj+1;              /* Corrected number of degrees of freedom. */
        *chisq=0.0;
        for (i=1;i<=ni;i++) {               /* Do the chi-square sum. */
            for (j=1;j<=nj;j++) {
                expctd=sumj[j]*sumi[i]/sum;
                temp=nn[i][j]-expctd;
                /* Here TINY guarantees that any eliminated row or column
                   will not contribute to the sum. */
                *chisq += temp*temp/(expctd+TINY);
            }
        }
        *prob=gammq(0.5*(*df),0.5*(*chisq));    /* Chi-square probability function. */
        minij = nni < nnj ? nni-1 : nnj-1;
        *cramrv=sqrt(*chisq/(sum*minij));
        *ccc=sqrt(*chisq/(*chisq+sum));
        free_vector(sumj,1,nj);
        free_vector(sumi,1,ni);
    }

Measures of Association Based on Entropy

Consider the game of “twenty questions,” where by repeated yes/no questions you try to eliminate all except one correct possibility for an unknown object. Better yet, consider a generalization of the game, where you are allowed to ask multiple choice questions as well as binary (yes/no) ones. The categories in your multiple choice questions are supposed to be mutually exclusive and exhaustive (as are “yes” and “no”).

The value to you of an answer increases with the number of possibilities that it eliminates. More specifically, an answer that eliminates all except a fraction $p$ of the remaining possibilities can be assigned a value $-\ln p$ (a positive number, since $p < 1$). The purpose of the logarithm is to make the value additive, since (e.g.) one question that eliminates all but 1/6 of the possibilities is considered as good as two questions that, in sequence, reduce the number by factors 1/2 and 1/3.

So that is the value of an answer; but what is the value of a question? If there are $I$ possible answers to the question ($i = 1, \ldots, I$) and the fraction of possibilities consistent with the $i$th answer is $p_i$ (with the sum of the $p_i$'s equal to one), then the value of the question is the expectation value of the value of the answer, denoted $H$,

$$H = -\sum_{i=1}^{I} p_i \ln p_i \tag{14.4.6}$$

In evaluating (14.4.6), note that

$$\lim_{p \to 0} p \ln p = 0 \tag{14.4.7}$$

The value $H$ lies between 0 and $\ln I$. It is zero only when one of the $p_i$'s is one, all the others zero: In this case, the question is valueless, since its answer is preordained.