14.3 Are Two Distributions Different?

Unfortunately, there is no simple formula analogous to equations (14.3.7) and (14.3.9) for this statistic, although Noé [5] gives a computational method using a recursion relation and provides a graph of numerical results. There are many other possible similar statistics, for example

    D^{**} = \int_{P=0}^{1} \frac{[S_N(x) - P(x)]^2}{P(x)\,[1 - P(x)]}\, dP(x)        (14.3.12)

which is also discussed by Anderson and Darling (see [3]).

Another approach, which we prefer as simpler and more direct, is due to Kuiper [6,7]. We already mentioned that the standard K-S test is invariant under reparametrizations of the variable x. An even more general symmetry, which guarantees equal sensitivities at all values of x, is to wrap the x axis around into a circle (identifying the points at ±∞), and to look for a statistic that is now invariant under all shifts and parametrizations on the circle. This allows, for example, a probability distribution to be "cut" at some central value of x, and the left and right halves to be interchanged, without altering the statistic or its significance.

Kuiper's statistic, defined as

    V = D_+ + D_- = \max_{-\infty < x < \infty} [S_N(x) - P(x)] + \max_{-\infty < x < \infty} [P(x) - S_N(x)]        (14.3.13)

is the sum of the maximum distance of S_N(x) above and below P(x). You should be able to convince yourself that this statistic has the desired invariance on the circle: Sketch the indefinite integral of two probability distributions defined on the circle as a function of angle around the circle, as the angle goes through several times 360°. If you change the starting point of the integration, D_+ and D_- change individually, but their sum is constant.

Furthermore, there is a simple formula for the asymptotic distribution of the statistic V, directly analogous to equations (14.3.7)-(14.3.10). Let

    Q_{KP}(\lambda) = 2 \sum_{j=1}^{\infty} (4j^2\lambda^2 - 1)\, e^{-2j^2\lambda^2}        (14.3.14)

which is monotonic and satisfies

    Q_{KP}(0) = 1 \qquad Q_{KP}(\infty) = 0        (14.3.15)

In terms of this function the significance level is [1]

    \mathrm{Probability}\,(V > \mathrm{observed}) = Q_{KP}\!\left(\left[\sqrt{N_e} + 0.155 + 0.24/\sqrt{N_e}\,\right] V\right)        (14.3.16)

Here N_e is N in the one-sample case, or is given by equation (14.3.10) in the case of two samples.

Of course, Kuiper's test is ideal for any problem originally defined on a circle, for example, to test whether the distribution in longitude of something agrees with some theory, or whether two somethings have different distributions in longitude. (See also [8].)

We will leave to you the coding of routines analogous to ksone, kstwo, and probks, above. (For λ < 0.4, don't try to do the sum 14.3.14. Its value is 1, to 7 figures, but the series can require many terms to converge, and loses accuracy to roundoff.)
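By way of a concrete illustration, here is a minimal sketch of what two such routines might look like. The names probkp and kpone are hypothetical analogs of probks and ksone that we introduce here; the convergence cutoffs, the 100-term limit, and the use of the standard library's qsort in place of the book's sort routine are our own choices, not taken from the text.

    #include <math.h>
    #include <stdlib.h>

    #define EPS1 0.001          /* series convergence cutoffs, chosen here  */
    #define EPS2 1.0e-8         /* by analogy with probks, not from the text */

    /* Hypothetical analog of probks: evaluate Q_KP(lambda) of (14.3.14). */
    float probkp(float alam)
    {
        int j;
        float a2, term, termbf = 0.0, sum = 0.0;

        if (alam < 0.4) return 1.0;   /* equals 1 to 7 figures here (see text) */
        a2 = 2.0 * alam * alam;
        for (j = 1; j <= 100; j++) {
            term = (2.0 * a2 * j * j - 1.0) * exp(-a2 * j * j);
            sum += term;
            if (fabs(term) <= EPS1 * termbf || fabs(term) <= EPS2 * fabs(sum))
                return 2.0 * sum;
            termbf = fabs(term);
        }
        return 1.0;                    /* failed to converge (should not happen) */
    }

    static int cmpflt(const void *a, const void *b)
    {
        float d = *(const float *)a - *(const float *)b;
        return (d > 0) - (d < 0);
    }

    /* Hypothetical analog of ksone: one-sample Kuiper test of data[1..n]
       (unit-offset, as elsewhere in this book) against a user-supplied
       cumulative distribution func(x).  Returns the statistic V of (14.3.13)
       and its significance prob from (14.3.16). */
    void kpone(float data[], unsigned long n, float (*func)(float),
               float *v, float *prob)
    {
        unsigned long j;
        float dplus = 0.0, dminus = 0.0, fn0 = 0.0, fn1, ff, en, lam;

        qsort(&data[1], n, sizeof(float), cmpflt);
        en = (float)n;
        for (j = 1; j <= n; j++) {
            fn1 = j / en;                      /* S_N at and just above data[j] */
            ff = (*func)(data[j]);             /* P(x) at data[j]               */
            if (fn1 - ff > dplus)  dplus  = fn1 - ff;   /* D+ : S_N above P     */
            if (ff - fn0 > dminus) dminus = ff - fn0;   /* D- : P above S_N     */
            fn0 = fn1;
        }
        *v = dplus + dminus;
        en = sqrt(en);
        lam = (en + 0.155 + 0.24 / en) * (*v);  /* argument of Q_KP in (14.3.16) */
        *prob = probkp(lam);
    }

A two-sample analog of kstwo would be built the same way: step through the two sorted data sets together, accumulate D_+ and D_- between the two empirical distribution functions, and use N_e from equation (14.3.10) in equation (14.3.16).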
Two final cautionary notes: First, we should mention that all varieties of K-S test lack the ability to discriminate some kinds of distributions. A simple example is a probability distribution with a narrow "notch" within which the probability falls to zero. Such a distribution is of course ruled out by the existence of even one data point within the notch, but, because of its cumulative nature, a K-S test would require many data points in the notch before signaling a discrepancy.

Second, we should note that, if you estimate any parameters from a data set (e.g., a mean and variance), then the distribution of the K-S statistic D for a cumulative distribution function P(x) that uses the estimated parameters is no longer given by equation (14.3.9). In general, you will have to determine the new distribution yourself, e.g., by Monte Carlo methods.
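To make the Monte Carlo suggestion concrete, here is one way such a calibration might be sketched, assuming a Gaussian model whose mean and variance are estimated from the data. The routine names (ks_dist_fitted, ks_pvalue_mc), the use of the C library's erfc for the Gaussian cumulative, and the quick Box-Muller deviates built on rand() are illustrative choices of ours, not routines from this book; a real application would substitute a better random-number generator and its own model distribution.

    #include <math.h>
    #include <stdlib.h>

    /* Cumulative distribution of a Gaussian with mean mu, std. deviation sig. */
    static double normal_cdf(double x, double mu, double sig)
    {
        return 0.5 * erfc(-(x - mu) / (sig * sqrt(2.0)));
    }

    static int cmpdbl(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    /* K-S distance D of x[0..n-1] against the Gaussian whose mean and variance
       are estimated from those same values.  Sorts x in place. */
    static double ks_dist_fitted(double *x, int n)
    {
        int i;
        double mu = 0.0, sig = 0.0, d = 0.0, fn0 = 0.0, fn1, ff, dt;

        for (i = 0; i < n; i++) mu += x[i];
        mu /= n;
        for (i = 0; i < n; i++) sig += (x[i] - mu) * (x[i] - mu);
        sig = sqrt(sig / (n - 1));
        qsort(x, n, sizeof(double), cmpdbl);
        for (i = 0; i < n; i++) {
            fn1 = (i + 1) / (double)n;
            ff = normal_cdf(x[i], mu, sig);
            dt = fabs(fn0 - ff);
            if (fabs(fn1 - ff) > dt) dt = fabs(fn1 - ff);
            if (dt > d) d = dt;
            fn0 = fn1;
        }
        return d;
    }

    /* Monte Carlo p-value for the observed D: draw nsim synthetic samples of
       size n from the fitted Gaussian, re-fit each one, and count how often
       the synthetic D is at least as large as the observed one. */
    double ks_pvalue_mc(double *data, int n, int nsim)
    {
        int i, k, count = 0;
        double dobs, mu = 0.0, sig = 0.0;
        double *synth = (double *)malloc(n * sizeof(double));

        for (i = 0; i < n; i++) mu += data[i];
        mu /= n;
        for (i = 0; i < n; i++) sig += (data[i] - mu) * (data[i] - mu);
        sig = sqrt(sig / (n - 1));
        dobs = ks_dist_fitted(data, n);     /* note: sorts data[] in place */

        for (k = 0; k < nsim; k++) {
            for (i = 0; i < n; i++) {       /* Box-Muller Gaussian deviates; */
                double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);  /* rand() is */
                double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);  /* for brevity */
                synth[i] = mu + sig * sqrt(-2.0 * log(u1))
                                   * cos(6.283185307179586 * u2);
            }
            if (ks_dist_fitted(synth, n) >= dobs) count++;
        }
        free(synth);
        return (count + 1.0) / (nsim + 1.0);  /* add-one rule keeps p > 0 */
    }

The essential point is that each synthetic data set is compared against a distribution re-fitted to that same synthetic set, so the simulated values of D are drawn from the same modified distribution as the observed one.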
CITED REFERENCES AND FURTHER READING:

von Mises, R. 1964, Mathematical Theory of Probability and Statistics (New York: Academic Press), Chapters IX(C) and IX(E).