14.3 Are Two Distributions Different? 623

Equation (14.3.2) and the routine chstwo both apply to the case where the total number of data points is the same in the two binned sets. For unequal numbers of data points, the formula analogous to (14.3.2) is

$$\chi^2 = \sum_i \frac{\left(\sqrt{S/R}\,R_i - \sqrt{R/S}\,S_i\right)^2}{R_i + S_i} \tag{14.3.3}$$

where

$$R \equiv \sum_i R_i \qquad S \equiv \sum_i S_i \tag{14.3.4}$$

are the respective numbers of data points. It is straightforward to make the corresponding change in chstwo.
Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov (or K-S) test is applicable to unbinned distributions that are functions of a single independent variable, that is, to data sets where each data point can be associated with a single number (lifetime of each lightbulb when it burns out, or declination of each star). In such cases, the list of data points can be easily converted to an unbiased estimator $S_N(x)$ of the cumulative distribution function of the probability distribution from which it was drawn: If the $N$ events are located at values $x_i$, $i = 1, \ldots, N$, then $S_N(x)$ is the function giving the fraction of data points to the left of a given value $x$. This function is obviously constant between consecutive (i.e., sorted into ascending order) $x_i$'s, and jumps by the same constant $1/N$ at each $x_i$. (See Figure 14.3.1.)

Different distribution functions, or sets of data, give different cumulative distribution function estimates by the above procedure. However, all cumulative distribution functions agree at the smallest allowable value of $x$ (where they are zero), and at the largest allowable value of $x$ (where they are unity). (The smallest and largest values might of course be $\pm\infty$.) So it is the behavior between the largest and smallest values that distinguishes distributions.

One can think of any number of statistics to measure the overall difference between two cumulative distribution functions: the absolute value of the area between them, for example. Or their integrated mean square difference. The Kolmogorov-Smirnov D is a particularly simple measure: It is defined as the maximum value of the absolute difference between two cumulative distribution functions. Thus, for comparing one data set's $S_N(x)$ to a known cumulative distribution function $P(x)$, the K-S statistic is

$$D = \max_{-\infty < x < \infty} \left| S_N(x) - P(x) \right| \tag{14.3.5}$$

while for comparing two different cumulative distribution functions $S_{N_1}(x)$ and $S_{N_2}(x)$, the K-S statistic is

$$D = \max_{-\infty < x < \infty} \left| S_{N_1}(x) - S_{N_2}(x) \right| \tag{14.3.6}$$