A Probabilistic Proof of the Lindeberg-Feller Central Limit Theorem

Larry Goldstein

1 INTRODUCTION.

The Central Limit Theorem, one of the most striking and useful results in probability and statistics, explains why the normal distribution appears in areas as diverse as gambling, measurement error, sampling, and statistical mechanics. In essence, the Central Limit Theorem states that the normal distribution applies whenever one is approximating probabilities for a quantity which is a sum of many independent contributions, all of which are roughly the same size. It is the Lindeberg-Feller Theorem which makes this statement precise by providing the sufficient, and in some sense necessary, Lindeberg condition whose satisfaction accounts for the ubiquitous appearance of the bell shaped normal.

Generally the Lindeberg condition is handled using Fourier methods and is somewhat hard to interpret from the classical point of view. Here we provide a simpler, equivalent, and more easily interpretable probabilistic formulation of the Lindeberg condition and demonstrate its sufficiency and partial necessity in the Central Limit Theorem using more elementary means.

The seeds of the Central Limit Theorem, or CLT, lie in the work of Abraham de Moivre, who, in 1733, not being able to secure himself an academic appointment, supported himself consulting on problems of probability and gambling. He approximated the limiting probabilities of the binomial distribution, the one which governs the behavior of the number $S_n$ of successes in an experiment which consists of $n$ independent trials, each one having the same probability $p \in (0,1)$ of success. Each individual trial of the experiment can be modelled by $X$, a (Bernoulli) random variable which records one for each success and zero for each failure,
$$P(X=1)=p \quad \text{and} \quad P(X=0)=1-p,$$
and has mean $EX = p$ and variance $\mathrm{Var}(X) = p(1-p)$. The record of successes and failures in $n$ independent trials is then given by an independent sequence $X_1, X_2, \ldots, X_n$ of these Bernoulli variables, and the total number of successes $S_n$ by their sum
$$S_n = X_1 + \cdots + X_n. \qquad (1)$$
Exactly, $S_n$ has the binomial distribution, which specifies that
$$P(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k = 0, 1, \ldots, n.$$
For even moderate values of $n$, managing the binomial coefficients $\binom{n}{k}$ becomes unwieldy, to say nothing of computing the sum which yields the cumulative probability
$$P(S_n \le m) = \sum_{k \le m} \binom{n}{k} p^k (1-p)^{n-k}$$
that there will be $m$ or fewer successes. The great utility of the CLT is in providing an easily computable approximation to such probabilities that can be quite accurate even for moderate values of $n$.

Standardizing the binomial $S_n$ by subtracting its mean and dividing by its standard deviation to obtain the mean zero, variance one random variable $W_n = (S_n - np)/\sqrt{np(1-p)}$, the CLT yields that
$$\forall x \quad \lim_{n \to \infty} P(W_n \le x) = P(Z \le x), \qquad (2)$$
where $Z$ is $\mathcal{N}(0,1)$, a standard, mean zero variance one, normal random variable, that is, the one with distribution function
$$\Phi(x) = \int_{-\infty}^x \varphi(u)\,du \quad \text{where} \quad \varphi(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}u^2\right). \qquad (3)$$
We may therefore, for instance, approximate the cumbersome cumulative binomial probability $P(S_n \le m)$ by the simpler $\Phi((m - np)/\sqrt{np(1-p)})$.

It was only for the special case of the binomial that the normal approximation was first considered. Only many years later, with the work of Laplace around 1820, did it begin to be systematically realized that the same normal limit is obtained when the underlying Bernoulli variables are replaced by any variables with a finite variance. The result was the classical Central Limit Theorem, which states that (2) holds whenever
$$W_n = (S_n - n\mu)/\sqrt{n\sigma^2}$$
is the standardization of a sum $S_n$, as in (1), of independent and identically distributed random variables each with mean $\mu$ and variance $\sigma^2$.
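Returning to de Moivre's setting, here is a quick numerical check, our own and not part of the original article, of how well the normal approximation $\Phi((m - np)/\sqrt{np(1-p)})$ tracks the exact cumulative binomial probability; the parameter values below are arbitrary.

```python
from math import comb, erf, sqrt

def binom_cdf(m, n, p):
    """Exact cumulative probability P(S_n <= m) for the binomial distribution."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1))

def Phi(x):
    """Standard normal distribution function, as in (3)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p, m = 100, 0.3, 35  # arbitrary illustration values
exact = binom_cdf(m, n, p)
approx = Phi((m - n * p) / sqrt(n * p * (1 - p)))
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

Replacing $m$ by $m + 1/2$, the usual continuity correction, typically brings the two values even closer.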
From this generalization it now becomes somewhat clearer why various distributions observed in nature, which may not be at all related to the binomial, such as the errors of measurement averages, or the heights of individuals in a sample, take on the bell shaped form: each observation is the result of summing many small independent factors.

A further extension of the classical CLT could yet come. In situations where the summands do not have identical distributions, can the normal curve still govern? For an example, consider the symmetric group $S_n$, the set of all permutations $\pi$ on the set $\{1, 2, \ldots, n\}$. We can represent $\pi \in S_7$, for example, by two line notation
$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 3 & 7 & 6 & 5 & 1 & 2 \end{pmatrix},$$
from which one can read that $\pi(1) = 4$ and $\pi(4) = 6$. This permutation can also be represented in the cycle notation
$$\pi = (1, 4, 6)(2, 3, 7)(5),$$
with the meaning that $\pi$ maps 1 to 4, 4 to 6, 6 to 1, and so forth. From the cycle representation we see that $\pi$ has two cycles of length 3 and one of length 1, for a total of three cycles. In general, let $K_n(\pi)$ denote the total number of cycles in a permutation $\pi \in S_n$. If $\pi$ is chosen uniformly from all the $n!$ permutations in $S_n$, does the Central Limit Theorem imply that $K_n(\pi)$ is approximately normally distributed for large $n$?

To answer this question we will employ the Feller coupling [3], which constructs a random permutation $\pi$ uniformly from $S_n$ with the help of $n$ independent Bernoulli variables $X_1, \ldots, X_n$ with distributions
$$P(X_i = 0) = 1 - \frac{1}{i} \quad \text{and} \quad P(X_i = 1) = \frac{1}{i}, \qquad i = 1, \ldots, n. \qquad (4)$$
Begin the first cycle at stage 1 with the element 1. At stage $i$, $i = 1, \ldots, n$, if $X_{n-i+1} = 1$ close the current cycle and begin a new one starting with the smallest number not yet in any cycle, and otherwise choose an element uniformly from those yet unused and place it to the right of the last element in the current cycle. In this way at stage $i$ we complete a cycle with probability $1/(n-i+1)$, upon mapping the last element of the current cycle to the one which begins it.
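The staged description above translates directly into code. The following is a minimal sketch of the Feller coupling under our reading of the construction; the function name and return convention are ours.

```python
import random

def feller_coupling(n, rng=random):
    """Build a random permutation of {1, ..., n} cycle by cycle.

    At stage i the Bernoulli variable X_{n-i+1} of display (4) equals 1
    with probability 1/(n-i+1), in which case the current cycle closes.
    K_n(pi) is recovered as the number of cycles returned.
    """
    unused = list(range(2, n + 1))  # element 1 begins the first cycle
    cycles, current = [], [1]
    for i in range(1, n + 1):
        if rng.random() < 1.0 / (n - i + 1):  # X_{n-i+1} = 1: close the cycle
            cycles.append(current)
            if unused:  # begin a new cycle at the smallest unused element
                start = min(unused)
                unused.remove(start)
                current = [start]
            else:
                current = []
        else:  # X_{n-i+1} = 0: extend with a uniformly chosen unused element
            current.append(unused.pop(rng.randrange(len(unused))))
    return cycles

random.seed(0)
cycles = feller_coupling(7)
print(cycles, "K_7 =", len(cycles))  # a cycle decomposition of a permutation of {1,...,7}
```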
As the total number $K_n(\pi)$ of cycles of $\pi$ is exactly the number of times an element closes the loop upon completing its own cycle,
$$K_n(\pi) = X_1 + \cdots + X_n, \qquad (5)$$
a sum of independent, but not identically distributed, random variables. Hence, despite the similarity of (5) to (1), the hypotheses of the classical central limit theorem do not hold. Nevertheless, in 1922 Lindeberg [7] provided a general condition which can be applied in this case to show that $K_n(\pi)$ is asymptotically normal.

To explore Lindeberg's condition, first consider the proper standardization of $K_n(\pi)$ in our example. As any Bernoulli random variable with success probability $p$ has mean $p$ and variance $p(1-p)$, the Bernoulli variable $X_i$ in (4) has mean $i^{-1}$ and variance $i^{-1}(1 - i^{-1})$ for $i = 1, \ldots, n$. Thus,
$$h_n = \sum_{i=1}^n \frac{1}{i} \quad \text{and} \quad \sigma_n^2 = \sum_{i=1}^n \left( \frac{1}{i} - \frac{1}{i^2} \right) \qquad (6)$$
are the mean and variance of $K_n(\pi)$, respectively; the mean $h_n$ is known as the $n$th harmonic number. In particular, standardizing $K_n(\pi)$ to have mean zero and variance 1 results in
$$W_n = \frac{K_n(\pi) - h_n}{\sigma_n},$$
which, absorbing the scaling inside the sum, can be written as
$$W_n = \sum_{i=1}^n X_{i,n} \quad \text{where} \quad X_{i,n} = \frac{X_i - i^{-1}}{\sigma_n}. \qquad (7)$$
In general, it is both more convenient and more encompassing to deal not with a sequence of variables but rather with a triangular array as in (7) which satisfies the following condition.

Condition 1.1 For every $n = 1, 2, \ldots$, the random variables making up the collection $\mathcal{X}_n = \{X_{i,n} : 1 \le i \le n\}$ are independent with mean zero and finite variances $\sigma_{i,n}^2 = \mathrm{Var}(X_{i,n})$, standardized so that
$$W_n = \sum_{i=1}^n X_{i,n} \quad \text{has variance} \quad \mathrm{Var}(W_n) = \sum_{i=1}^n \sigma_{i,n}^2 = 1.$$
Of course, even under Condition 1.1, some further assumptions must be satisfied by the summand variables for the normal convergence (2) to take place. For instance, if the first variable accounts for some non-vanishing fraction of the total variability, it will strongly influence the limiting distribution, possibly resulting in non-normal convergence. The Lindeberg-Feller central limit theorem, see [4], says that normal convergence (2) holds upon ruling out such situations by imposing the Lindeberg Condition
$$\forall \epsilon > 0 \quad \lim_{n \to \infty} L_{n,\epsilon} = 0 \quad \text{where} \quad L_{n,\epsilon} = \sum_{i=1}^n E\{X_{i,n}^2 \mathbf{1}(|X_{i,n}| \ge \epsilon)\}, \qquad (8)$$
where for an event $A$, the 'indicator' random variable $\mathbf{1}(A)$ takes on the value 1 if $A$ occurs, and the value 0 otherwise. Once known to be sufficient, the Lindeberg condition was proved to be partially necessary by Feller and Lévy, independently; see [8] for the history. The appearance of the Lindeberg Condition is justified by explanations such as the one given by Feller [4], who roughly says that it requires the individual variances be due mainly to masses in an interval whose length is small in comparison to the overall variance. We present a probabilistic condition which is seemingly simpler, yet equivalent.

Our probabilistic approach to the CLT is through the so called zero bias transformation introduced in [6]. For every distribution with mean zero and finite non-zero variance $\sigma^2$ on a random variable $X$, the zero bias transformation returns the unique '$X$-zero biased distribution' on $X^*$ which satisfies
$$\sigma^2 E f'(X^*) = E[X f(X)] \qquad (9)$$
for all absolutely continuous functions $f$ for which these expectations exist.

The existence of a strong connection between the zero bias transformation and the normal distribution is made clear by the characterization of Stein [9], which implies that $X^*$ and $X$ have the same distribution if and only if $X$ has the $\mathcal{N}(0, \sigma^2)$ distribution; that is, the normal distribution is the zero bias transformation's unique fixed point.

One way to see the 'if' direction of Stein's characterization, that is, why the zero bias transformation fixes the normal, is to note that the density function $\varphi_{\sigma^2}(x) = \sigma^{-1}\varphi(\sigma^{-1}x)$ of a $\mathcal{N}(0, \sigma^2)$ variable, with $\varphi(x)$ given by (3), satisfies the differential equation with a form 'conjugate' to (9),
$$\sigma^2 \varphi_{\sigma^2}'(x) = -x \varphi_{\sigma^2}(x),$$
and now (9), with $X^* = X$, follows for a large class of functions $f$ by integration by parts.
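As a small Monte Carlo illustration, our own, of this 'if' direction, the characterizing identity (9) with $X^* = X = Z$ can be checked numerically for a particular smooth $f$; here $f(x) = \sin x$, an arbitrary choice, for which both sides equal $e^{-1/2}$.

```python
import math, random

# Check E f'(Z) = E[Z f(Z)] for Z ~ N(0,1) with f(x) = sin(x), f'(x) = cos(x).
# Both expectations equal exp(-1/2), approximately 0.6065.
random.seed(0)
zs = [random.gauss(0.0, 1.0) for _ in range(10**6)]
lhs = sum(math.cos(z) for z in zs) / len(zs)       # E f'(Z)
rhs = sum(z * math.sin(z) for z in zs) / len(zs)   # E[Z f(Z)]
print(f"E f'(Z) = {lhs:.4f}, E[Z f(Z)] = {rhs:.4f}")
```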
We can gain some additional intuition regarding the zero bias transformation by observing its action on non-normal distributions, which, in some sense, moves them closer to normality. Let $B$ be a Bernoulli random variable with success probability $p \in (0,1)$, and let $\mathcal{U}[a,b]$ denote the uniform distribution on the finite interval $[a,b]$. Centering $B$ to form the mean zero discrete random variable $X = B - p$ having variance $\sigma^2 = p(1-p)$, substitution into the right hand side of (9) yields
$$E[Xf(X)] = E[(B-p)f(B-p)] = p(1-p)f(1-p) - (1-p)p f(-p) = \sigma^2 [f(1-p) - f(-p)] = \sigma^2 \int_{-p}^{1-p} f'(u)\,du = \sigma^2 E f'(U),$$
for $U$ having uniform density over $[-p, 1-p]$. Hence, with $=_d$ indicating the equality of two random variables in distribution,
$$(B - p)^* =_d U \quad \text{where } U \text{ has distribution } \mathcal{U}[-p, 1-p]. \qquad (10)$$
This example highlights the general fact that the distribution of $X^*$ is always absolutely continuous, regardless of the nature of the distribution of $X$.

It is the uniqueness of the fixed point of the zero bias transformation, that is, the fact that $X^*$ has the same distribution as $X$ only when $X$ is normal, that provides the probabilistic reason behind the CLT. This 'only if' direction of Stein's characterization suggests that a distribution which gets mapped to one nearby is close to being a fixed point of the zero bias transformation, and therefore must be close to the transformation's only fixed point, the normal. Hence the normal approximation should apply whenever the distribution of a random variable is close to that of its zero bias transformation.
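The computation leading to (10) is easy to confirm numerically as well. This sketch, again our own, checks $\sigma^2 E f'(U) = E[Xf(X)]$ for the centered Bernoulli with the arbitrary test function $f(x) = x^3$.

```python
import random

# Verify (10) numerically: for X = B - p with B ~ Bernoulli(p), the identity
# sigma^2 E f'(U) = E[X f(X)] should hold with U ~ Uniform[-p, 1-p].
# Test function f(x) = x^3, so f'(x) = 3x^2.
p = 0.3
sigma2 = p * (1 - p)

# Exact two-point expectation E[X f(X)] = E[X^4] = p(1-p)^4 + (1-p)p^4
lhs = p * (1 - p) ** 4 + (1 - p) * p ** 4

# Monte Carlo estimate of sigma^2 E f'(U)
random.seed(1)
us = [random.uniform(-p, 1 - p) for _ in range(10**6)]
rhs = sigma2 * sum(3 * u * u for u in us) / len(us)
print(f"E[X f(X)] = {lhs:.5f}, sigma^2 E f'(U) = {rhs:.5f}")
```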
Moreover, the zero bias transformation has a special property that immediately shows why the distribution of a sum $W_n$ of comparably sized independent random variables is close to that of $W_n^*$: a sum of independent terms can be zero biased by choosing a single summand proportionally to its variance and replacing it with one of comparable size. Thus, by differing only in a single summand, the variables $W_n$ and $W_n^*$ are close, making $W_n$ an approximate fixed point of the zero bias transformation, and therefore approximately normal. This explanation, when given precisely, becomes a probabilistic proof of the Lindeberg-Feller central limit theorem under a condition equivalent to (8) which we call the 'small zero bias condition'.

We first consider more precisely this special property of the zero bias transformation on independent sums. Given $\mathcal{X}_n$ satisfying Condition 1.1, let $\mathcal{X}_n^* = \{X_{i,n}^* : 1 \le i \le n\}$ be a collection of random variables such that $X_{i,n}^*$ has the $X_{i,n}$ zero bias distribution and is independent of $\mathcal{X}_n$. Further, let $I_n$ be a random index, independent of $\mathcal{X}_n$ and $\mathcal{X}_n^*$, with distribution
$$P(I_n = i) = \sigma_{i,n}^2, \qquad (11)$$
and write the variable selected by $I_n$, that is, the mixture, using indicator functions as
$$X_{I_n,n} = \sum_{i=1}^n \mathbf{1}(I_n = i) X_{i,n} \quad \text{and} \quad X_{I_n,n}^* = \sum_{i=1}^n \mathbf{1}(I_n = i) X_{i,n}^*. \qquad (12)$$
Then
$$W_n^* = W_n - X_{I_n,n} + X_{I_n,n}^* \qquad (13)$$
has the $W_n$ zero bias distribution. For the simple proof of this fact, see [6].

From (13) we see that the CLT should hold when the random variables $X_{I_n,n}$ and $X_{I_n,n}^*$ are both small asymptotically, since then the distribution of $W_n$ is close to that of $W_n^*$, making $W_n$ an approximate fixed point of the zero bias transformation. The following theorem shows that properly formalizing the notion of smallness results in a condition equivalent to Lindeberg's. Recall that we say a sequence of random variables $Y_n$ converges in probability to $Y$, and write $Y_n \to_p Y$, if $\lim_{n \to \infty} P(|Y_n - Y| \ge \epsilon) = 0$ for all $\epsilon > 0$.

Theorem 1.1 For a collection of random variables $\mathcal{X}_n$, $n = 1, 2, \ldots$, satisfying Condition 1.1, the small zero bias condition
$$X_{I_n,n}^* \to_p 0 \qquad (14)$$
and the Lindeberg condition (8) are equivalent.
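Displays (11)-(13) give a concrete recipe for sampling $W_n^*$ alongside $W_n$. The sketch below, our own illustration on the cycle-count array (7), uses the fact, justified below, that $X_{i,n}^*$ has the distribution of $U_i/\sigma_n$ with $U_i$ uniform on $[-1/i, 1 - 1/i]$.

```python
import math, random

# Sample the pair (W_n, W_n^*) of (13) for the cycle-count array (7):
# X_{i,n} = (X_i - 1/i)/sigma_n with X_i ~ Bernoulli(1/i), and X*_{i,n}
# distributed as U_i/sigma_n with U_i ~ Uniform[-1/i, 1 - 1/i].
def sample_W_Wstar(n, rng=random):
    sigma2 = sum(1.0 / i - 1.0 / i**2 for i in range(1, n + 1))
    sigma = math.sqrt(sigma2)
    xs = [((1.0 if rng.random() < 1.0 / i else 0.0) - 1.0 / i) / sigma
          for i in range(1, n + 1)]
    W = sum(xs)
    # Random index I_n chosen proportionally to the summand variances, as in (11)
    weights = [1.0 / i - 1.0 / i**2 for i in range(1, n + 1)]
    I = rng.choices(range(1, n + 1), weights=weights)[0]
    x_star = rng.uniform(-1.0 / I, 1.0 - 1.0 / I) / sigma  # zero biased replacement
    return W, W - xs[I - 1] + x_star  # W_n^* of (13)

random.seed(2)
W, W_star = sample_W_Wstar(1000)
print(f"W_n = {W:.3f}, W_n^* = {W_star:.3f}")  # the pair differs by at most 2/sigma_n
```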
Our probabilistic proof of the Lindeberg-Feller Theorem develops by first showing that the small zero bias condition implies
$$X_{I_n,n} \to_p 0,$$
and hence that
$$W_n^* - W_n = X_{I_n,n}^* - X_{I_n,n} \to_p 0.$$
Theorem 1.2 confirms that this convergence in probability to zero, mandating that $W_n$ have its own zero bias distribution in the limit, is sufficient to guarantee normal convergence.

Theorem 1.2 If $\mathcal{X}_n$, $n = 1, 2, \ldots$, satisfies Condition 1.1 and the small zero bias condition (14), then
$$\forall x \quad \lim_{n \to \infty} P(W_n \le x) = P(Z \le x).$$

We return now to the number $K_n(\pi)$ of cycles of a random permutation in $S_n$, with mean $h_n$ and variance $\sigma_n^2$ given by (6). Since $\sum_{i \ge 1} 1/i^2 < \infty$, by upper and lower bounding the $n$th harmonic number $h_n$ by integrals of $1/x$ we have
$$\lim_{n \to \infty} \frac{h_n}{\log n} = 1 \quad \text{and therefore} \quad \lim_{n \to \infty} \frac{\sigma_n^2}{\log n} = 1. \qquad (15)$$
In view of (7) and (4) we note that in this case $W_n = \sum_{i=2}^n X_{i,n}$, as $X_1 = 1$ identically makes $X_{1,n} = 0$ for all $n$. Now by the linearity relation $(aX)^* =_d aX^*$ for all $a \ne 0$, which follows directly from (9), by (10) we have $X_{i,n}^* =_d U_i/\sigma_n$, where $U_i$ has distribution $\mathcal{U}[-1/i, 1 - 1/i]$, $i = 2, \ldots, n$. In particular, $|U_i| \le 1$ with probability one for all $i = 1, 2, \ldots$, and therefore
$$|X_{I_n,n}^*| \le 1/\sigma_n \to 0 \qquad (16)$$
by (15). Hence the small zero bias condition is satisfied, and Theorem 1.2 may be invoked to show that the number of cycles of a random permutation is asymptotically normal.
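Theorem 1.2 can be watched in action here: by (5), $K_n(\pi)$ has the law of a sum of independent Bernoulli$(1/i)$ variables, so $W_n$ can be sampled directly. The following simulation, our own illustration, compares its empirical distribution with $\Phi$.

```python
import math, random

# Compare the empirical distribution of W_n = (K_n - h_n)/sigma_n with Phi.
random.seed(3)
n, reps = 1000, 5000
h_n = sum(1.0 / i for i in range(1, n + 1))
sigma_n = math.sqrt(sum(1.0 / i - 1.0 / i**2 for i in range(1, n + 1)))

def sample_K():
    """Sample K_n(pi) as a sum of independent Bernoulli(1/i) variables, per (5)."""
    return sum(1 for i in range(1, n + 1) if random.random() < 1.0 / i)

ws = [(sample_K() - h_n) / sigma_n for _ in range(reps)]
Phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
for x in (-1.0, 0.0, 1.0):
    empirical = sum(w <= x for w in ws) / reps
    print(f"x = {x:+.1f}: empirical {empirical:.3f} vs Phi(x) = {Phi(x):.3f}")
```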
More generally, the small zero bias condition will hold for an array $\mathcal{X}_n$ with elements $X_{i,n} = X_i/\sigma_n$ whenever the independent mean zero summand variables $X_1, X_2, \ldots$ satisfy $|X_i| \le C$ with probability one for some constant $C$, and the variance $\sigma_n^2$ of their sum $S_n$ tends to infinity. In particular, from (9) one can verify that $|X_i| \le C$ with probability one implies $|X_i^*| \le C$ with probability one, and hence (16) holds with $C$ replacing 1. In such a case, the Lindeberg condition (8) is also not difficult to verify: for any $\epsilon > 0$ one has $C/\sigma_n < \epsilon$ for all $n$ sufficiently large, and for all such $n$ the indicators in (8), and with them $L_{n,\epsilon}$, vanish. The classical case of identically distributed summands with common finite variance, normalized say so that $\sigma_n = \sqrt{n}$, is handled just as easily: every $X_{i,n}^*$ has the distribution of $X_1^*/\sqrt{n}$, and hence for any $\epsilon > 0$,
$$\lim_{n \to \infty} P(|X_{I_n,n}^*| \ge \epsilon) = \lim_{n \to \infty} P(|X_1^*| \ge \epsilon \sqrt{n}) = 0, \quad \text{that is,} \quad X_{I_n,n}^* \to_p 0.$$
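Returning to the cycle-count array, the Lindeberg sum of (8) can even be computed exactly there, since each $X_{i,n}$ takes only two values; the computation below, our own, shows $L_{n,\epsilon}$ dropping to exactly zero essentially once $\sigma_n$ exceeds $1/\epsilon$, in line with the bound $|X_{i,n}| \le 1/\sigma_n$.

```python
import math

def lindeberg_sum(n, eps):
    """Exact L_{n,eps} of (8) for the array (7): each X_{i,n} equals
    (1 - 1/i)/sigma_n with probability 1/i and -1/(i*sigma_n) otherwise."""
    sigma = math.sqrt(sum(1.0 / i - 1.0 / i**2 for i in range(1, n + 1)))
    total = 0.0
    for i in range(1, n + 1):
        p = 1.0 / i
        for value, prob in (((1 - p) / sigma, p), (-p / sigma, 1 - p)):
            if abs(value) >= eps:
                total += prob * value**2
    return total

for n in (10, 100, 1000, 10000):
    print(n, lindeberg_sum(n, eps=0.5))
```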
It is easy to see, and well known, that the Lindeberg condition is not necessary for (2). In particular, consider the case where for all $n$ the first summand $X_{1,n}$ of $W_n$ has the mean zero normal distribution of $\sigma Z$ with variance $\sigma^2 \in (0,1)$, and the Lindeberg condition is satisfied for the remaining variables, that is, that the limit is zero when taking the sum in (8) over all $i \ne 1$. Since the sum of independent normal variables is again normal, $W_n$ will converge in distribution to $Z$, but (8) does not hold, since for all $\epsilon > 0$,
$$\lim_{n \to \infty} L_{n,\epsilon} = E\{X_{1,n}^2 \mathbf{1}(|X_{1,n}| \ge \epsilon)\} = \sigma^2 E\{Z^2 \mathbf{1}(\sigma|Z| \ge \epsilon)\} > 0.$$
Defining
$$m_n = \max_{1 \le i \le n} \sigma_{i,n}^2 \qquad (17)$$
to use for excluding such cases, we have the following partial converse to Theorem 1.2.

Theorem 1.3 If $\mathcal{X}_n$, $n = 1, 2, \ldots$, satisfies Condition 1.1 and
$$\lim_{n \to \infty} m_n = 0, \qquad (18)$$
then the small zero bias condition is necessary for $W_n \to_d Z$.

We prove Theorem 1.3 in Section 5 by showing that $W_n \to_d Z$ implies that $W_n^* \to_d Z$, and that (18) implies $X_{I_n,n} \to_p 0$. But then also
$$W_n + X_{I_n,n}^* = W_n^* + X_{I_n,n} \to_d Z,$$
and now $W_n \to_d Z$ and $W_n + X_{I_n,n}^* \to_d Z$ imply that $X_{I_n,n}^* \to_p 0$. These implications provide the probabilistic reason that the small zero bias condition, or Lindeberg condition, is necessary for normal convergence under (18).

Section 2 draws a parallel between the zero bias transformation and the one better known for size biasing, and there we consider its connection to the differential equation method of Stein using test functions. In Section 3 we prove the equivalence of the classical Lindeberg condition and the small zero bias condition and then, in Sections 4 and 5, its sufficiency and partial necessity for normal convergence.

Some pains have been taken to keep the treatment as elementary as possible, in particular by avoiding the use of characteristic functions. Though some technical argument is needed, only real functions are involved and the development remains at a level as basic as the material permits. To help keep the presentation self contained, two general type results appear in Section 6.