Practical Nonparametric Statistics, course reading material (theory): Bootstrap and Jackknife Estimation of Sampling Distributions

Chapter 8 Bootstrap and Jackknife Estimation of Sampling Distributions

1 A General View of the Bootstrap

We begin with a general approach to bootstrap methods. The goal is to formulate the ideas in a context that is free of particular model assumptions. Suppose that the data $X \sim P_\theta \in \mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The parameter space $\Theta$ is allowed to be very general: it could be a subset of $\mathbb{R}^k$ (in which case the model $\mathcal{P}$ is a parametric model), or it could be the set of distributions of all i.i.d. sequences on some measurable space $(\mathcal{X}, \mathcal{A})$ (in which case $\mathcal{P}$ is the "nonparametric i.i.d." model). Suppose that we have an estimator $\hat\theta$ of $\theta \in \Theta$, and thereby an estimator $P_{\hat\theta}$ of $P_\theta$. Consider estimation of:

A. The distribution of $\hat\theta$: e.g. $P_\theta(\hat\theta \in A) = P_\theta(\hat\theta(X) \in A)$ for a measurable subset $A$ of $\Theta$;
B. If $\Theta \subset \mathbb{R}^k$, $\mathrm{Var}_\theta(a^T\hat\theta(X))$ for a fixed vector $a \in \mathbb{R}^k$.

Natural (ideal) bootstrap estimators of these parameters are provided by:

A′. $P_{\hat\theta}(\hat\theta(X^*) \in A)$;
B′. $\mathrm{Var}_{\hat\theta}(a^T\hat\theta(X^*))$.

While these ideal bootstrap estimators are often difficult to compute exactly, we can usually obtain Monte Carlo estimates of them by sampling from $P_{\hat\theta}$: let $X^*_1, \dots, X^*_B$ be i.i.d. with common distribution $P_{\hat\theta}$, and calculate $\hat\theta(X^*_j)$ for $j = 1, \dots, B$. Then Monte Carlo approximations (or implementations) of the bootstrap estimators in A′ and B′ are given by

A″. $B^{-1}\sum_{j=1}^B 1\{\hat\theta(X^*_j) \in A\}$;
B″. $B^{-1}\sum_{j=1}^B \big(a^T\hat\theta(X^*_j) - B^{-1}\sum_{k=1}^B a^T\hat\theta(X^*_k)\big)^2$.

If $\mathcal{P}$ is a parametric model, the above approach yields a parametric bootstrap; if $\mathcal{P}$ is a nonparametric model, it yields a nonparametric bootstrap. In the following section we try to make these ideas more concrete, first in the context of $X = (X_1, \dots, X_n)$ i.i.d. $F$ or $P$ with $\mathcal{P}$ nonparametric, so that $P_\theta = F \times \cdots \times F$ and $P_{\hat\theta} = \mathbb{F}_n \times \cdots \times \mathbb{F}_n$. Or, if the basic underlying sample space for each $X_i$ is not $\mathbb{R}$, $P_\theta = P \times \cdots \times P$ and $P_{\hat\theta} = \mathbb{P}_n \times \cdots \times \mathbb{P}_n$.
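The Monte Carlo approximations A″ and B″ above can be sketched in a few lines. This is a minimal illustration, not the text's own code: it treats the scalar case ($k = 1$, $a = 1$), and the function name `monte_carlo_bootstrap`, the hypothetical model $N(\theta, 1)$ with $n = 25$, and $B = 2000$ are all assumptions chosen for the example.

```python
import random
import statistics

def monte_carlo_bootstrap(theta_hat, sample_from_fitted, estimator, in_A, B, rng):
    """Monte Carlo versions A'' and B'' of the ideal bootstrap estimators A' and B'.

    Scalar case (k = 1, a = 1) for simplicity.
    theta_hat          -- estimate computed from the observed data
    sample_from_fitted -- draws one data set X* from the fitted model P_{theta_hat}
    estimator          -- maps a data set to the estimate theta_hat(X)
    in_A               -- indicator function of the set A
    """
    reps = [estimator(sample_from_fitted(theta_hat, rng)) for _ in range(B)]
    prob_A = sum(1 for t in reps if in_A(t)) / B            # A''
    mean_rep = sum(reps) / B
    var_hat = sum((t - mean_rep) ** 2 for t in reps) / B    # B'' (divisor B, as in the text)
    return prob_A, var_hat

# Hypothetical parametric model: X_1, ..., X_n i.i.d. N(theta, 1) with n = 25.
def sample_normal_model(theta, rng, n=25):
    return [rng.gauss(theta, 1.0) for _ in range(n)]

rng = random.Random(0)
data = sample_normal_model(2.0, rng)   # stand-in for the observed data
theta_hat = statistics.mean(data)
prob_A, var_hat = monte_carlo_bootstrap(
    theta_hat, sample_normal_model, statistics.mean,
    in_A=lambda t: t <= theta_hat, B=2000, rng=rng)
print(prob_A, var_hat)  # roughly 0.5 and roughly 1/n = 0.04
```

Because this model is parametric, the sketch is a parametric bootstrap; swapping `sample_normal_model` for resampling from the data would give the nonparametric version.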

2 Bootstrap Methods

We begin with a discussion of Efron's nonparametric bootstrap; we will then discuss some of the many alternatives.

Efron's nonparametric bootstrap

Suppose that $T(F)$ is some (real-valued) functional of $F$. If $X_1, \dots, X_n$ are i.i.d. with distribution function $F$, then we estimate $T(F)$ by $T(\mathbb{F}_n) \equiv T_n$, where $\mathbb{F}_n$ is the empirical d.f. $\mathbb{F}_n(x) \equiv n^{-1}\sum_{i=1}^n 1\{X_i \le x\}$. More generally, if $T(P)$ is some functional of $P$ and $X_1, \dots, X_n$ are i.i.d. $P$, then a natural estimator of $T(P)$ is just $T(\mathbb{P}_n)$, where $\mathbb{P}_n$ is the empirical measure $\mathbb{P}_n = n^{-1}\sum_{i=1}^n \delta_{X_i}$. Consider estimation of:

A. $b_n(F) \equiv n\{E_F(T_n) - T(F)\}$.
B. $n\sigma_n^2(F) \equiv n\,\mathrm{Var}_F(T_n)$.
C. $\kappa_{3,n}(F) \equiv E_F[T_n - E_F(T_n)]^3/\sigma_n^3(F)$.
D. $H_n(x, F) \equiv P_F(\sqrt{n}(T_n - T(F)) \le x)$.
E. $K_n(x, F) \equiv P_F(\sqrt{n}\|\mathbb{F}_n - F\|_\infty \le x)$.
F. $L_n(x, P) \equiv \Pr_P(\sqrt{n}\|\mathbb{P}_n - P\|_{\mathcal{F}} \le x)$, where $\mathcal{F}$ is a class of functions for which the central limit theorem holds uniformly over $\mathcal{F}$ (i.e. a Donsker class).

The (ideal) nonparametric bootstrap estimates of these quantities are obtained simply via the substitution principle: if $F$ (or $P$) is unknown, estimate it by the empirical distribution function $\mathbb{F}_n$ (or the empirical measure $\mathbb{P}_n$). This yields the following nonparametric bootstrap estimates in examples A to F:

A′. $b_n(\mathbb{F}_n) \equiv n\{E_{\mathbb{F}_n}(T_n) - T(\mathbb{F}_n)\}$.
B′. $n\sigma_n^2(\mathbb{F}_n) \equiv n\,\mathrm{Var}_{\mathbb{F}_n}(T_n)$.
C′. $\kappa_{3,n}(\mathbb{F}_n) \equiv E_{\mathbb{F}_n}[T_n - E_{\mathbb{F}_n}(T_n)]^3/\sigma_n^3(\mathbb{F}_n)$.
D′. $H_n(x, \mathbb{F}_n) \equiv P_{\mathbb{F}_n}(\sqrt{n}(T_n - T(\mathbb{F}_n)) \le x)$.
E′. $K_n(x, \mathbb{F}_n) \equiv P_{\mathbb{F}_n}(\sqrt{n}\|\mathbb{F}_n^* - \mathbb{F}_n\|_\infty \le x)$.
F′. $L_n(x, \mathbb{P}_n) \equiv \Pr_{\mathbb{P}_n}(\sqrt{n}\|\mathbb{P}_n^* - \mathbb{P}_n\|_{\mathcal{F}} \le x)$, where $\mathcal{F}$ is the same Donsker class as in F.

Because we usually lack closed-form expressions for the ideal bootstrap estimators in A′ to F′, their evaluation is usually indirect. Since the empirical d.f. $\mathbb{F}_n$ is discrete (with all its mass at the data), we could, in principle, enumerate all possible samples of size $n$ drawn with replacement from $\mathbb{F}_n$ (or $\mathbb{P}_n$).
If $n$ is large, however, this is a large number: there are $n^n$ ordered samples. [Problem: show that the number of distinct bootstrap samples is $\binom{2n-1}{n}$.] On the other hand, Monte Carlo approximations to A′ to F′ are easy: let $(X^*_{j1}, \dots, X^*_{jn})$, $j = 1, \dots, B$
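Both claims in this passage can be checked numerically: the count of distinct bootstrap samples (assuming the $n$ observations are distinct, so a sample is determined by the multiset of indices drawn) and the Monte Carlo recipe for A′ and B′. The sketch below is illustrative only; the helper names, the standard-normal data, and $B = 4000$ are assumptions, and the truncated recipe in the text is not reconstructed beyond "draw $B$ samples from $\mathbb{F}_n$ and recompute $T_n$".

```python
import itertools
import math
import random
import statistics

def count_distinct_bootstrap_samples(n):
    """Count bootstrap samples as multisets of indices drawn from {0, ..., n-1}."""
    return sum(1 for _ in itertools.combinations_with_replacement(range(n), n))

for n in range(1, 7):
    # n**n ordered draws versus C(2n-1, n) distinct (unordered) samples
    assert count_distinct_bootstrap_samples(n) == math.comb(2 * n - 1, n)
    print(n, n ** n, math.comb(2 * n - 1, n))

def bootstrap_bias_and_variance(data, T, B, rng):
    """Monte Carlo approximations to A' = n{E_Fn(T_n) - T(F_n)} and B' = n Var_Fn(T_n)."""
    n = len(data)
    t_n = T(data)                                              # T(F_n)
    reps = [T(rng.choices(data, k=n)) for _ in range(B)]       # T_n on samples drawn from F_n
    mean_rep = sum(reps) / B
    bias = n * (mean_rep - t_n)
    var = n * sum((t - mean_rep) ** 2 for t in reps) / B
    return bias, var

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20)]               # hypothetical data
bias, var = bootstrap_bias_and_variance(data, statistics.mean, B=4000, rng=rng)
# For T = mean the ideal values are A' = 0 and B' = (1/n) * sum (x_i - mean)^2.
print(bias, var, statistics.pvariance(data))
```

Even for $n = 6$ there are already $46656$ ordered samples but only $462$ distinct ones, which is why enumeration is abandoned in favour of Monte Carlo as soon as $n$ is moderate.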


The (Monte Carlo approximation to the) bootstrap estimate of $\sigma_n(F)$ is
$$\sqrt{B^{-1}\sum_{j=1}^B [\hat\rho^*_j - \bar\rho^*]^2}, \qquad \bar\rho^* \equiv B^{-1}\sum_{j=1}^B \hat\rho^*_j.$$
Finally, the jackknife estimate of $\sigma_n(F)$ is
$$\sqrt{\frac{n-1}{n}\sum_{i=1}^n [\hat\rho_{(i)} - \hat\rho_{(\cdot)}]^2};$$
see the beginning of section 2 for the notation used here. We will discuss the jackknife further in sections 2 and 4.

Parametric Bootstrap Methods

Once the idea of nonparametric bootstrapping (sampling from the empirical measure $\mathbb{P}_n$) becomes clear, it seems natural to consider sampling from other estimators of the unknown $P$. For example, if we are quite confident that some parametric model holds, then it seems that we should bootstrap by sampling from an estimator of $P$ based on that parametric model. Here is a formal description of this type of model-based bootstrap procedure.

Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a model, parametric, semiparametric or nonparametric. We do not insist that $\Theta$ be finite-dimensional. For example, in a parametric extreme case $\mathcal{P}$ could be the family of all normal (Gaussian) distributions on $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^d, \mathcal{B}^d)$. Or, to give a nonparametric example with only a smoothness restriction, $\mathcal{P}$ could be the family of all distributions on $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^d, \mathcal{B}^d)$ with a uniformly continuous density with respect to Lebesgue measure.

Let $X_1, \dots, X_n, \dots$ be i.i.d. with distribution $P_\theta \in \mathcal{P}$. We assume that there exists an estimator $\hat\theta_n = \hat\theta_n(X_1, \dots, X_n)$ of $\theta$. Then Efron's parametric (or model-based) bootstrap proceeds by sampling from the estimated or fitted model $P_{\hat\theta_n(\omega)} \equiv \hat P_n^\omega$: suppose that $X^*_{n,1}, \dots, X^*_{n,n}$ are independent and identically distributed with distribution $\hat P_n^\omega$ on $(\mathcal{X}, \mathcal{A})$, and let
$$\mathbb{P}_n^* \equiv n^{-1}\sum_{i=1}^n \delta_{X^*_{n,i}} \equiv \text{the parametric bootstrap empirical measure}. \qquad (1)$$
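The two standard-error estimates of $\sigma_n(F)$ given just before the parametric bootstrap discussion can be compared directly. This is a sketch under stated assumptions: the statistic $\hat\rho$ is taken to be a sample correlation of hypothetical bivariate data, $\hat\rho_{(i)}$ is the leave-one-out value and $\hat\rho_{(\cdot)}$ their average (the usual jackknife notation), and the names `bootstrap_sd`, `jackknife_sd`, and `pearson` are illustrative, not from the text.

```python
import math
import random
import statistics

def bootstrap_sd(data, stat, B, rng):
    """sqrt(B^{-1} sum_j (rho*_j - rhobar*)^2), resampling observations from the data."""
    n = len(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(B)]
    rbar = sum(reps) / B
    return math.sqrt(sum((r - rbar) ** 2 for r in reps) / B)

def jackknife_sd(data, stat):
    """sqrt((n-1)/n * sum_i (rho_(i) - rho_(.))^2) from leave-one-out values rho_(i)."""
    n = len(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    rdot = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((r - rdot) ** 2 for r in loo))

def pearson(pairs):
    """Sample correlation of a list of (x, y) pairs (hypothetical choice of rho-hat)."""
    m = len(pairs)
    mx = sum(x for x, _ in pairs) / m
    my = sum(y for _, y in pairs) / m
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(4)
xs = [rng.gauss(0.0, 1.0) for _ in range(30)]
pairs = [(x, x + rng.gauss(0.0, 1.0)) for x in xs]            # hypothetical bivariate data
sd_boot = bootstrap_sd(pairs, pearson, B=2000, rng=rng)
sd_jack = jackknife_sd(pairs, pearson)
print(sd_boot, sd_jack)
```

A useful sanity check: for the sample mean, `jackknife_sd(xs, statistics.mean)` reproduces $S_n/\sqrt{n}$ exactly, since the leave-one-out means satisfy $\bar X_{(i)} - \bar X_{(\cdot)} = (\bar X_n - X_i)/(n-1)$.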
The key difference between this parametric bootstrap procedure and the nonparametric bootstrap discussed earlier in this section is that we are now sampling from the model-based estimator $\hat P_n = P_{\hat\theta_n}$ of $P$ rather than from the nonparametric estimator $\mathbb{P}_n$.

Example 2.3 Suppose that $X_1, \dots, X_n$ are i.i.d. $P_\theta = N(\mu, \sigma^2)$, where $\theta = (\mu, \sigma^2)$. Let $\hat\theta_n = (\hat\mu_n, \hat\sigma_n^2) = (\bar X_n, S_n^2)$, where $S_n^2$ is the usual unbiased estimator of $\sigma^2$; hence
$$\frac{\sqrt{n}(\hat\mu_n - \mu)}{\hat\sigma_n} \sim t_{n-1}, \qquad \frac{(n-1)\hat\sigma_n^2}{\sigma^2} \sim \chi^2_{n-1}.$$
Now $P_{\hat\theta_n} = N(\hat\mu_n, \hat\sigma_n^2)$, and if $X_1^*, \dots, X_n^*$ are i.i.d. $P_{\hat\theta_n}$, then the bootstrap estimators $\hat\theta_n^* = (\hat\mu_n^*, \hat\sigma_n^{*2})$ satisfy, conditionally on $\mathbb{F}_n$,
$$\frac{\sqrt{n}(\hat\mu_n^* - \hat\mu_n)}{\hat\sigma_n^*} \sim t_{n-1}, \qquad \frac{(n-1)\hat\sigma_n^{*2}}{\hat\sigma_n^2} \sim \chi^2_{n-1}.$$
Thus, in this case, the bootstrap quantities have exactly the same distributions as the corresponding quantities based on the original estimators.
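Example 2.3 can be checked by simulation: draw repeatedly from the fitted normal model and compare the two bootstrap pivots with the known moments of $t_{n-1}$ (mean $0$, variance $(n-1)/(n-3)$) and $\chi^2_{n-1}$ (mean $n-1$, variance $2(n-1)$). The sample size $n = 10$, the data-generating values, $B = 10000$, and the helper names are assumptions made for this sketch.

```python
import math
import random

def mean_and_var(xs):
    """Sample mean and unbiased sample variance (divisor len(xs) - 1)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def parametric_bootstrap_normal(data, B, rng):
    """Draw X*_1..X*_n i.i.d. N(mu_hat, sigma_hat^2) B times; return both pivots."""
    n = len(data)
    mu_hat, var_hat = mean_and_var(data)        # (Xbar_n, S_n^2)
    sigma_hat = math.sqrt(var_hat)
    t_stats, chi_stats = [], []
    for _ in range(B):
        xs = [rng.gauss(mu_hat, sigma_hat) for _ in range(n)]
        mu_star, var_star = mean_and_var(xs)
        t_stats.append(math.sqrt(n) * (mu_star - mu_hat) / math.sqrt(var_star))  # ~ t_{n-1}
        chi_stats.append((n - 1) * var_star / var_hat)                          # ~ chi^2_{n-1}
    return t_stats, chi_stats

rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(10)]    # hypothetical sample, n = 10
t_stats, chi_stats = parametric_bootstrap_normal(data, B=10000, rng=rng)
print(mean_and_var(chi_stats))   # chi^2_9 has mean 9 and variance 18
print(mean_and_var(t_stats))     # t_9 has mean 0 and variance 9/7
```

The simulated moments do not depend on the particular `data`, which is exactly the point of the example: the pivots are exact, so the bootstrap reproduces their distributions for any observed sample.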


Example 2.4 Suppose that $X_1, \dots, X_n$ are i.i.d. $P_\theta = \text{exponential}(1/\theta)$: $P_\theta(X_1 > t) = \exp(-t/\theta)$ for $t \ge 0$. Then $\hat\theta_n = \bar X_n$ and $n\hat\theta_n/\theta \sim \text{Gamma}(n, 1)$. Now $P_{\hat\theta_n} = \text{exponential}(1/\hat\theta_n)$, and if $X_1^*, \dots, X_n^*$ are i.i.d. $P_{\hat\theta_n}$, then $\hat\theta_n^* = \bar X_n^*$ satisfies $(n\hat\theta_n^*/\hat\theta_n \mid \mathbb{F}_n) \sim \text{Gamma}(n, 1)$, so the bootstrap distribution replicates that of the original estimator exactly.

Example 2.5 (Bootstrapping from a "smoothed empirical measure", or the "smoothed bootstrap"). Suppose that $\mathcal{P} = \{P \text{ on } (\mathbb{R}^d, \mathcal{B}^d) : p = dP/d\lambda \text{ exists and is uniformly continuous}\}$. Then one way to estimate $P$ so that our estimator $\hat P_n \in \mathcal{P}$ is via a kernel estimator of the density $p$:
$$\hat p_n(x) = \frac{1}{b_n^d}\int k\left(\frac{y - x}{b_n}\right) d\mathbb{P}_n(y),$$
where $k : \mathbb{R}^d \to \mathbb{R}$ is a uniformly continuous density function and $b_n > 0$ is the bandwidth. Then $\hat P_n$ is defined for $C \in \mathcal{A}$ by $\hat P_n(C) = \int_C \hat p_n(x)\,dx$, and the model-based bootstrap proceeds by sampling from $\hat P_n$. There are many other examples of this type involving nonparametric or semiparametric models $\mathcal{P}$. For some work on "smoothed bootstrap" methods see e.g. Silverman and Young (1987) and Hall, DiCiccio, and Romano (1989).

Exchangeably-weighted and "Bayesian" bootstrap methods

In the course of Example 5.1 we introduced the vector $M$ of counts of how many times the bootstrap variables $X_i^*$ equal the observations $X_j(\omega)$ in the underlying sample. Thinking about the process of sampling at random (with replacement) from the population described by the empirical measure $\mathbb{P}_n$, it becomes clear that we can think of the bootstrap empirical measure $\mathbb{P}_n^*$ as the empirical measure with multinomial random weights:
$$\mathbb{P}_n^* = \frac{1}{n}\sum_{i=1}^n \delta_{X_i^*} = \frac{1}{n}\sum_{i=1}^n M_i\,\delta_{X_i(\omega)}.$$
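The identity between resampling and multinomial weighting can be seen directly: integrating $x$ against $\mathbb{P}_n^*$ gives the same number whether we average the resampled values or weight each observation by its count $M_i$. The sketch also shows one possible replacement $W$ for the multinomial weights; using $n$ times a Dirichlet$(1, \dots, 1)$ vector (Rubin's Bayesian bootstrap) is our illustrative choice, not something the text specifies, and all function names and data are hypothetical.

```python
import random
from collections import Counter

def multinomial_bootstrap_weights(n, rng):
    """Resample n indices with replacement; M_i = number of times index i is drawn."""
    idx = [rng.randrange(n) for _ in range(n)]
    counts = Counter(idx)
    return idx, [counts[i] for i in range(n)]   # (M_1..M_n) ~ Multinomial(n; 1/n, ..., 1/n)

def bayesian_bootstrap_weights(n, rng):
    """Hypothetical alternative W: n times a Dirichlet(1, ..., 1) vector."""
    g = [rng.expovariate(1.0) for _ in range(n)]   # normalized Exp(1) draws are Dirichlet(1, ..., 1)
    s = sum(g)
    return [n * gi / s for gi in g]

rng = random.Random(2)
data = [rng.gauss(0.0, 1.0) for _ in range(8)]     # hypothetical observations X_i(omega)
n = len(data)
idx, M = multinomial_bootstrap_weights(n, rng)
mean_resampled = sum(data[i] for i in idx) / n          # integral of x against (1/n) sum delta_{X*_i}
mean_weighted = sum(m * x for m, x in zip(M, data)) / n # same integral against (1/n) sum M_i delta_{X_i}
W = bayesian_bootstrap_weights(n, rng)
mean_bayes = sum(w * x for w, x in zip(W, data)) / n    # weighted mean with exchangeable weights W
print(M, mean_resampled, mean_weighted, mean_bayes)
```

Both weight vectors are exchangeable and sum to $n$; the Dirichlet weights are strictly positive, so no observation is ever dropped.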
This view of Efron's nonparametric bootstrap as the empirical measure with random weights suggests that we could obtain other random measures which would behave much the same way as Efron's nonparametric bootstrap, but without the same random sampling interpretation, by replacing the vector of multinomial weights by some other random vector $W$. One possible deficiency of the nonparametric bootstrap is its "discreteness", which shows up as observations of the original sample that are missed: the number of points of the original sample which are missed (i.e., given no bootstrap weight) is $N_n \equiv \#\{j \le n : M_j = 0\} = \sum_{j=1}^n 1\{M_j = 0\}$. Hence the proportion of observations missed by the bootstrap is $n^{-1}N_n$, and the expected proportion of missed observations is $E(n^{-1}N_n) = P(M_1 = 0) = (1 - 1/n)^n \to e^{-1} \doteq 0.36787\ldots$
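The expected missed proportion $E(n^{-1}N_n) = (1 - 1/n)^n$ is easy to confirm by simulation. In this sketch $n = 10$ and the number of replications are arbitrary choices; `missed_fraction` is a hypothetical helper that records which indices a single bootstrap draw hits.

```python
import math
import random

def missed_fraction(n, rng):
    """N_n / n: fraction of the n observations that receive no bootstrap weight (M_j = 0)."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return (n - len(drawn)) / n

rng = random.Random(3)
n, reps = 10, 20000
sim = sum(missed_fraction(n, rng) for _ in range(reps)) / reps
exact = (1 - 1 / n) ** n          # E(N_n / n) = P(M_1 = 0)
print(sim, exact, math.exp(-1))   # exact = 0.9**10, about 0.3487; limit e^{-1}, about 0.3679
```

Already at $n = 10$ about a third of the sample is left out of a typical bootstrap sample, and the fraction increases toward $e^{-1}$ as $n$ grows.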