Chapter 8
Bootstrap and Jackknife Estimation of Sampling Distributions

1. A General View of the Bootstrap
2. Bootstrap Methods
3. The Jackknife
4. Some limit theory for bootstrap methods
5. The bootstrap and the delta method
6. Bootstrap Tests and Bootstrap Confidence Intervals
7. M-Estimators and the Bootstrap
1 A General view of the bootstrap

We begin with a general approach to bootstrap methods. The goal is to formulate the ideas in a context which is free of particular model assumptions.

Suppose that the data $X \sim P_\theta \in \mathcal{P} = \{P_\theta : \theta \in \Theta\}$. The parameter space $\Theta$ is allowed to be very general; it could be a subset of $\mathbb{R}^k$ (in which case the model $\mathcal{P}$ is a parametric model), or it could be the distributions of all i.i.d. sequences on some measurable space $(\mathcal{X}, \mathcal{A})$ (in which case the model $\mathcal{P}$ is the "nonparametric i.i.d." model).

Suppose that we have an estimator $\hat\theta$ of $\theta \in \Theta$, and thereby an estimator $P_{\hat\theta}$ of $P_\theta$. Consider estimation of:

A. The distribution of $\hat\theta$: e.g. $P_\theta(\hat\theta \in A) = P_\theta(\hat\theta(X) \in A)$ for a measurable subset $A$ of $\Theta$;
B. If $\Theta \subset \mathbb{R}^k$, $Var_\theta(a^T \hat\theta(X))$ for a fixed vector $a \in \mathbb{R}^k$.

Natural (ideal) bootstrap estimators of these parameters are provided by:

A'. $P_{\hat\theta}(\hat\theta(X^*) \in A)$;
B'. $Var_{\hat\theta}(a^T \hat\theta(X^*))$.

While these ideal bootstrap estimators are often difficult to compute exactly, we can often obtain Monte-Carlo estimates thereof by sampling from $P_{\hat\theta}$: let $X_1^*, \ldots, X_B^*$ be i.i.d. with common distribution $P_{\hat\theta}$, and calculate $\hat\theta(X_j^*)$ for $j = 1, \ldots, B$. Then Monte-Carlo approximations (or implementations) of the bootstrap estimators in A' and B' are given by

A''. $B^{-1} \sum_{j=1}^B 1\{\hat\theta(X_j^*) \in A\}$;
B''. $B^{-1} \sum_{j=1}^B \big( a^T \hat\theta(X_j^*) - B^{-1} \sum_{l=1}^B a^T \hat\theta(X_l^*) \big)^2$.

If $\mathcal{P}$ is a parametric model, the above approach yields a parametric bootstrap. If $\mathcal{P}$ is a nonparametric model, then this yields a nonparametric bootstrap. In the following section, we try to make these ideas more concrete, first in the context of $X = (X_1, \ldots, X_n)$ i.i.d. $F$ or $P$ with $\mathcal{P}$ nonparametric, so that $P_\theta = F \times \cdots \times F$ and $P_{\hat\theta} = \mathbb{F}_n \times \cdots \times \mathbb{F}_n$. Or, if the basic underlying sample space for each $X_i$ is not $\mathbb{R}$, $P_\theta = P \times \cdots \times P$ and $P_{\hat\theta} = \mathbb{P}_n \times \cdots \times \mathbb{P}_n$.
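The Monte-Carlo approximations A'' and B'' can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the estimator is scalar (so $a = 1$), the data are simulated exponential variables, and the names `monte_carlo_bootstrap`, `resample` and `simulate` are hypothetical helpers, not part of the notes. The same routine serves for the nonparametric bootstrap (sampling from the empirical distribution) and a parametric bootstrap (sampling from a fitted exponential model).

```python
import random
import statistics

def monte_carlo_bootstrap(sample_from_fit, estimator, in_A, B, rng):
    """Monte-Carlo versions of A'' and B'' for a scalar estimator.

    sample_from_fit(rng) draws one data set X*_j from the fitted model,
    estimator maps a data set to a real number, in_A is the indicator of A.
    """
    reps = [estimator(sample_from_fit(rng)) for _ in range(B)]
    prob_A = sum(1 for t in reps if in_A(t)) / B   # A'': B^{-1} sum 1{theta_hat(X*_j) in A}
    var_hat = statistics.pvariance(reps)           # B'': B^{-1} sum (t_j - mean of t)^2
    return prob_A, var_hat

rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(50)]   # illustrative data (assumption)
theta_hat = statistics.fmean(data)

def resample(rng):
    # Nonparametric: the fitted model is the empirical distribution of the data.
    return rng.choices(data, k=len(data))

def simulate(rng):
    # Parametric: the fitted model is exponential with mean theta_hat.
    return [rng.expovariate(1.0 / theta_hat) for _ in range(len(data))]

in_A = lambda t: t <= theta_hat                    # A = (-infinity, theta_hat]
np_prob, np_var = monte_carlo_bootstrap(resample, statistics.fmean, in_A, 2000, rng)
p_prob, p_var = monte_carlo_bootstrap(simulate, statistics.fmean, in_A, 2000, rng)
print(np_prob, np_var, p_prob, p_var)
```

For the sample mean the ideal bootstrap variances are known in closed form (the divisor-$n$ sample variance over $n$ in the nonparametric case, $\hat\theta^2/n$ in the exponential case), so the two Monte-Carlo values can be checked against them.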
2 Bootstrap Methods

We begin with a discussion of Efron's nonparametric bootstrap; we will then discuss some of the many alternatives.

Efron's nonparametric bootstrap

Suppose that $T(F)$ is some (real-valued) functional of $F$. If $X_1, \ldots, X_n$ are i.i.d. with distribution function $F$, then we estimate $T(F)$ by $T(\mathbb{F}_n) \equiv T_n$ where $\mathbb{F}_n$ is the empirical d.f. $\mathbb{F}_n(x) \equiv n^{-1} \sum_{i=1}^n 1\{X_i \le x\}$. More generally, if $T(P)$ is some functional of $P$ and $X_1, \ldots, X_n$ are i.i.d. $P$, then a natural estimator of $T(P)$ is just $T(\mathbb{P}_n)$ where $\mathbb{P}_n$ is the empirical measure $\mathbb{P}_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$.

Consider estimation of:

A. $b_n(F) \equiv n\{E_F(T_n) - T(F)\}$.
B. $n\sigma_n^2(F) \equiv n Var_F(T_n)$.
C. $\kappa_{3,n}(F) \equiv E_F[T_n - E_F(T_n)]^3 / \sigma_n^3(F)$.
D. $H_n(x, F) \equiv P_F(\sqrt{n}(T_n - T(F)) \le x)$.
E. $K_n(x, F) \equiv P_F(\sqrt{n}\|\mathbb{F}_n - F\|_\infty \le x)$.
F. $L_n(x, P) \equiv Pr_P(\sqrt{n}\|\mathbb{P}_n - P\|_{\mathcal{F}} \le x)$ where $\mathcal{F}$ is a class of functions for which the central limit theorem holds uniformly over $\mathcal{F}$ (i.e. a Donsker class).

The (ideal) nonparametric bootstrap estimates of these quantities are obtained simply via the substitution principle: if $F$ (or $P$) is unknown, estimate it by the empirical distribution function $\mathbb{F}_n$ (or the empirical measure $\mathbb{P}_n$). This yields the following nonparametric bootstrap estimates in examples A-F:

A'. $b_n(\mathbb{F}_n) \equiv n\{E_{\mathbb{F}_n}(T_n) - T(\mathbb{F}_n)\}$.
B'. $n\sigma_n^2(\mathbb{F}_n) \equiv n Var_{\mathbb{F}_n}(T_n)$.
C'. $\kappa_{3,n}(\mathbb{F}_n) \equiv E_{\mathbb{F}_n}[T_n - E_{\mathbb{F}_n}(T_n)]^3 / \sigma_n^3(\mathbb{F}_n)$.
D'. $H_n(x, \mathbb{F}_n) \equiv P_{\mathbb{F}_n}(\sqrt{n}(T_n - T(\mathbb{F}_n)) \le x)$.
E'. $K_n(x, \mathbb{F}_n) \equiv P_{\mathbb{F}_n}(\sqrt{n}\|\mathbb{F}_n^* - \mathbb{F}_n\|_\infty \le x)$.
F'. $L_n(x, \mathbb{P}_n) \equiv Pr_{\mathbb{P}_n}(\sqrt{n}\|\mathbb{P}_n^* - \mathbb{P}_n\|_{\mathcal{F}} \le x)$ where $\mathcal{F}$ is a class of functions for which the central limit theorem holds uniformly over $\mathcal{F}$ (i.e. a Donsker class).

Because we usually lack closed-form expressions for the ideal bootstrap estimators in A'-F', evaluation of A'-F' is usually indirect. Since the empirical d.f. $\mathbb{F}_n$ is discrete (with all its mass at the data), we could, in principle, enumerate all possible samples of size $n$ from $\mathbb{F}_n$ (or $\mathbb{P}_n$) with replacement. If $n$ is large, this is a large number, however: $n^n$. [Problem: show that the number of distinct bootstrap samples is $\binom{2n-1}{n}$.]

On the other hand, Monte-Carlo approximations to A'-F' are easy: let $(X^*_{j,1}, \ldots, X^*_{j,n})$, $j = 1, \ldots, B$,
be $B$ independent samples of size $n$ drawn with replacement from $\mathbb{F}_n$ (or $\mathbb{P}_n$); let
\[ \mathbb{F}^*_{j,n}(x) \equiv n^{-1} \sum_{i=1}^n 1_{[X^*_{j,i} \le x]} \]
be the empirical d.f. of the $j$-th sample, and let $T^*_{j,n} \equiv T(\mathbb{F}^*_{j,n})$, $j = 1, \ldots, B$. Then approximations of A'-F' are given by:

A''. $b^*_{n,B} \equiv n\big\{ \frac{1}{B} \sum_{j=1}^B T^*_{j,n} - T_n \big\}$.
B''. $n\sigma^{*2}_{n,B} \equiv n \frac{1}{B} \sum_{j=1}^B (T^*_{j,n} - \overline{T}^*_n)^2$, where $\overline{T}^*_n \equiv B^{-1} \sum_{j=1}^B T^*_{j,n}$.
C''. $\kappa^*_{3,n,B} \equiv \frac{1}{B} \sum_{j=1}^B (T^*_{j,n} - \overline{T}^*_n)^3 / \sigma^{*3}_{n,B}$.
D''. $H^*_{n,B}(x) \equiv \frac{1}{B} \sum_{j=1}^B 1\{\sqrt{n}(T^*_{j,n} - T_n) \le x\}$.
E''. $K^*_{n,B}(x) \equiv \frac{1}{B} \sum_{j=1}^B 1\{\sqrt{n}\|\mathbb{F}^*_{j,n} - \mathbb{F}_n\|_\infty \le x\}$.
F''. $L^*_{n,B}(x) \equiv \frac{1}{B} \sum_{j=1}^B 1\{\sqrt{n}\|\mathbb{P}^*_{j,n} - \mathbb{P}_n\|_{\mathcal{F}} \le x\}$.

For fixed sample size $n$ and data $\mathbb{F}_n$, it follows from the Glivenko-Cantelli theorem (applied to the bootstrap sampling) that
\[ \sup_x |H^*_{n,B}(x) - H_n(x, \mathbb{F}_n)| \to_{a.s.} 0 \quad \text{as } B \to \infty, \]
and, by Donsker's theorem,
\[ \sqrt{B}\,(H^*_{n,B}(x) - H_n(x, \mathbb{F}_n)) \Rightarrow \mathbb{U}^{**}(H_n(x, \mathbb{F}_n)) \quad \text{as } B \to \infty. \]
Moreover, by the Dvoretzky, Kiefer, Wolfowitz (1956) inequality ($P(\|\mathbb{U}_n\| \ge \lambda) \le 2\exp(-2\lambda^2)$ for all $n$ and $\lambda > 0$, where the constant 2 before the exponential comes via Massart (1990)),
\[ P\big(\sup_x |H^*_{n,B}(x) - H_n(x, \mathbb{F}_n)| \ge \epsilon\big) \le 2\exp(-2B\epsilon^2). \]
For a given $\epsilon > 0$ we can make this probability as small as we please by choosing $B$ (over which we have complete control given sufficient computing power) sufficiently large. Since the deviations of $H^*_{n,B}$ from $H_n(x, \mathbb{F}_n)$ are so well understood and controlled, much of our discussion below will focus on the differences between $H_n(x, \mathbb{F}_n)$ and $H_n(x, F)$.

Sometimes it is possible to compute the distribution of the bootstrap estimator explicitly without resort to Monte-Carlo; here is an example of this kind.

Example 2.1 (The distribution of the bootstrap estimator of the median). Suppose that $T(F) = F^{-1}(1/2)$. Then $T(\mathbb{F}_n) = \mathbb{F}_n^{-1}(1/2) = X_{([n+1]/2)}$ and $T(\mathbb{F}^*_n) = \mathbb{F}_n^{*-1}(1/2) = X^*_{([n+1]/2)}$.
Let $m = [n+1]/2$, and let $M_j \equiv \#\{X^*_i = X_j(\omega) : i = 1, \ldots, n\}$, $j = 1, \ldots, n$, so that $M \equiv (M_1, \ldots, M_n) \sim \text{Mult}_n(n, (1/n, \ldots, 1/n))$. Now $[X^*_{(m)} > X_{(k)}(\omega)] = [n\mathbb{F}^*_n(X_{(k)}(\omega)) \le m - 1]$, and hence
\[ P(T(\mathbb{F}^*_n) = X^*_{(m)} > X_{(k)}(\omega) \mid \mathbb{F}_n) = P(n\mathbb{F}^*_n(X_{(k)}(\omega)) \le m - 1 \mid \mathbb{F}_n) = P(\text{Binomial}(n, k/n) \le m - 1) = \sum_{j=0}^{m-1} \binom{n}{j} (k/n)^j (1 - k/n)^{n-j}, \]
while
\[ P(T_n > x) = P(X_{(m)} > x) = P(n\mathbb{F}_n(x) < m) = \sum_{j=0}^{m-1} \binom{n}{j} F(x)^j (1 - F(x))^{n-j}. \]
This implies that
\[ P(T(\mathbb{F}^*_n) = X_{(k)}(\omega) \mid \mathbb{F}_n) = \sum_{j=0}^{m-1} \left\{ \binom{n}{j} \Big(\frac{k-1}{n}\Big)^j \Big(1 - \frac{k-1}{n}\Big)^{n-j} - \binom{n}{j} \Big(\frac{k}{n}\Big)^j \Big(1 - \frac{k}{n}\Big)^{n-j} \right\} \]
for $k = 1, \ldots, n$.

Example 2.2 (Standard deviation of a correlation coefficient estimator). Let $T(F) = \rho(F)$ where $F$ is the bivariate distribution of a pair of random variables $(X, Y)$ with finite fourth moments. We know from chapter 2 that the sample correlation coefficient $\hat\rho_n \equiv T(\mathbb{F}_n)$ satisfies
\[ \sqrt{n}(\hat\rho_n - \rho) \equiv \sqrt{n}(\rho(\mathbb{F}_n) - \rho(F)) \to_d N(0, V^2) \]
where $V^2 = Var[Z_1 - (\rho/2)(Z_2 + Z_3)]$, $Z \equiv (Z_1, Z_2, Z_3) \sim N_3(0, \Sigma)$, and $\Sigma$ is given by
\[ \Sigma = E(X_s Y_s - \rho,\ X_s^2 - 1,\ Y_s^2 - 1)^{\otimes 2}; \]
here $X_s \equiv (X - \mu_X)/\sigma_X$ and $Y_s \equiv (Y - \mu_Y)/\sigma_Y$ are the standardized variables. If $F$ is bivariate normal, then $V^2 = (1 - \rho^2)^2$.

Consider estimation of the standard deviation of $\hat\rho_n$:
\[ \sigma_n(F) \equiv \{Var_F(\hat\rho_n)\}^{1/2}. \]
The normal theory estimator of $\sigma_n(F)$ is
\[ (1 - \hat\rho_n^2)/\sqrt{n - 3}. \]
The delta-method estimate of $\sigma_n(F)$ is
\[ \frac{\hat V_n}{\sqrt{n}} = \{\widehat{Var}[Z_1 - (\rho/2)(Z_2 + Z_3)]\}^{1/2} / \sqrt{n}. \]
The (Monte-Carlo approximation to) the bootstrap estimate of $\sigma_n(F)$ is
\[ \sqrt{B^{-1} \sum_{j=1}^B [\hat\rho^*_j - \overline{\rho}^*]^2}. \]
Finally the jackknife estimate of $\sigma_n(F)$ is
\[ \sqrt{\frac{n-1}{n} \sum_{i=1}^n [\hat\rho_{(i)} - \hat\rho_{(\cdot)}]^2}; \]
see the beginning of section 2 for the notation used here. We will discuss the jackknife further in sections 2 and 4.

Parametric Bootstrap Methods

Once the idea of nonparametric bootstrapping (sampling from the empirical measure $\mathbb{P}_n$) becomes clear, it seems natural to consider sampling from other estimators of the unknown $P$. For example, if we are quite confident that some parametric model holds, then it seems that we should consider bootstrapping by sampling from an estimator of $P$ based on the parametric model. Here is a formal description of this type of model-based bootstrap procedure.

Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, and let $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ be a model, parametric, semiparametric or nonparametric. We do not insist that $\Theta$ be finite-dimensional. For example, in a parametric extreme case $\mathcal{P}$ could be the family of all normal (Gaussian) distributions on $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^d, \mathcal{B}^d)$. Or, to give a nonparametric example with only a smoothness restriction, $\mathcal{P}$ could be the family of all distributions on $(\mathcal{X}, \mathcal{A}) = (\mathbb{R}^d, \mathcal{B}^d)$ with a density with respect to Lebesgue measure which is uniformly continuous.

Let $X_1, \ldots, X_n, \ldots$ be i.i.d. with distribution $P_\theta \in \mathcal{P}$. We assume that there exists an estimator $\hat\theta_n = \hat\theta_n(X_1, \ldots, X_n)$ of $\theta$. Then Efron's parametric (or model-based) bootstrap proceeds by sampling from the estimated or fitted model $P_{\hat\theta_n(\omega)} \equiv \hat P^\omega_n$: suppose that $X^*_{n,1}, \ldots, X^*_{n,n}$ are independent and identically distributed with distribution $\hat P^\omega_n$ on $(\mathcal{X}, \mathcal{A})$, and let
\[ \mathbb{P}^*_n \equiv n^{-1} \sum_{i=1}^n \delta_{X^*_{n,i}} \equiv \text{the parametric bootstrap empirical measure}. \qquad (1) \]
The key difference between this parametric bootstrap procedure and the nonparametric bootstrap discussed earlier in this section is that we are now sampling from the model-based estimator $\hat P_n = P_{\hat\theta_n}$ of $P$ rather than from the nonparametric estimator $\mathbb{P}_n$.

Example 2.3 Suppose that $X_1, \ldots, X_n$ are i.i.d. $P_\theta = N(\mu, \sigma^2)$ where $\theta = (\mu, \sigma^2)$. Let $\hat\theta_n = (\hat\mu_n, \hat\sigma_n^2) = (\overline{X}_n, S_n^2)$ where $S_n^2$ is the usual unbiased estimator of $\sigma^2$, and hence
\[ \frac{\sqrt{n}(\hat\mu_n - \mu)}{\hat\sigma_n} \sim t_{n-1}, \qquad \frac{(n-1)\hat\sigma_n^2}{\sigma^2} \sim \chi^2_{n-1}. \]
Now $P_{\hat\theta_n} = N(\hat\mu_n, \hat\sigma_n^2)$, and if $X^*_1, \ldots, X^*_n$ are i.i.d. $P_{\hat\theta_n}$, then the bootstrap estimators $\hat\theta^*_n = (\hat\mu^*_n, \hat\sigma^{*2}_n)$ satisfy, conditionally on $\mathbb{F}_n$,
\[ \frac{\sqrt{n}(\hat\mu^*_n - \hat\mu_n)}{\hat\sigma^*_n} \sim t_{n-1}, \qquad \frac{(n-1)\hat\sigma^{*2}_n}{\hat\sigma_n^2} \sim \chi^2_{n-1}. \]
Thus the bootstrap estimators have exactly the same distributions as the original estimators in this case.
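The claim of Example 2.3 can be checked numerically. The following is a rough sketch, assuming simulated normal data with $n = 10$ and $B = 10000$ parametric bootstrap replicates (both choices are arbitrary). It compares simulated moments of the two bootstrap pivots with the known moments of $\chi^2_{n-1}$ (mean $n-1$, variance $2(n-1)$) and of $t_{n-1}$ (variance $(n-1)/(n-3)$).

```python
import math
import random
import statistics

rng = random.Random(1)
n = 10
data = [rng.gauss(2.0, 3.0) for _ in range(n)]     # illustrative data (assumption)
mu_hat = statistics.fmean(data)
s_hat = statistics.stdev(data)                     # square root of the unbiased S_n^2

B = 10000
t_pivots, chi_pivots = [], []
for _ in range(B):
    # One parametric bootstrap sample: X* i.i.d. N(mu_hat, s_hat^2).
    star = [rng.gauss(mu_hat, s_hat) for _ in range(n)]
    mu_star = statistics.fmean(star)
    s_star = statistics.stdev(star)
    t_pivots.append(math.sqrt(n) * (mu_star - mu_hat) / s_star)
    chi_pivots.append((n - 1) * s_star ** 2 / s_hat ** 2)

# chi^2_{n-1} has mean n-1 and variance 2(n-1); t_{n-1} has variance (n-1)/(n-3).
print(statistics.fmean(chi_pivots), statistics.pvariance(chi_pivots))
print(statistics.pvariance(t_pivots), (n - 1) / (n - 3))
```

Matching a few moments is of course weaker than matching the whole distribution; the sketch is meant as a sanity check, not a proof.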
Example 2.4 Suppose that $X_1, \ldots, X_n$ are i.i.d. $P_\theta = \text{exponential}(1/\theta)$: $P_\theta(X_1 > t) = \exp(-t/\theta)$ for $t \ge 0$. Then $\hat\theta_n = \overline{X}_n$ and $n\hat\theta_n/\theta \sim \text{Gamma}(n, 1)$. Now $P_{\hat\theta_n} = \text{exponential}(1/\hat\theta_n)$, and if $X^*_1, \ldots, X^*_n$ are i.i.d. $P_{\hat\theta_n}$, then $\hat\theta^*_n = \overline{X}^*_n$ has $(n\hat\theta^*_n/\hat\theta_n \mid \mathbb{F}_n) \sim \text{Gamma}(n, 1)$, so the bootstrap distribution replicates the original estimator exactly.

Example 2.5 (Bootstrapping from a "smoothed empirical measure"; or the "smoothed bootstrap"). Suppose that
\[ \mathcal{P} = \Big\{P \text{ on } (\mathbb{R}^d, \mathcal{B}^d) : p = \frac{dP}{d\lambda} \text{ exists and is uniformly continuous}\Big\}. \]
Then one way to estimate $P$ so that our estimator $\hat P_n \in \mathcal{P}$ is via a kernel estimator of the density $p$:
\[ \hat p_n(x) = \frac{1}{b_n^d} \int k\Big(\frac{y - x}{b_n}\Big)\, d\mathbb{P}_n(y) \]
where $k : \mathbb{R}^d \to \mathbb{R}$ is a uniformly continuous density function. Then $\hat P_n$ is defined for $C \in \mathcal{A}$ by
\[ \hat P_n(C) = \int_C \hat p_n(x)\, dx, \]
and the model-based bootstrap proceeds by sampling from $\hat P_n$.

There are many other examples of this type involving nonparametric or semiparametric models $\mathcal{P}$. For some work on "smoothed bootstrap" methods see e.g. Silverman and Young (1987) and Hall, DiCiccio, and Romano (1989).

Exchangeably-weighted and "Bayesian" bootstrap methods

In the course of Example 2.1 we introduced the vector $M$ of counts of how many times the bootstrap variables $X^*_i$ equal the observations $X_j(\omega)$ in the underlying sample. Thinking about the process of sampling at random (with replacement) from the population described by the empirical measure $\mathbb{P}_n$, it becomes clear that we can think of the bootstrap empirical measure $\mathbb{P}^*_n$ as the empirical measure with multinomial random weights:
\[ \mathbb{P}^*_n = \frac{1}{n} \sum_{i=1}^n \delta_{X^*_i} = \frac{1}{n} \sum_{i=1}^n M_i \delta_{X_i(\omega)}. \]
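The identity between the resampled and the multinomially weighted empirical measures is easy to see on a toy data set. The sketch below assumes six made-up observations and integrates the function $f(x) = x$ against both representations; any other $f$ would do equally well.

```python
import random
from collections import Counter

rng = random.Random(2)
data = [3.1, 0.4, 2.2, 5.0, 1.7, 4.4]              # illustrative observations X_1..X_n
n = len(data)

idx = [rng.randrange(n) for _ in range(n)]         # indices of the n draws with replacement
counts = Counter(idx)
M = [counts.get(i, 0) for i in range(n)]           # M_i = number of times X_i was drawn

# Mean under (1/n) sum_i delta_{X*_i}: average over the resampled values.
mean_resampled = sum(data[i] for i in idx) / n
# Mean under (1/n) sum_i M_i delta_{X_i}: original values with multinomial weights.
mean_weighted = sum(m * x for m, x in zip(M, data)) / n
print(M, mean_resampled, mean_weighted)
```

The two means agree up to floating-point rounding, and the counts sum to $n$, as they must for a $\text{Mult}_n(n, (1/n, \ldots, 1/n))$ vector.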
This view of Efron's nonparametric bootstrap as the empirical measure with random weights suggests that we could obtain other random measures which would behave much the same way as Efron's nonparametric bootstrap, but without the same random sampling interpretation, by replacing the vector of multinomial weights by some other random vector $W$.

One of the possible deficiencies of the nonparametric bootstrap involves its "discreteness" via missing observations in the original sample: note that the number of points of the original sample which are missed (or not given any bootstrap weight) is
\[ N_n \equiv \#\{j \le n : M_j = 0\} = \sum_{j=1}^n 1\{M_j = 0\}. \]
Hence the proportion of observations missed by the bootstrap is $n^{-1}N_n$, and the expected proportion of missed observations is
\[ E(n^{-1}N_n) = P(M_1 = 0) = (1 - 1/n)^n \to e^{-1} \doteq .36787\ldots \]
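A quick simulation illustrates the expected proportion of missed observations. This is a sketch under arbitrary choices ($n = 200$, 5000 bootstrap samples); `missed_fraction` is a hypothetical helper name.

```python
import math
import random

def missed_fraction(n, rng):
    """Proportion of the n original points that receive bootstrap weight zero."""
    drawn = {rng.randrange(n) for _ in range(n)}   # distinct indices hit by n draws
    return 1.0 - len(drawn) / n

rng = random.Random(3)
n, reps = 200, 5000
avg = sum(missed_fraction(n, rng) for _ in range(reps)) / reps
exact = (1.0 - 1.0 / n) ** n                       # E(N_n / n) = (1 - 1/n)^n
print(avg, exact, math.exp(-1.0))
```

The simulated average should sit close to $(1 - 1/n)^n$, which is already near $e^{-1}$ for moderate $n$.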
[Moreover, from occupancy theory for urn models
\[ \sqrt{n}\big(n^{-1}N_n - (1 - 1/n)^n\big) \to_d N(0, e^{-1}(1 - 2e^{-1})) = N(0, .09720887\ldots); \]
see e.g. Johnson and Kotz (1977), page 317, 3. with $r = 0$.] By using some other vector of exchangeable weights $W$ rather than $M_n \sim \text{Mult}_n(n, (1/n, \ldots, 1/n))$, we might be able to avoid some of this discreteness caused by multinomial weights.

Since the resulting measure should be a probability measure, it seems reasonable to require that the components of $W$ should sum to $n$. Since the multinomial random vector with cell probabilities all equal to $1/n$ is exchangeable, it seems reasonable to require that the vector $W$ have an exchangeable distribution: i.e. $\pi W \equiv (W_{\pi(1)}, \ldots, W_{\pi(n)}) \stackrel{d}{=} W$ for all permutations $\pi$ of $\{1, \ldots, n\}$. Then
\[ \mathbb{P}^W_n \equiv \frac{1}{n} \sum_{i=1}^n W_{ni} \delta_{X_i(\omega)} \]
is called the exchangeably weighted bootstrap empirical measure corresponding to the weight vector $W$. Here are several examples.

Example 2.6 (Dirichlet weights). Suppose that $Y_1, Y_2, \ldots$ are i.i.d. exponential(1) random variables, and set
\[ W_{ni} \equiv \frac{nY_i}{Y_1 + \cdots + Y_n}, \qquad i = 1, \ldots, n. \]
The resulting random vector $W/n$ has a Dirichlet$(1, \ldots, 1)$ distribution; i.e. $n^{-1}W \stackrel{d}{=} D$ where the $D_i$'s are the spacings of a random sample of $n - 1$ Uniform(0, 1) random variables.

Example 2.7 (More general continuous weights). Other weights $W$ of the same form as in Example 2.6 are obtained by replacing the exponential distribution of the $Y$'s by some other distribution on $\mathbb{R}^+$. It will turn out that the limit theory can be established for any of these weights as long as the $Y_i$'s satisfy $Y_i \in L_{2,1}$; i.e. $\int_0^\infty \sqrt{P(|Y| > t)}\, dt < \infty$.

Other weights $W$ based on various urn schemes are also possible; see Praestgaard and Wellner (1993) for some of these.
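The Dirichlet weights of Example 2.6 can be generated directly from exponential variables. The sketch below assumes simulated standard normal data and uses the sample mean as the functional; `dirichlet_weights` and `weighted_mean` are hypothetical helper names. It checks that the weights sum to $n$ and are strictly positive (so no observation is "missed"), and compares the spread of the weighted means with the usual variance estimate for a mean.

```python
import random
import statistics

def dirichlet_weights(n, rng):
    """W_ni = n * Y_i / (Y_1 + ... + Y_n) with Y_i i.i.d. exponential(1)."""
    y = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(y)
    return [n * yi / total for yi in y]

def weighted_mean(data, w):
    """Mean of the weighted empirical measure (1/n) * sum_i W_ni * delta_{X_i}."""
    return sum(wi * xi for wi, xi in zip(w, data)) / len(data)

rng = random.Random(4)
data = [rng.gauss(0.0, 1.0) for _ in range(40)]    # illustrative data (assumption)
n = len(data)

w = dirichlet_weights(n, rng)
reps = [weighted_mean(data, dirichlet_weights(n, rng)) for _ in range(4000)]
print(sum(w), min(w))                              # sums to n; every weight is positive
print(statistics.pvariance(reps), statistics.pvariance(data) / n)
```

For the sample mean the two variances in the last line should be of comparable size, which is consistent with the idea that exchangeable weights behave much like multinomial ones.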
3 The Jackknife

The jackknife preceded the bootstrap, mostly due to its simplicity and relative ease of computation. The original work on the "delete-one" jackknife is due to Quenouille (1949) and Tukey (1958). Here is how it works.

Suppose that $T(\mathbb{F}_n)$ estimates $T(F)$. Let $T_{n,i} \equiv T(\mathbb{F}_{n-1,i})$ where
\[ \mathbb{F}_{n-1,i}(x) \equiv \frac{1}{n-1} \sum_{j \ne i} 1_{(-\infty, x]}(X_j); \]
thus $T_{n,i}$ is the estimator based on the data with $X_i$ deleted or left out. Let
\[ T_{n,\cdot} \equiv \frac{1}{n} \sum_{i=1}^n T_{n,i}. \]
We also set
\[ T^*_{n,i} \equiv nT_n - (n-1)T_{n,i} \equiv i\text{th pseudo-value} \]
and
\[ T^*_n \equiv n^{-1} \sum_{i=1}^n T^*_{n,i} = nT_n - (n-1)T_{n,\cdot}. \]

The jackknife estimator of bias, and the jackknife estimator of T(F)

Now let $E_n \equiv E_F T_n = E_F T(\mathbb{F}_n)$, and suppose that we can expand $E_n$ in powers of $n^{-1}$ as follows:
\[ E_n \equiv E_F T_n = T(F) + \frac{a_1(F)}{n} + \frac{a_2(F)}{n^2} + \cdots. \]
Then the bias of the estimator $T_n = T(\mathbb{F}_n)$ is
\[ \text{bias}_n(F) \equiv E_F(T_n) - T(F) = \frac{a_1(F)}{n} + \frac{a_2(F)}{n^2} + \cdots. \]
We can also write
\[ T(F) = E_F(T_n) - \text{bias}_n(F). \]
Note that
\[ E_F T_{n,\cdot} = E_{n-1} = T(F) + \frac{a_1(F)}{n-1} + \frac{a_2(F)}{(n-1)^2} + \cdots. \]
Hence it follows that
\[ E_F(T^*_n) = nE_n - (n-1)E_{n-1} = T(F) + a_2(F)\Big\{\frac{1}{n} - \frac{1}{n-1}\Big\} + a_3(F)\Big\{\frac{1}{n^2} - \frac{1}{(n-1)^2}\Big\} + \cdots = T(F) - \frac{a_2(F)}{n(n-1)} + \cdots. \]
Thus $T^*_n$ has bias $O(n^{-2})$ whereas $T_n$ has bias of the order $O(n^{-1})$ if $a_1(F) \ne 0$. We call $T^*_n$ the jackknife estimator of $T(F)$; similarly, by writing
\[ T^*_n = T_n - \widehat{\text{bias}}_n, \]
we find that
\[ \widehat{\text{bias}}_n = T_n - T^*_n = (n-1)\{T_{n,\cdot} - T_n\}. \]
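The leave-one-out quantities, pseudo-values and bias estimate translate directly into code. The sketch below assumes simulated normal data and takes $T(\mathbb{F}_n)$ to be the plug-in variance (divisor $n$), whose bias is exactly of order $1/n$; for this functional the jackknife estimator $T^*_n$ coincides with the usual unbiased sample variance, which gives a convenient check. The function name `jackknife` is a hypothetical helper.

```python
import random
import statistics

def jackknife(data, T):
    """Return (T_n, jackknife estimate T*_n, jackknife bias estimate)."""
    n = len(data)
    t_n = T(data)
    leave_one_out = [T(data[:i] + data[i + 1:]) for i in range(n)]  # T_{n,i}
    t_dot = sum(leave_one_out) / n                                  # T_{n,.}
    pseudo = [n * t_n - (n - 1) * t_i for t_i in leave_one_out]     # pseudo-values T*_{n,i}
    t_jack = sum(pseudo) / n                                        # T*_n
    bias_hat = (n - 1) * (t_dot - t_n)                              # equals T_n - T*_n
    return t_n, t_jack, bias_hat

rng = random.Random(5)
data = [rng.gauss(0.0, 2.0) for _ in range(25)]    # illustrative data (assumption)
t_n, t_jack, bias_hat = jackknife(data, statistics.pvariance)
print(t_n, t_jack, bias_hat, statistics.variance(data))
```

Here the estimated bias is negative, reflecting the downward bias of the divisor-$n$ variance, and subtracting it recovers the divisor-$(n-1)$ estimator.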