BOOTSTRAP CONFIDENCE INTERVALS 197

DiCiccio and Efron (1992, Section 3) giving $\hat\sigma$ from (4.7). This assumes that $t(\mu)$ is differentiable. In fact we need $t(\mu)$ to be twice differentiable in order to carry out the ABC computations.

The ABC algorithm begins by computing $\hat\sigma$ from (4.7)–(4.8). Then the parameters $(a, z_0, c_q)$ are estimated by computing $p+2$ numerical second derivatives. The first of these is

$$\hat a = \frac{\partial^2}{\partial\varepsilon^2}\Big[\dot t'\,\mathrm{mu}(\hat\eta + \varepsilon\dot t)\Big]_{\varepsilon=0} \Big/ 6\hat\sigma^3, \tag{4.9}$$

where $\hat\eta$ is the MLE of the natural parameter vector $\eta$. This turns out to be the same as the skewness definition of $\hat a$, (3.10), in the one-parameter family obtained from Stein's least favorable family construction [see Efron, 1987, (6.7)]. Formula (4.9) uses exponential family relationships to compute the skewness from a second derivative.

The second ABC numerical derivative is

$$\hat c_q = \frac{\partial^2}{\partial\varepsilon^2}\, t\!\left(\hat\mu + \varepsilon\,\frac{\hat\Sigma\dot t}{\hat\sigma}\right)\bigg|_{\varepsilon=0} \Big/ 2\hat\sigma; \tag{4.10}$$

$\hat c_q$ measures how nonlinear the parameter of interest $\theta$ is, as a function of $\mu$.

The final $p$ second derivatives are required for the bias-correction parameter $z_0$. The parametric delta-method estimate of bias for $\hat\theta = t(\hat\mu)$ can be expressed as

$$\hat b = \frac{1}{2}\sum_{i=1}^{p}\frac{\partial^2}{\partial\varepsilon^2}\, t\big(\hat\mu + \varepsilon\, d_i^{1/2}\gamma_i\big)\Big|_{\varepsilon=0}, \tag{4.11}$$

where $d_i$ is the $i$th eigenvalue and $\gamma_i$ is the $i$th eigenvector of $\hat\Sigma$.
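Each of the $p+2$ quantities in (4.9)–(4.11) is an ordinary one-dimensional second derivative along a chosen direction. The Python sketch below shows this under several assumptions. Central differences with a fixed step `h` stand in for the exact derivatives. Because (4.7)–(4.8) lie outside this excerpt, $\dot t$ is taken as a central-difference gradient and $\hat\sigma = (\dot t'\hat\Sigma\dot t)^{1/2}$. The caller supplies the eigenpairs $(d_i, \gamma_i)$ of $\hat\Sigma$, for instance from an eigensolver such as `numpy.linalg.eigh`. The helper names (`d2`, `shift`, `abc_constants`) are hypothetical and are not those of abcpar.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def d2(f: Callable[[float], float], h: float = 1e-3) -> float:
    """Central-difference estimate of f''(0), standing in for d^2/d eps^2 at eps = 0."""
    return (f(h) - 2.0 * f(0.0) + f(-h)) / (h * h)


def shift(x: Sequence[float], eps: float, v: Sequence[float]) -> Vector:
    """Return the vector x + eps * v."""
    return [xi + eps * vi for xi, vi in zip(x, v)]


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def matvec(A: Matrix, x: Sequence[float]) -> Vector:
    return [dot(row, x) for row in A]


def abc_constants(
    t: Callable[[Vector], float],                 # parameter of interest, theta = t(mu)
    mu: Callable[[Vector], Vector],               # expectation vector as a function of eta
    mu_hat: Vector,
    eta_hat: Vector,
    sigma_mat: Matrix,                            # Sigma_hat, covariance matrix at mu_hat
    eigenpairs: Sequence[Tuple[float, Vector]],   # (d_i, gamma_i) of Sigma_hat
    h: float = 1e-3,
) -> Tuple[Vector, float, float, float, float]:
    """Return (t_dot, sigma_hat, a_hat, cq_hat, b_hat) following (4.9)-(4.11)."""
    p = len(mu_hat)
    # Gradient of t at mu_hat by central first differences (assumed form of (4.8)).
    t_dot = []
    for i in range(p):
        e_i = [1.0 if j == i else 0.0 for j in range(p)]
        t_dot.append((t(shift(mu_hat, h, e_i)) - t(shift(mu_hat, -h, e_i))) / (2.0 * h))
    # Delta-method standard error (assumed form of (4.7)).
    sigma_hat = math.sqrt(dot(t_dot, matvec(sigma_mat, t_dot)))
    # (4.9): second derivative of t_dot' mu(eta_hat + eps * t_dot), divided by 6 sigma^3.
    a_hat = d2(lambda e: dot(t_dot, mu(shift(eta_hat, e, t_dot)))) / (6.0 * sigma_hat ** 3)
    # (4.10): curvature of t along Sigma_hat t_dot / sigma_hat, divided by 2 sigma.
    direction = [v / sigma_hat for v in matvec(sigma_mat, t_dot)]
    cq_hat = d2(lambda e: t(shift(mu_hat, e, direction))) / (2.0 * sigma_hat)
    # (4.11): half the sum of p second derivatives along sqrt(d_i) * gamma_i.
    b_hat = 0.5 * sum(
        d2(lambda e, d=d, g=g: t(shift(mu_hat, e * math.sqrt(d), g)))
        for d, g in eigenpairs
    )
    return t_dot, sigma_hat, a_hat, cq_hat, b_hat
```

Take the one-observation Poisson case with $x = 7$ ($p = 1$, $\mathrm{mu}(\eta) = e^\eta$, $t(\mu) = \mu$, $\hat\Sigma = 7$). There the sketch returns $\hat\sigma \approx \sqrt7$, $\hat a \approx 1/(6\sqrt7) \approx 0.063$, and $\hat c_q \approx \hat b \approx 0$, which agrees with the values derived in the text. A fixed `h` is only reasonable for smooth, well-scaled $t$ and $\mathrm{mu}$.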
Then 4:12‘ zˆ0 = 8 −1 ￾ 2·8aˆ‘·8cˆq−bˆ/σˆ ‘  := aˆ+cˆq−bˆ/σˆ : This involves terms other than bˆ becuase z0 relates to median bias. For the kind of smooth exponential family problems considered here, (4.12) is usually more accurate than the direct estimate (2.8). The simplest form of the ABC intervals, called ABCquadratic or ABCq, gives the α-level end￾point directly as a function of the five numbers θˆ; σˆ; aˆ; zˆ0 ; cˆq ‘: 4:13‘ α → w ≡ zˆ0 + z α‘ → λ ≡ w 1 − aˆw‘ 2 → ξ ≡ λ + cˆqλ 2 → θˆ ABCq’α = θˆ + σˆ ξ: The original ABC endpoint, denoted θˆ ABC’α, re￾quires one more recomputation of the function t·‘: 4:14‘ α → w = zˆ0 + z α‘ → λ = w 1 − aˆw‘ 2 → θˆ ABC’α = t  µˆ + λ6ˆ t˙ σˆ  : Notice that cˆq is still required here, to estimate zˆ0 in (4.12). Formula (4.14) is the one used in Tables 2 and 3. It has the advantage of being transformation invari￾ant, (2.11), and is sometimes more accurate than (4.13). However, (4.13) is local, all of the recompu￾tations of tµ‘ involved in (4.8)–(4.13) taking place infinitesimally near µˆ = y. In this sense ABCq is like the standard method. Nonlocality occasionally causes computational difficulties with boundary vi￾olations. In fact (4.13) is a simple quadratic approx￾imation to (4.14), so ABC and ABCq usually agree reasonably well. The main point of this article is that highly ac￾curate approximate confidence intervals can now be calculated on a routine basis. The ABC intervals are implemented by a short computer algorithm. [The ABC intervals in Tables 2 and 3 were produced by the parametric and nonparametric ABC algorithms “abcpar” and “abcnon.” These and the BCa program are available in the language S: send electronic mail to statlib@lib.stat.cmu.edu with the one-line mes￾sage: send bootstrap.funs from S.] There are five in￾puts to the algorithm: µˆ, 6ˆ , ηˆ and the functions t·‘ and mu·‘. The outputs include θˆ STAN’α, θˆ ABC’α and θˆ ABCq’α. 
Computational effort for the ABC intervals is two or three times that required for the standard intervals.

The ABC intervals can be useful even in very simple situations. Suppose that the data consists of a single observation $x$ from a Poisson distribution with unknown expectation $\theta$. In this case $\hat\theta = t(x) = x$ and $\hat\sigma = \sqrt{\hat\theta}$. Carrying through definitions (4.9)–(4.14) gives $\hat a = \hat z_0 = 1/(6\hat\theta^{1/2})$, $\hat c_q = 0$, and so

$$\hat\theta_{\mathrm{ABC}}[\alpha] = \hat\theta + \frac{w}{(1-\hat a w)^2}\sqrt{\hat\theta}, \qquad w = \hat z_0 + z^{(\alpha)}.$$

For $x = 7$, the interval $(\hat\theta_{\mathrm{ABC}}[0.05], \hat\theta_{\mathrm{ABC}}[0.95])$ equals $(3.54, 12.67)$. This compares with the exact interval $(3.57, 12.58)$ for $\theta$, splitting the atom of probability at $x = 7$, and with the standard interval $(2.65, 11.35)$.

Here is a more realistic example of the ABC algorithm, used in a logistic regression context.

Table 4 shows the data from an experiment concerning mammalian cell growth. The goal of this experiment was to quantify the effects of two factors on the success of a culture. Factor "r" measures the ratio of two key constituents of the culture plate, while factor "d" measures how many days were allowed for culture maturation. A total of 1,843 independent cultures were prepared, investigating 25 different $(r_i, d_j)$ combinations. The table lists $s_{ij}$ and $n_{ij}$ for each combination, the num-