BOOTSTRAP CONFIDENCE INTERVALS 197
DiCiccio and Efron (1992, Section 3) giving $\hat\sigma$ from (4.7). This assumes that $t(\mu)$ is differentiable. In fact we need $t(\mu)$ to be twice differentiable in order to carry out the ABC computations.

The ABC algorithm begins by computing $\hat\sigma$ from (4.7)–(4.8). Then the parameters $(a, z_0, c_q)$ are estimated by computing $p + 2$ numerical second derivatives. The first of these is

$$\hat a = \frac{\left.\dfrac{\partial^2}{\partial\varepsilon^2}\,\dot t'\,\mathrm{mu}(\hat\eta + \varepsilon\dot t)\right|_{\varepsilon=0}}{6\hat\sigma^3}, \tag{4.9}$$

where $\hat\eta$ is the MLE of the natural parameter vector $\eta$. This turns out to be the same as the skewness definition of $\hat a$, (3.10), in the one-parameter family obtained from Stein's least favorable family construction [see Efron, 1987, (6.7)]. Formula (4.9) uses exponential family relationships to compute the skewness from a second derivative.

The second ABC numerical derivative is

$$\hat c_q = \frac{\left.\dfrac{\partial^2}{\partial\varepsilon^2}\,t\!\left(\hat\mu + \varepsilon\,\hat\Sigma\dot t/\hat\sigma\right)\right|_{\varepsilon=0}}{2\hat\sigma}; \tag{4.10}$$

$\hat c_q$ measures how nonlinear the parameter of interest $\theta$ is, as a function of $\mu$.

The final $p$ second derivatives are required for the bias-correction parameter $z_0$. The parametric delta-method estimate of bias for $\hat\theta = t(\hat\mu)$ can be expressed as

$$\hat b = \frac{1}{2}\sum_{i=1}^{p}\left.\frac{\partial^2}{\partial\varepsilon^2}\,t\!\left(\hat\mu + \varepsilon\,d_i^{1/2}\gamma_i\right)\right|_{\varepsilon=0}, \tag{4.11}$$

where $d_i$ is the $i$th eigenvalue and $\gamma_i$ is the $i$th eigenvector of $\hat\Sigma$. Then

$$\hat z_0 = \Phi^{-1}\!\left(2\cdot\Phi(\hat a)\cdot\Phi(\hat c_q - \hat b/\hat\sigma)\right) \doteq \hat a + \hat c_q - \hat b/\hat\sigma. \tag{4.12}$$

This involves terms other than $\hat b$ because $z_0$ relates to median bias.
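The $p + 2$ second derivatives in (4.9)–(4.12) can be illustrated with a minimal sketch for the one-parameter case $p = 1$. The function names, the central-difference step `eps`, and the restriction to scalars are assumptions made for illustration. This is not the "abcpar" code. The gradient `t_dot` and the standard error `sigma_hat` are taken as inputs, since (4.7)–(4.8), which define them, lie outside this passage.

```python
import math
from statistics import NormalDist


def second_derivative(f, eps=1e-3):
    """Central-difference estimate of f''(0) for a scalar function of epsilon."""
    return (f(eps) - 2.0 * f(0.0) + f(-eps)) / eps**2


def abc_constants_1d(mu_hat, var_hat, eta_hat, t, mu, t_dot, sigma_hat):
    """Scalar (p = 1) versions of (4.9)-(4.12).

    t     : parameter of interest as a function of the expectation mu
    mu    : expectation as a function of the natural parameter eta
    t_dot, sigma_hat : assumed to come from (4.7)-(4.8), not shown here
    """
    phi = NormalDist()
    # (4.9): acceleration a-hat
    a_hat = second_derivative(
        lambda e: t_dot * mu(eta_hat + e * t_dot)
    ) / (6.0 * sigma_hat**3)
    # (4.10): nonlinearity constant c_q-hat
    c_q = second_derivative(
        lambda e: t(mu_hat + e * var_hat * t_dot / sigma_hat)
    ) / (2.0 * sigma_hat)
    # (4.11): with p = 1 the single eigenvalue is var_hat, the eigenvector is 1
    b_hat = 0.5 * second_derivative(
        lambda e: t(mu_hat + e * math.sqrt(var_hat))
    )
    # (4.12): bias correction; inv_cdf needs its argument strictly in (0, 1)
    z0_hat = phi.inv_cdf(
        2.0 * phi.cdf(a_hat) * phi.cdf(c_q - b_hat / sigma_hat)
    )
    return a_hat, c_q, b_hat, z0_hat
```

In the general case the same pattern would presumably apply with vectors, using one second derivative per eigenvector of $\hat\Sigma$ in (4.11).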
For the kind of smooth exponential family problems considered here, (4.12) is usually more accurate than the direct estimate (2.8).

The simplest form of the ABC intervals, called ABCquadratic or ABCq, gives the $\alpha$-level endpoint directly as a function of the five numbers $(\hat\theta, \hat\sigma, \hat a, \hat z_0, \hat c_q)$:

$$\alpha \to w \equiv \hat z_0 + z^{(\alpha)} \to \lambda \equiv \frac{w}{(1 - \hat a w)^2} \to \xi \equiv \lambda + \hat c_q\lambda^2 \to \hat\theta_{\mathrm{ABCq}}[\alpha] = \hat\theta + \hat\sigma\xi. \tag{4.13}$$

The original ABC endpoint, denoted $\hat\theta_{\mathrm{ABC}}[\alpha]$, requires one more recomputation of the function $t(\cdot)$:

$$\alpha \to w = \hat z_0 + z^{(\alpha)} \to \lambda = \frac{w}{(1 - \hat a w)^2} \to \hat\theta_{\mathrm{ABC}}[\alpha] = t\!\left(\hat\mu + \lambda\,\hat\Sigma\dot t/\hat\sigma\right). \tag{4.14}$$

Notice that $\hat c_q$ is still required here, to estimate $\hat z_0$ in (4.12). Formula (4.14) is the one used in Tables 2 and 3. It has the advantage of being transformation invariant, (2.11), and is sometimes more accurate than (4.13). However, (4.13) is local, all of the recomputations of $t(\mu)$ involved in (4.8)–(4.13) taking place infinitesimally near $\hat\mu = y$. In this sense ABCq is like the standard method. The nonlocality of (4.14) occasionally causes computational difficulties with boundary violations. In fact (4.13) is a simple quadratic approximation to (4.14), so ABC and ABCq usually agree reasonably well.

The main point of this article is that highly accurate approximate confidence intervals can now be calculated on a routine basis. The ABC intervals are implemented by a short computer algorithm. [The ABC intervals in Tables 2 and 3 were produced by the parametric and nonparametric ABC algorithms "abcpar" and "abcnon." These and the BCa program are available in the language S: send electronic mail to statlib@lib.stat.cmu.edu with the one-line message: send bootstrap.funs from S.] There are five inputs to the algorithm: $\hat\mu$, $\hat\Sigma$, $\hat\eta$ and the functions $t(\cdot)$ and $\mathrm{mu}(\cdot)$. The outputs include $\hat\theta_{\mathrm{STAN}}[\alpha]$, $\hat\theta_{\mathrm{ABC}}[\alpha]$ and $\hat\theta_{\mathrm{ABCq}}[\alpha]$. Computational effort for the ABC intervals is two or three times that required for the standard intervals.

The ABC intervals can be useful even in very simple situations.
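The endpoint recipes (4.13) and (4.14) can be written out as a short sketch. The function names and the scalar ($p = 1$) form of (4.14) are illustrative assumptions and are not the interface of "abcpar" or "abcnon". The five constants are assumed to have been computed already.

```python
from statistics import NormalDist


def abcq_endpoint(alpha, theta_hat, sigma_hat, a_hat, z0_hat, c_q):
    """ABCq endpoint (4.13): a function of the five numbers only."""
    z_alpha = NormalDist().inv_cdf(alpha)   # z^(alpha), the normal quantile
    w = z0_hat + z_alpha
    lam = w / (1.0 - a_hat * w) ** 2
    xi = lam + c_q * lam**2
    return theta_hat + sigma_hat * xi


def abc_endpoint_1d(alpha, mu_hat, var_hat, t, t_dot, sigma_hat, a_hat, z0_hat):
    """ABC endpoint (4.14) for scalar mu: one extra evaluation of t."""
    z_alpha = NormalDist().inv_cdf(alpha)
    w = z0_hat + z_alpha
    lam = w / (1.0 - a_hat * w) ** 2
    # t is evaluated away from mu_hat, which is what makes (4.14) nonlocal
    return t(mu_hat + lam * var_hat * t_dot / sigma_hat)
```

As a sanity check on the sketch, setting $\hat a = \hat z_0 = \hat c_q = 0$ in `abcq_endpoint` gives $\hat\theta + \hat\sigma z^{(\alpha)}$, the standard endpoint.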
Suppose that the data consists of a single observation $x$ from a Poisson distribution with unknown expectation $\theta$. In this case $\hat\theta = t(x) = x$ and $\hat\sigma = \sqrt{\hat\theta}$. Carrying through definitions (4.9)–(4.14) gives $\hat a = \hat z_0 = 1/(6\hat\theta^{1/2})$, $\hat c_q = 0$, and so

$$\hat\theta_{\mathrm{ABC}}[\alpha] = \hat\theta + \frac{w}{(1 - \hat a w)^2}\sqrt{\hat\theta}, \qquad w = \hat z_0 + z^{(\alpha)}.$$

For $x = 7$, the interval $(\hat\theta_{\mathrm{ABC}}[0.05], \hat\theta_{\mathrm{ABC}}[0.95])$ equals (3.54, 12.67). This compares with the exact interval (3.57, 12.58) for $\theta$, splitting the atom of probability at $x = 7$, and with the standard interval (2.65, 11.35).

Here is a more realistic example of the ABC algorithm, used in a logistic regression context. Table 4 shows the data from an experiment concerning mammalian cell growth. The goal of this experiment was to quantify the effects of two factors on the success of a culture. Factor "r" measures the ratio of two key constituents of the culture plate, while factor "d" measures how many days were allowed for culture maturation. A total of 1,843 independent cultures were prepared, investigating 25 different $(r_i, d_j)$ combinations. The table lists $s_{ij}$ and $n_{ij}$ for each combination, the num-