198 T. J. DICICCIO AND B. EFRON

Table 4
Cell data: 1,843 cell cultures were prepared, varying two factors, r (the ratio of two key constituents) and d (the number of days of culturing). Data shown are $s_{ij}$ and $n_{ij}$, the number of successful cultures and the number of cultures attempted, at the ith level of r and the jth level of d

            d1        d2        d3        d4        d5        Total
r1        5/31      3/28     20/45     24/47     29/35       81/186
r2       15/77     36/78     43/71     56/71     66/74      216/371
r3      48/126    68/116   145/171    98/119   114/129      473/661
r4       29/92     35/52     57/85     38/50     72/77      231/356
r5       11/53     20/52     20/48     40/55     52/61      143/269
Total  108/379   162/326   285/420   256/342   333/376    1144/1843

ber of successful cultures, compared to the number attempted.

We suppose that the number of successful cultures is a binomial variate,

\[
s_{ij} \overset{\text{i.i.d.}}{\sim} \operatorname{binomial}(n_{ij}, \pi_{ij}), \qquad i, j = 1, 2, 3, 4, 5, \tag{4.15}
\]

with an additive logistic regression model for the unknown probabilities $\pi_{ij}$,

\[
\log \frac{\pi_{ij}}{1 - \pi_{ij}} = \mu + \alpha_i + \beta_j, \qquad \sum_{i=1}^{5} \alpha_i = \sum_{j=1}^{5} \beta_j = 0. \tag{4.16}
\]

For the example here we take the parameter of interest to be

\[
\theta = \frac{\pi_{15}}{\pi_{51}}, \tag{4.17}
\]

the success probability for the lowest r and highest d divided by the success probability for the highest r and lowest d. This typifies the kind of problem traditionally handled by the standard method.

A logistic regression program calculated maximum likelihood estimates $\hat\mu, \hat\alpha_i, \hat\beta_j$, from which we obtained

\[
\hat\theta = \frac{1 + \exp[-(\hat\mu + \hat\alpha_5 + \hat\beta_1)]}{1 + \exp[-(\hat\mu + \hat\alpha_1 + \hat\beta_5)]} = 4.16. \tag{4.18}
\]

The output of the logistic regression program provided $\hat\mu$, $\hat\Sigma$ and $\hat\eta$ for the ABC algorithm. Section 3 of DiCiccio and Efron (1992) gives the exact specification for an ABC analysis of a logistic regression problem. Applied here, the algorithm gave standard and ABC 0.90 central intervals for $\theta$,

\[
\begin{aligned}
(\hat\theta_{STAN}[0.05],\ \hat\theta_{STAN}[0.95]) &= (3.06,\ 5.26), \\
(\hat\theta_{ABC}[0.05],\ \hat\theta_{ABC}[0.95]) &= (3.20,\ 5.43).
\end{aligned} \tag{4.19}
\]

The ABC limits are shifted moderately upwards relative to the standard limits, enough to make the shape (1.6) equal 1.32.
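The fit behind (4.15)-(4.18) and the shape figure quoted after (4.19) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' logistic regression program: it uses a hand-written Newton-Raphson iteration, codes the row and column effects as baseline dummies (r1 and d1 as reference levels) rather than with the sum-to-zero constraints of (4.16), and takes shape (1.6) to be the ratio of the upper to the lower distance from $\hat\theta$. The fitted probabilities, and hence the ratio $\pi_{15}/\pi_{51}$, do not depend on which of the two codings is used. All function names (`design_row`, `prob`, `solve`, `fit_logistic`, `shape`) are hypothetical.

```python
import math

# Successes s_ij and attempts n_ij from Table 4 (rows r1..r5, columns d1..d5).
S = [[5, 3, 20, 24, 29],
     [15, 36, 43, 56, 66],
     [48, 68, 145, 98, 114],
     [29, 35, 57, 38, 72],
     [11, 20, 20, 40, 52]]
N = [[31, 28, 45, 47, 35],
     [77, 78, 71, 71, 74],
     [126, 116, 171, 119, 129],
     [92, 52, 85, 50, 77],
     [53, 52, 48, 55, 61]]
P = 9  # intercept + 4 row dummies + 4 column dummies (assumed coding)


def design_row(i, j):
    """Baseline-coded covariates for cell (i, j); r1 and d1 are the baselines."""
    x = [0.0] * P
    x[0] = 1.0
    if i > 0:
        x[i] = 1.0
    if j > 0:
        x[4 + j] = 1.0
    return x


def prob(beta, i, j):
    """Success probability pi_ij under the additive logistic model."""
    eta = sum(b * v for b, v in zip(beta, design_row(i, j)))
    return 1.0 / (1.0 + math.exp(-eta))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(iters=25):
    """Newton-Raphson for the binomial log likelihood of (4.15)-(4.16)."""
    beta = [0.0] * P
    for _ in range(iters):
        grad = [0.0] * P
        hess = [[0.0] * P for _ in range(P)]
        for i in range(5):
            for j in range(5):
                x, p = design_row(i, j), prob(beta, i, j)
                w = N[i][j] * p * (1.0 - p)
                for a in range(P):
                    grad[a] += x[a] * (S[i][j] - N[i][j] * p)
                    for c in range(P):
                        hess[a][c] += w * x[a] * x[c]
        beta = [b + d for b, d in zip(beta, solve(hess, grad))]
    return beta


def shape(lo, est, hi):
    """Assumed form of shape (1.6): upper distance over lower distance."""
    return (hi - est) / (est - lo)


beta_hat = fit_logistic()
theta_hat = prob(beta_hat, 0, 4) / prob(beta_hat, 4, 0)  # pi_15 / pi_51
print("theta_hat:", theta_hat)
print("shape ABC, shape standard:",
      shape(3.20, 4.16, 5.43), shape(3.06, 4.16, 5.26))
```

The paper reports $\hat\theta = 4.16$, and a fit of this kind would be expected to land close to that value. The shape calculation uses only the interval endpoints printed in (4.19): the ABC limits give a ratio that rounds to 1.32, while the standard limits are symmetric about $\hat\theta$ by construction.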
The standard intervals are not too bad in this case, although better performance might have been expected with $n = 1{,}843$ data points. In fact it is very difficult to guess a priori what constitutes a large enough sample size for adequate standard-interval performance.

The ABC formulas (4.13)–(4.14) were derived as second-order approximations to the BCa endpoints by DiCiccio and Efron (1992). They showed that these formulas give second-order accuracy as in (2.10), and also second-order correctness. Section 8 reviews some of these results. There are many other expressions for ABC-like interval endpoints that enjoy equivalent second-order properties in theory, although they may be less dependable in practice. A particularly simple formula is

\[
\hat\theta_{ABC}[\alpha] \doteq \hat\theta_{STAN}[\alpha] + \hat\sigma \left\{ \hat z_0 + (2\hat a + \hat c_q) \left(z^{(\alpha)}\right)^2 \right\}. \tag{4.20}
\]

This shows that the ABC endpoints are not just a translation of $\hat\theta_{STAN}[\alpha]$.

In repeated sampling situations the estimated constants $(\hat a, \hat z_0, \hat c_q)$ are of stochastic order $1/\sqrt{n}$ in the sample size, the same as $\hat\sigma$. They multiply $\hat\sigma$ in (4.20), resulting in corrections of order $\hat\sigma/\sqrt{n}$ to $\hat\theta_{STAN}[\alpha]$. If there were only 1/4 as much cell data, $n = 461$, but with the same proportion of successes in every cell of Table 4, then $(\hat a, \hat z_0, \hat c_q)$ would be twice as large. This would double the relative difference $(\hat\theta_{ABC}[\alpha] - \hat\theta_{STAN}[\alpha])/\hat\sigma$ according to (4.20), rendering $\hat\theta_{STAN}[\alpha]$ quite inaccurate.

Both $\hat a$ and $\hat z_0$ are transformation invariant, retaining the same numerical value under monotone parameter transformations $\phi = m(\theta)$. The nonlinearity constant $\hat c_q$ is not invariant, and it can be reduced by transformations that make $\phi$ more linear as a function of $\mu$. Changing parameters from $\theta = \pi_{15}/\pi_{51}$ to $\phi = \log(\theta)$ changes $(\hat a, \hat z_0, \hat c_q)$ from $(-0.006, -0.025, 0.105)$ to $(-0.006, -0.025, 0.025)$ for the cell data. The standard intervals are nearly correct on the $\phi$ scale. The ABC and BCa methods automate this kind of data-analytic trick.
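The arithmetic of (4.20) can be illustrated with the constants quoted for the cell data. This is a rough sketch under stated assumptions: the text does not print $\hat\sigma$, so it is backed out here from the standard 0.90 interval $(3.06, 5.26)$ by assuming that interval has the form $\hat\theta \pm z^{(0.95)}\hat\sigma$, and the function names `relative_shift` and `abc_quadratic` are hypothetical. The sketch evaluates the simple formula (4.20), not the full ABC algorithm, so its endpoints need not match the ABC limits in (4.19) exactly.

```python
from statistics import NormalDist


def relative_shift(a_hat, z0_hat, cq_hat, alpha):
    """(theta_ABC[alpha] - theta_STAN[alpha]) / sigma_hat according to (4.20)."""
    z = NormalDist().inv_cdf(alpha)
    return z0_hat + (2.0 * a_hat + cq_hat) * z * z


def abc_quadratic(theta_stan, sigma_hat, a_hat, z0_hat, cq_hat, alpha):
    """Endpoint from the simple formula (4.20)."""
    return theta_stan + sigma_hat * relative_shift(a_hat, z0_hat, cq_hat, alpha)


# Constants quoted in the text for theta = pi_15 / pi_51 and phi = log(theta).
theta_consts = (-0.006, -0.025, 0.105)
phi_consts = (-0.006, -0.025, 0.025)

# Assumption: sigma_hat recovered from the standard 0.90 interval (3.06, 5.26).
z95 = NormalDist().inv_cdf(0.95)
sigma_hat = (5.26 - 3.06) / (2.0 * z95)

lo = abc_quadratic(3.06, sigma_hat, *theta_consts, alpha=0.05)
hi = abc_quadratic(5.26, sigma_hat, *theta_consts, alpha=0.95)
print("endpoints from (4.20):", lo, hi)

# With a quarter of the data the three constants double, and so does the
# relative shift, because (4.20) is linear in (a_hat, z0_hat, cq_hat).
rel_full = relative_shift(*theta_consts, alpha=0.05)
rel_quarter = relative_shift(*(2.0 * c for c in theta_consts), alpha=0.05)
print("relative shift, full vs quarter sample:", rel_full, rel_quarter)

# On the log scale the smaller c_q leaves almost no correction.
rel_phi = relative_shift(*phi_consts, alpha=0.05)
print("relative shift on the phi scale:", rel_phi)
```

Under these assumptions the formula gives endpoints of about 3.21 and 5.41, close to but not identical with the ABC limits $(3.20, 5.43)$. The relative shift is about 0.23 on the $\theta$ scale and about 0.01 on the $\phi$ scale, which is consistent with the remark that the standard intervals are nearly correct after the log transformation.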
We can visualize the relationship between the BCa and ABC intervals in terms of Figure 3. The