198 T. J. DICICCIO AND B. EFRON

TABLE 4
Cell data: 1,843 cell cultures were prepared, varying two factors, r (the ratio of two key constituents) and d (the number of days of culturing). Data shown are $s_{ij}$ and $n_{ij}$, the number of successful cultures and the number of cultures attempted, at the ith level of r and the jth level of d

         d1        d2        d3        d4        d5        Total
r1       5/31      3/28      20/45     24/47     29/35     81/186
r2       15/77     36/78     43/71     56/71     66/74     216/371
r3       48/126    68/116    145/171   98/119    114/129   473/661
r4       29/92     35/52     57/85     38/50     72/77     231/356
r5       11/53     20/52     20/48     40/55     52/61     143/269
Total    108/379   162/326   285/420   256/342   333/376   1144/1843

ber of successful cultures, compared to the number attempted.

We suppose that the number of successful cultures is a binomial variate,

(4.15)  $s_{ij} \overset{\text{i.i.d.}}{\sim} \operatorname{binomial}(n_{ij}, \pi_{ij}), \quad i, j = 1, 2, 3, 4, 5,$

with an additive logistic regression model for the unknown probabilities $\pi_{ij}$,

(4.16)  $\log\left\{ \dfrac{\pi_{ij}}{1 - \pi_{ij}} \right\} = \mu + \alpha_i + \beta_j, \qquad \sum_{1}^{5} \alpha_i = \sum_{1}^{5} \beta_j = 0.$

For the example here we take the parameter of interest to be

(4.17)  $\theta = \dfrac{\pi_{15}}{\pi_{51}},$

the success probability for the lowest r and highest d divided by the success probability for the highest r and lowest d. This typifies the kind of problem traditionally handled by the standard method.

A logistic regression program calculated maximum likelihood estimates $\hat\mu, \hat\alpha_i, \hat\beta_j$, from which we obtained

(4.18)  $\hat\theta = \dfrac{1 + \exp\{-(\hat\mu + \hat\alpha_5 + \hat\beta_1)\}}{1 + \exp\{-(\hat\mu + \hat\alpha_1 + \hat\beta_5)\}} = 4.16.$

The output of the logistic regression program provided $\hat\mu$, $\hat\Sigma$ and $\hat\eta$ for the ABC algorithm. Section 3 of DiCiccio and Efron (1992) gives the exact specification for an ABC analysis of a logistic regression problem. Applied here, the algorithm gave standard and ABC 0.90 central intervals for $\theta$,

(4.19)  $(\hat\theta_{STAN}[0.05], \hat\theta_{STAN}[0.95]) = (3.06, 5.26),$
        $(\hat\theta_{ABC}[0.05], \hat\theta_{ABC}[0.95]) = (3.20, 5.43).$

The ABC limits are shifted moderately upwards relative to the standard limits, enough to make the shape (1.6) equal 1.32.
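The fit behind (4.18) and the standard limits in (4.19) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' program: it fits the additive logistic model by Newton-Raphson with treatment-coded dummies (which gives the same fitted probabilities as the sum-to-zero constraints in (4.16)), takes $\hat\sigma$ to be the delta-method standard error from the inverse Fisher information, and assumes the standard interval has the form $\hat\theta \pm z^{(\alpha)}\hat\sigma$. The helper names `design_matrix`, `fit_logistic` and `shape` are hypothetical, and the shape definition (upper distance over lower distance from $\hat\theta$) is an assumed reading of (1.6), which is not reproduced on this page.

```python
import numpy as np

# Table 4 counts: rows are r1..r5, columns are d1..d5.
S = np.array([[5, 3, 20, 24, 29],
              [15, 36, 43, 56, 66],
              [48, 68, 145, 98, 114],
              [29, 35, 57, 38, 72],
              [11, 20, 20, 40, 52]], dtype=float)    # successes s_ij
N = np.array([[31, 28, 45, 47, 35],
              [77, 78, 71, 71, 74],
              [126, 116, 171, 119, 129],
              [92, 52, 85, 50, 77],
              [53, 52, 48, 55, 61]], dtype=float)    # attempts n_ij


def design_matrix():
    """Intercept plus dummies for r2..r5 and d2..d5 (treatment coding)."""
    rows = []
    for i in range(5):
        for j in range(5):
            x = np.zeros(9)
            x[0] = 1.0
            if i > 0:
                x[i] = 1.0          # columns 1..4 mark r2..r5
            if j > 0:
                x[4 + j] = 1.0      # columns 5..8 mark d2..d5
            rows.append(x)
    return np.array(rows)


def fit_logistic(X, s, n, iters=50):
    """Newton-Raphson for grouped binomial logistic regression."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        info = X.T @ (X * (n * p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, X.T @ (s - n * p))
        b = b + step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ b)))
    info = X.T @ (X * (n * p * (1.0 - p))[:, None])
    return b, p, np.linalg.inv(info)


def shape(lower, estimate, upper):
    """Assumed form of shape (1.6): upper distance over lower distance."""
    return (upper - estimate) / (estimate - lower)


X = design_matrix()
b_hat, p_hat, cov = fit_logistic(X, S.ravel(), N.ravel())

# Cell (i, j) sits at flat index 5*i + j, so pi_15 is index 4, pi_51 is 20.
i15, i51 = 4, 20
theta_hat = p_hat[i15] / p_hat[i51]          # the source reports 4.16

# Delta method: d log(pi)/d eta = 1 - pi for the logistic link.
grad = theta_hat * ((1 - p_hat[i15]) * X[i15] - (1 - p_hat[i51]) * X[i51])
se = float(np.sqrt(grad @ cov @ grad))

z95 = 1.6448536269514722                      # 0.95 standard normal quantile
stan = (theta_hat - z95 * se, theta_hat + z95 * se)

print("theta_hat:", round(float(theta_hat), 3))
print("standard 0.90 interval:", tuple(round(float(v), 3) for v in stan))
print("shape of published ABC limits:", round(shape(3.20, 4.16, 5.43), 2))
```

Applied to the published limits, `shape(3.06, 4.16, 5.26)` is 1 for the symmetric standard interval, while the ABC limits give the 1.32 quoted in the text.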
The standard intervals are not too bad in this case, although better performance might have been expected with $n = 1{,}843$ data points. In fact it is very difficult to guess a priori what constitutes a large enough sample size for adequate standard-interval performance.

The ABC formulas (4.13)–(4.14) were derived as second-order approximations to the BCa endpoints by DiCiccio and Efron (1992). They showed that these formulas give second-order accuracy as in (2.10), and also second-order correctness. Section 8 reviews some of these results. There are many other expressions for ABC-like interval endpoints that enjoy equivalent second-order properties in theory, although they may be less dependable in practice. A particularly simple formula is

(4.20)  $\hat\theta_{ABC}[\alpha] \doteq \hat\theta_{STAN}[\alpha] + \hat\sigma \left\{ \hat z_0 + (2\hat a + \hat c_q) \left(z^{(\alpha)}\right)^2 \right\}.$

This shows that the ABC endpoints are not just a translation of $\hat\theta_{STAN}[\alpha]$.

In repeated sampling situations the estimated constants $(\hat a, \hat z_0, \hat c_q)$ are of stochastic order $1/\sqrt{n}$ in the sample size, the same as $\hat\sigma$. They multiply $\hat\sigma$ in (4.20), resulting in corrections of order $\hat\sigma/\sqrt{n}$ to $\hat\theta_{STAN}[\alpha]$. If there were only 1/4 as much cell data, $n = 461$, but with the same proportion of successes in every cell of Table 4, then $(\hat a, \hat z_0, \hat c_q)$ would be twice as large. This would double the relative difference $(\hat\theta_{ABC}[\alpha] - \hat\theta_{STAN}[\alpha])/\hat\sigma$ according to (4.20), rendering $\hat\theta_{STAN}[\alpha]$ quite inaccurate.

Both $\hat a$ and $\hat z_0$ are transformation invariant, retaining the same numerical value under monotone parameter transformations $\phi = m(\theta)$. The nonlinearity constant $\hat c_q$ is not invariant, and it can be reduced by transformations that make $\phi$ more linear as a function of $\mu$. Changing parameters from $\theta = \pi_{15}/\pi_{51}$ to $\phi = \log(\theta)$ changes $(\hat a, \hat z_0, \hat c_q)$ from $(-0.006, -0.025, 0.105)$ to $(-0.006, -0.025, 0.025)$ for the cell data. The standard intervals are nearly correct on the $\phi$ scale. The ABC and BCa methods automate this kind of data-analytic trick.
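As a rough numerical check of (4.20), the sketch below plugs in the constants quoted above for the cell data. It is illustrative only: the function name `abc_relative_shift` is hypothetical, and $\hat\sigma$ is not printed in the text, so it is backed out of the published standard limits under the assumption that they have the form $\hat\theta \pm z^{(\alpha)}\hat\sigma$.

```python
from statistics import NormalDist


def abc_relative_shift(a, z0, cq, alpha):
    """Braced term of (4.20): z0 + (2a + cq) * (z_alpha)**2."""
    z = NormalDist().inv_cdf(alpha)
    return z0 + (2.0 * a + cq) * z * z


# Constants (a, z0, cq) quoted in the text for the cell data.
THETA_SCALE = (-0.006, -0.025, 0.105)    # theta = pi_15 / pi_51
LOG_SCALE = (-0.006, -0.025, 0.025)      # phi = log(theta)

# Published standard limits from (4.19).
STAN_LO, STAN_HI = 3.06, 5.26

# Assumption: sigma_hat is recovered from the standard interval,
# taken to be theta_hat +/- z * sigma_hat.
z95 = NormalDist().inv_cdf(0.95)
sigma_hat = (STAN_HI - STAN_LO) / (2.0 * z95)

# Relative difference (theta_ABC - theta_STAN) / sigma_hat at alpha = 0.95.
rel_theta = abc_relative_shift(*THETA_SCALE, alpha=0.95)
rel_log = abc_relative_shift(*LOG_SCALE, alpha=0.95)

# With a quarter of the data the three constants would double.
rel_quarter = abc_relative_shift(*(2.0 * c for c in THETA_SCALE), alpha=0.95)

# Approximate ABC limits from (4.20); z**2 is the same at 0.05 and 0.95.
abc_lo = STAN_LO + sigma_hat * abc_relative_shift(*THETA_SCALE, alpha=0.05)
abc_hi = STAN_HI + sigma_hat * rel_theta

print("relative shift, theta scale:", round(rel_theta, 4))
print("relative shift, log scale:  ", round(rel_log, 4))
print("relative shift, n/4:        ", round(rel_quarter, 4))
print("approximate ABC limits:     ", round(abc_lo, 2), round(abc_hi, 2))
```

Under these assumptions the approximate limits land within a few hundredths of the published ABC limits (3.20, 5.43), the relative shift doubles when the constants double, and on the log scale the shift is about one hundredth of $\hat\sigma$, which is consistent with the remark that the standard intervals are nearly correct on the $\phi$ scale.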
We can visualize the relationship between the BCa and ABC intervals in terms of Figure 3. The