PROCEEDINGS of the HUMAN FACTORS AND ERGONOMICS SOCIETY 49th ANNUAL MEETING-2005 2100

ESTIMATING COMPLETION RATES FROM SMALL SAMPLES USING BINOMIAL CONFIDENCE INTERVALS: COMPARISONS AND RECOMMENDATIONS

Jeff Sauro, Oracle, Denver, CO USA, jeff.sauro@oracle.com
James R. Lewis, IBM, Boca Raton, FL, jimlewis@us.ibm.com

The completion rate, the proportion of participants who successfully complete a task, is a common usability measurement. As is true for any point measurement, practitioners should compute appropriate confidence intervals for completion rate data. For proportions such as the completion rate, the appropriate interval is a binomial confidence interval. The most widely taught method for calculating binomial confidence intervals (the “Wald Method,” discussed both in introductory statistics texts and in the human factors literature) grossly understates the width of the true interval when sample sizes are small. Alternative “exact” methods over-correct the problem by providing intervals that are too conservative. This can result in practitioners unintentionally accepting interfaces that are unusable or rejecting interfaces that are usable. We examined alternative methods for building confidence intervals from small-sample completion rates, using Monte Carlo methods to sample data from a number of real, large-sample usability tests. It appears that the best method for practitioners to compute 95% confidence intervals for small-sample completion rates is to add two successes and two failures to the observed data (that is, add 2 to the number of successes and 4 to the number of participants), then compute the confidence interval using the Wald method (the “Adjusted Wald Method”). This simple approach provides the best coverage, is fairly easy to compute, and agrees with other analyses in the statistics literature.
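The Adjusted Wald computation summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper name `adjusted_wald_interval`, the default z = 1.96 (a 95% interval), and clipping the bounds to [0, 1] are assumptions made for this sketch.

```python
import math


def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Hypothetical helper (not from the paper): Adjusted Wald confidence interval.

    Adds two successes and two failures to the observed data, then applies the
    Wald formula. z = 1.96 gives a 95% interval. Clipping the bounds to [0, 1]
    is an assumption of this sketch.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    n_adj = n + 4                       # n plus two added successes and two added failures
    p_adj = (successes + 2) / n_adj     # adjusted completion rate
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Example: 4 of 5 participants completed the task.
low, high = adjusted_wald_interval(4, 5)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

For 4 of 5 completions the sketch gives an interval of roughly 0.36 to 0.97, a reminder of how wide honest intervals are at this sample size.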
Introduction

Estimating completion rates with small samples is an important and challenging task. Confidence intervals are taught as an appropriate way to qualify results from small samples.
The addition of confidence intervals to completion rate estimates helps both the engineer and readers of usability reports understand the variability inherent in small samples. While the importance of adding confidence intervals is widely agreed upon, the best method for computing them is not.

Most practitioners interpret a 95% confidence interval to indicate that in 95 out of 100 experiments, the interval constructed from the sample will contain the true value for the population. The extent to which this is the case for any given method of computing intervals is the “coverage” for that method.

The Wald method is the most commonly presented formula for calculating binomial confidence intervals (see Figure 1 below).

Figure 1: Wald Confidence Interval

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $\hat{p} = x/n$ is the observed completion rate ($x$ successes out of $n$ participants) and $z_{\alpha/2}$ is the critical value of the standard normal distribution (1.96 for a 95% interval).

Task completion rates are often modeled using a binomial distribution because the outcome of each task attempt is usually a binary value (complete / didn't complete). The Wald interval is simple to compute, has been around for some time (Laplace, 1812), and is presented in most introductory statistics texts and in some writings in the human factors literature (e.g., Landauer, 1988). Unfortunately, it produces intervals that are too narrow when samples are small, especially when the completion rate is not near 50%. Under these conditions its average coverage is approximately 60%, not 95% (Agresti and Coull, 1998). This is a real problem, because human factors (HF) practitioners rely on confidence intervals whose true coverage, in the long run, equals their nominal coverage.

To improve the poor average coverage of the Wald interval, advanced statistics texts often present a more complicated method called the Clopper-Pearson or “Exact” method (see Figure 2 below).
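To make the idea of coverage concrete, the sketch below computes the exact coverage of the Wald interval (Figure 1) and of the Adjusted Wald variant (two added successes and two added failures) under a binomial model. For a given n and true completion rate p, it enumerates every possible number of successes x and sums the probabilities of those outcomes whose interval contains p. Note that this is exact enumeration, a different technique from the Monte Carlo sampling of real usability data used in the paper; the helper names and the example values n = 5, p = 0.9 are illustrative assumptions.

```python
import math


# Hypothetical helpers for illustration; not code from the paper.
def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wald interval (Figure 1): p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def adjusted_wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Adjusted Wald: add two successes and two failures, then apply the Wald formula."""
    return wald_interval(successes + 2, n + 4, z)


def coverage(interval, n: int, p: float) -> float:
    """Exact coverage: P(interval built from X ~ Binomial(n, p) contains p)."""
    total = 0.0
    for x in range(n + 1):
        prob_x = math.comb(n, x) * p**x * (1 - p) ** (n - x)  # binomial P(X = x)
        low, high = interval(x, n)
        # Bounds are not clipped; clipping to [0, 1] would not change this test.
        if low <= p <= high:
            total += prob_x
    return total


for name, method in [("Wald", wald_interval), ("Adjusted Wald", adjusted_wald_interval)]:
    print(f"{name}: coverage for n=5, p=0.9 is {coverage(method, 5, 0.9):.3f}")
```

With n = 5 and p = 0.9, the Wald interval contains the true rate only about 40% of the time (partly because 5 of 5 successes yields a zero-width interval at 100%), while the adjusted version contains it about 92% of the time.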