no thought is given to the particular case, and the tester's state of mind, or his capacity for learning, is inoperative." (Fisher 1955, p. 73-4).
2.2 Neyman and Pearson Hypothesis Testing

Neyman and Pearson (1928a, 1928b, 1933b, 1936a) reject Fisher's idea that only the null hypothesis needs to be tested. They argue that a more useful procedure is to propose two complementary hypotheses: ΘA and ΘB (or a class of ΘBi), which need not be labeled "null" or "alternative" but often are, purely for convenience. Furthermore, Neyman and Pearson (1933b) point out that one can posit a hypothesis and consecutively test multiple admissible alternatives against this hypothesis. Since there are now two competing hypotheses in any one test, Neyman and Pearson can define an a priori selected α, the probability of falsely rejecting ΘA under the assumption that H0 is true, and β, the probability of failing to reject ΘA when H0 is false. By convention, the first mistake is called a Type I error, and the second mistake is called a Type II error. Note that α and β are probabilities conditional on two mutually exclusive events: α is conditional on the null hypothesis being true, and β is conditional on the null hypothesis being false. A more useful quantity than β is 1 − β, which Neyman and Pearson (1933a, 1936a) call the power of the test: the long-run probability of correctly rejecting a false null hypothesis given a point alternative hypothesis.

In this construct it is desirable to develop the test which has the highest power for a given a priori α. To accomplish this goal, the researcher considers the fixed sample size, the desired significance level, and the research hypothesis, then employs the test with the greatest power. Neyman and Pearson's famous lemma (1936b) shows that under certain conditions there exists a "uniformly most powerful" test which has the greatest possible probability of rejecting a false null hypothesis in favor of a point alternative hypothesis, compared to other tests. A sufficient condition is that the probability density tested has a monotone likelihood ratio.
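The relationship between α, β, and power described above can be illustrated with a short Monte Carlo sketch. The one-sided z-test, sample size, effect size, and critical value below are hypothetical choices for illustration, not values taken from the text.

```python
import random
import statistics

random.seed(42)

# Hypothetical setup: H0: mu = 0 vs H1: mu = 0.5, known sigma = 1, n = 25.
# A one-sided test at alpha = 0.05 rejects when z > 1.645.
n, sigma, z_crit = 25, 1.0, 1.645
reps = 20000

def reject(mu):
    """Draw one sample of size n from N(mu, sigma) and apply the test."""
    xbar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    z = xbar / (sigma / n ** 0.5)
    return z > z_crit

# Type I error rate: rejection frequency when the null (mu = 0) is true.
alpha_hat = sum(reject(0.0) for _ in range(reps)) / reps
# Power (1 - beta): rejection frequency under the point alternative mu = 0.5.
power_hat = sum(reject(0.5) for _ in range(reps)) / reps

print(f"estimated alpha = {alpha_hat:.3f}")
print(f"estimated power = {power_hat:.3f}")
```

For this design the simulated α should fall near the nominal 0.05, while the simulated power is roughly 0.80, making concrete the point that α and β are long-run frequencies computed under two mutually exclusive states of the world.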
Suppose we have a family of probability density functions h(t|θ) in which the random variable t is conditional on some unknown θ value to be tested. This family has a monotone likelihood ratio if, for every θ1 > θ2, the ratio h(t|θ1)/h(t|θ2) is a non-decreasing function of the random variable t. Suppose further that we perform a test such as H0: θ1 ≤ θ2 versus H1: θ1 > θ2 (θ2 a known constant), where t is a sufficient statistic for θ1, and h(t|θ1) has a monotone likelihood ratio. The Karlin-Rubin Theorem (1956) states that if we set α = P(t > t0) and reject H0 for an observed t > t0 (t0 a known constant), then this test has the most power relative to any other possible test of H0 at this α level (Casella and Berger 1990: 366-70; Lehmann 1986: 78).

To contrast the Neyman-Pearson approach with Fisher's test of significance, note how different the following steps are from Fisher's:

1. Identify a hypothesis of interest, ΘB, and a complementary hypothesis, ΘA.
2. Determine the appropriate test statistic and its distribution under the assumption that ΘA is true.
3. Specify a significance level (α), and determine the corresponding critical value of the test statistic under the assumption that ΘA is true.
4. Calculate the test statistic from the data.
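The monotone likelihood ratio condition can be checked numerically for a familiar case. The sketch below (an illustration not drawn from the text) uses the normal location family with known variance, for which the ratio h(t|θ1)/h(t|θ2) with θ1 > θ2 is known to be increasing in t.

```python
from statistics import NormalDist

# Two members of the normal location family, theta1 > theta2, known sigma.
theta1, theta2, sigma = 1.0, 0.0, 1.0
d1, d2 = NormalDist(theta1, sigma), NormalDist(theta2, sigma)

ts = [x / 10 for x in range(-30, 31)]          # grid of t values
ratios = [d1.pdf(t) / d2.pdf(t) for t in ts]   # likelihood ratio at each t

# Monotone likelihood ratio: the ratio never decreases as t grows.
assert all(a <= b for a, b in zip(ratios, ratios[1:]))
print("likelihood ratio is non-decreasing on the grid")
```

Because this family has a monotone likelihood ratio in the sample mean, the Karlin-Rubin result guarantees that the one-sided test rejecting for large t is uniformly most powerful at its α level.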
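The four steps above can be sketched end-to-end for a concrete one-sided z-test. The hypothesized mean, known standard deviation, and observed data below are hypothetical values chosen for illustration only.

```python
from statistics import NormalDist

# Step 1: hypotheses. Complementary hypothesis H_A (the null): mu <= 100;
# hypothesis of interest H_B: mu > 100. Sigma is assumed known.
mu0, sigma = 100.0, 15.0

# Step 2: under H_A, the standardized sample mean
# z = (xbar - mu0) / (sigma / sqrt(n)) follows a standard normal distribution.

# Step 3: fix alpha in advance and derive the critical value under H_A.
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value, ~1.645

# Step 4: calculate the test statistic from the (hypothetical) observed data.
data = [108, 112, 96, 105, 119, 101, 98, 110, 104, 107]
n = len(data)
xbar = sum(data) / n
z_obs = (xbar - mu0) / (sigma / n ** 0.5)

decision = "reject H_A" if z_obs > z_crit else "fail to reject H_A"
print(f"z_obs = {z_obs:.3f}, z_crit = {z_crit:.3f} -> {decision}")
```

Note that, in keeping with the Neyman-Pearson logic, α and the critical value are fixed before the data are examined; the decision is then mechanical.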