Ch. 5 Hypothesis Testing

The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s and early 1930s, complementing Fisher's work on estimation. As in estimation, we begin by postulating a statistical model, but instead of seeking an estimator of θ in Θ we consider the question of whether θ ∈ Θ0 ⊂ Θ or θ ∈ Θ1 = Θ − Θ0 is better supported by the observed data. The discussion which follows proceeds in a similar way to the discussion of estimation, though less systematically and formally. This is due to the complexity of the topic, which arises mainly because one is asked to assimilate many concepts very quickly just to be able to define the problem properly. This difficulty, however, is inherent in testing if any proper understanding of the topic is to be attempted, and is thus unavoidable.

1 Testing: Definition and Concepts

1.1 The Decision Rule

Let X be a random variable defined on the probability space (S, F, P(·)) and consider the statistical model associated with X:

(a) Φ = {f(x; θ), θ ∈ Θ};

(b) x = (X1, X2, ..., Xn)′ is a random sample from f(x; θ).

The problem of hypothesis testing is one of deciding whether or not some conjecture about θ, of the form "θ belongs to some subset Θ0 of Θ", is supported by the data x = (x1, x2, ..., xn)′. We call such a conjecture the null hypothesis and denote it by H0 : θ ∈ Θ0. If the sample realization x falls in an acceptance region C0 we accept H0; if it falls in a rejection region C1 we reject it. Since the observation space X ⊆ Rⁿ, while the acceptance and rejection regions are naturally described by subsets of R, we need a mapping from Rⁿ to R. The mapping which enables us to define C0 and C1 is called a test statistic, τ(x) : X → R.
Example: Let X be the random variable representing the marks achieved by students in an econometric theory paper, and let the statistical model be:

(a) Φ = {f(x; θ) = (1/(8√(2π))) exp[−(1/2)((x − θ)/8)²]}, θ ∈ Θ ≡ [0, 100];

(b) x = (X1, X2, ..., Xn)′, n = 40, is a random sample from Φ.

The hypothesis to be tested is

H0 : θ = 60 (i.e. X ∼ N(60, 64)), Θ0 = {60},

against

H1 : θ ≠ 60 (i.e. X ∼ N(µ, 64), µ ≠ 60), Θ1 = [0, 100] − {60}.

Common sense suggests that if some 'good' estimator of θ, say X̄n = (1/n)(x1 + x2 + ... + xn), takes a value 'around' 60 for the sample realization x, then we will be inclined to accept H0. Let us formalise this argument. The acceptance region takes the form 60 − ε ≤ X̄n ≤ 60 + ε, ε > 0, or

C0 = {x : |X̄n − 60| ≤ ε},

and

C1 = {x : |X̄n − 60| > ε}

is the rejection region. Formally, if x ∈ C1 (reject H0) when θ ∈ Θ0 (H0 is true) we commit a type I error; if x ∈ C0 (accept H0) when θ ∈ Θ1 (H0 is false) we commit a type II error. The hypothesis to be tested is formally stated as follows:

H0 : θ ∈ Θ0, Θ0 ⊆ Θ.

Against the null hypothesis H0 we postulate the alternative H1, which takes the form

H1 : θ ∈ Θ1 ≡ Θ − Θ0.
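Before formalising the choice of ε, the intuition can be illustrated with a quick simulation. This is a minimal sketch in Python (the seed is arbitrary): when H0 is true, a 'good' estimator such as the sample mean of the 40 marks should indeed take a value 'around' 60.

```python
import random
from statistics import mean

random.seed(0)  # arbitrary seed, for reproducibility only

# One sample realization under H0: X ~ N(60, 64), n = 40
n, theta0, sigma = 40, 60.0, 8.0
x = [random.gauss(theta0, sigma) for _ in range(n)]
xbar = mean(x)
print(round(xbar, 2))  # a value 'around' 60 (std. error of the mean ≈ 1.265)
```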
It is important to note at the outset that H0 and H1 are in effect hypotheses about the distribution of the sample, f(x; θ), i.e.

H0 : f(x; θ), θ ∈ Θ0;  H1 : f(x; θ), θ ∈ Θ1.

In testing a null hypothesis H0 against an alternative H1 the issue is to decide whether the sample realization x 'supports' H0 or H1. In the former case we say that H0 is accepted, in the latter that H0 is rejected. In order to be able to make such a decision we need to formulate a mapping which relates Θ0 to some subset C0 of the observation space X, which we call an acceptance region; its complement C1 (C0 ∪ C1 = X, C0 ∩ C1 = ∅) we call the rejection region.
1.2 Type I and Type II Errors

The next question is "how do we choose ε?" If ε is too small we run the risk of rejecting H0 when it is true; we call this a type I error. On the other hand, if ε is too large we run the risk of accepting H0 when it is false; we call this a type II error. That is, if we were to choose ε too small we would run a higher risk of committing a type I error than of committing a type II error, and vice versa. In other words, there is a trade-off between the probability of a type I error, i.e.

Pr(x ∈ C1; θ ∈ Θ0) = α,

and the probability β of a type II error, i.e.

Pr(x ∈ C0; θ ∈ Θ1) = β.

Ideally we would like α = β = 0 for all θ ∈ Θ, which is not possible for a fixed n. Moreover, we cannot control both simultaneously because of the trade-off between them. The strategy adopted in hypothesis testing is to choose a small value of α and, for the given α, minimize β. Formally, this amounts to choosing α* such that

Pr(x ∈ C1; θ ∈ Θ0) = α(θ) ≤ α* for θ ∈ Θ0,

and

Pr(x ∈ C0; θ ∈ Θ1) = β(θ) is minimized for θ ∈ Θ1,

by choosing C1 (or C0) appropriately. In the case of the above example, if we were to choose α, say α* = 0.05, then

Pr(|X̄n − 60| > ε; θ = 60) = 0.05.

"How do we determine ε, then?" The only random variable involved in the statement is X̄n, and hence the answer has to come from its sampling distribution. For the above
probabilistic statement to have any operational meaning which enables us to determine ε, the distribution of X̄n must be known. In the present case we know that

X̄n ∼ N(θ, σ²/n), where σ²/n = 64/40 = 1.6,

which implies that for θ = 60 (i.e. when H0 is true) we can 'construct' a test statistic τ(x) from the sample x such that

τ(x) = (X̄n − θ)/√1.6 = (X̄n − 60)/√1.6 = (X̄n − 60)/1.265 ∼ N(0, 1),

and thus the distribution of τ(·) is known completely (no unknown parameters). When this is the case, this distribution can be used in conjunction with the above probabilistic statement to determine ε. In order to do this we need to relate |X̄n − 60| to τ(x) (a statistic whose distribution is known). The obvious way is to standardize the former. This suggests changing the above probabilistic statement to the equivalent statement

Pr(|X̄n − 60|/1.265 ≥ cα; θ = 60) = 0.05, where cα = ε/1.265.

The value of cα given by the N(0, 1) table is cα = 1.96. This in turn implies that the rejection region for the test is

C1 = {x : |X̄n − 60|/1.265 ≥ 1.96} = {x : |τ(x)| ≥ 1.96},

or

C1 = {x : |X̄n − 60| ≥ 2.48}.

That is, for sample realizations x which give rise to X̄n falling outside the interval (57.52, 62.48) we reject H0. Let us summarize the argument so far. We set out to construct a test for H0 : θ = 60 against H1 : θ ≠ 60, and intuition suggested the rejection region {x : |X̄n − 60| ≥ ε}. In order to determine ε we have to
(a) choose an α; and then

(b) define the rejection region in terms of some statistic τ(x).

The latter is necessary to enable us to determine ε via some known distribution. This is the distribution of the test statistic τ(x) under H0 (i.e. when H0 is true). The next question which naturally arises is: "What do we need the probability of type II error β for?" The answer is that we need β to decide whether the test defined in terms of C1 (and, of course, C0) is a 'good' or a 'bad' test. As we mentioned at the outset, the way we decided to resolve the trade-off between α and β was to choose a given small value of α and define C1 so as to minimize β. At this stage we do not know whether the test defined above is a 'good' test or not. Let us set up the apparatus which enables us to consider the question of optimality.
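The determination of cα and ε from α above can be sketched numerically. This is a minimal illustration in Python (standard library only); the numbers n = 40, σ² = 64, θ0 = 60 and α* = 0.05 come from the running example.

```python
from math import sqrt
from statistics import NormalDist

# Running example: X ~ N(theta, 64), n = 40, H0: theta = 60, alpha = 0.05.
n, sigma2, theta0, alpha = 40, 64.0, 60.0, 0.05

se = sqrt(sigma2 / n)                          # std. error of the mean: sqrt(1.6) ≈ 1.265
c_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided N(0,1) critical value ≈ 1.96
eps = c_alpha * se                             # epsilon ≈ 2.48

# Acceptance interval for the sample mean under H0
print(round(theta0 - eps, 2), round(theta0 + eps, 2))  # 57.52 62.48
```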
2 Optimal Tests

First we note that minimizing Pr(x ∈ C0) for all θ ∈ Θ1 is equivalent to maximizing Pr(x ∈ C1) for all θ ∈ Θ1.

Definition 1: The probability of rejecting H0 when it is false at some point θ1 ∈ Θ1, i.e. Pr(x ∈ C1; θ = θ1), is called the power of the test at θ = θ1.

Note that

Pr(x ∈ C1; θ = θ1) = 1 − Pr(x ∈ C0; θ = θ1) = 1 − β(θ1).

In the above example we can define the power of the test at some θ1 ∈ Θ1, say θ = 54, to be Pr[(|X̄n − 60|)/1.265 ≥ 1.96; θ = 54]. Under the alternative hypothesis that θ = 54 it is true that (X̄n − 54)/1.265 ∼ N(0, 1). We would like to know the probability that the statistic constructed under the null hypothesis, (X̄n − 60)/1.265, falls in the rejection region; that is, the power of the test at θ = 54 is

Pr(|X̄n − 60|/1.265 ≥ 1.96; θ = 54)
  = Pr[(X̄n − 54)/1.265 ≤ −1.96 − (54 − 60)/1.265] + Pr[(X̄n − 54)/1.265 ≥ 1.96 − (54 − 60)/1.265]
  ≈ 0.9973.

Hence, the power of the test defined by C1 above is indeed very high for θ = 54. From this we see that to calculate the power of a test we need to know the distribution of the test statistic τ(x) under the alternative hypothesis; in this case it is the distribution of (X̄n − 54)/1.265.¹

¹ In the example above, the (suitably recentred) test statistic has a standard normal distribution under both the null and the alternative hypotheses. However, it is quite often the case that a test statistic has a different distribution under the null and the alternative hypotheses; for example, the unit root test. See Chapter 21.
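The decomposition above, together with the power values tabulated next for other points of Θ1, can be reproduced numerically. A minimal sketch in Python (standard library only; small third-decimal discrepancies from the text's figures arise because the text rounds √1.6 to 1.265):

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()    # standard normal
se = sqrt(64 / 40)  # sqrt(1.6) ≈ 1.265

def power(theta1, theta0=60.0, c=1.96):
    """Pr(|Xbar - theta0|/se >= c) when Xbar ~ N(theta1, se^2)."""
    shift = (theta1 - theta0) / se  # recentring term under the alternative
    return Z.cdf(-c - shift) + (1 - Z.cdf(c - shift))

print(round(power(54), 4))           # 0.9973
for t in (56, 58, 60, 62, 64, 66):   # the power function tabulated in the text
    print(t, round(power(t), 4))
```

Note that power(60) returns 0.05, the size of the test, in agreement with the table.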
Following the same procedure, the power of the test defined by C1 is as follows for θ ∈ Θ1 (the value at θ = 60 is included for comparison):

Pr(|τ(X)| ≥ 1.96; θ = 56) = 0.8849;
Pr(|τ(X)| ≥ 1.96; θ = 58) = 0.3520;
Pr(|τ(X)| ≥ 1.96; θ = 60) = 0.0500;
Pr(|τ(X)| ≥ 1.96; θ = 62) = 0.3520;
Pr(|τ(X)| ≥ 1.96; θ = 64) = 0.8849;
Pr(|τ(X)| ≥ 1.96; θ = 66) = 0.9973.

As we can see, the power of the test increases as we move further away from θ = 60 (H0), and the power at θ = 60 equals the probability of a type I error. This prompts us to define the power function as follows.

Definition 2: P(θ) = Pr(x ∈ C1), θ ∈ Θ, is called the power function of the test defined by the rejection region C1.

Definition 3: α = maxθ∈Θ0 P(θ) is defined to be the size (or the significance level) of the test. In the case where H0 is simple, say θ = θ0, then α = P(θ0).

Definition 4: A test of H0 : θ ∈ Θ0 against H1 : θ ∈ Θ1 as defined by some rejection region C1 is said to be a uniformly most powerful (UMP) test of size α if

(a) maxθ∈Θ0 P(θ) = α;

(b) P(θ) ≥ P*(θ) for all θ ∈ Θ1, where P*(θ) is the power function of any other test of size α.

As will be seen in the sequel, no UMP test exists in most situations of interest in practice. The procedure adopted in such cases is to reduce the class
of all tests to some subclass by imposing further criteria, and to consider the question of UMP tests within the subclass.

Definition 5: A test of H0 : θ ∈ Θ0 against H1 : θ ∈ Θ1 is said to be unbiased if

maxθ∈Θ0 P(θ) ≤ P(θ) for all θ ∈ Θ1.

In other words, a test is unbiased if it rejects H0 more often when it is false than when it is true.

Collecting all the above concepts together, we say that a test has been defined when the following components have been specified:

(a) a test statistic τ(x);

(b) the size of the test α;

(c) the distribution of τ(x) under H0 and H1;

(d) the rejection region C1 (or, equivalently, C0).

The most important component in defining a test is the test statistic, for which we need to know the distribution under both H0 and H1. Hence, constructing an optimal test is largely a matter of being able to find a statistic τ(x) with the following properties:

(a) τ(x) depends on x via a 'good' estimator of θ; and

(b) the distribution of τ(x) under both H0 and H1 does not depend on any unknown parameters.

We call such a statistic a pivot.

Example: Assume a random sample of size 11 is drawn from a normal distribution N(µ, 400). In particular, y1 = 62, y2 = 52, y3 = 68, y4 = 23, y5 = 34, y6 = 45, y7 = 27, y8 = 42, y9 = 83, y10 = 56 and y11 = 40. Test the null hypothesis H0 : µ = 55 versus H1 : µ ≠ 55. Since σ² is known, the sample mean is distributed as

Ȳ ∼ N(µ, σ²/n) ≡ N(µ, 400/11),

and therefore under H0 : µ = 55,

Ȳ ∼ N(55, 36.36)
or

(Ȳ − 55)/√36.36 ∼ N(0, 1).

We accept H0 when the test statistic τ(x) = (Ȳ − 55)/√36.36 lies in the interval C0 = [−1.96, 1.96], the size of the test being α = 0.05. We now have

y1 + y2 + ... + y11 = 532 and ȳ = 532/11 ≈ 48.36.

Then

(48.36 − 55)/√36.36 ≈ −1.10,

which is in the acceptance region. Therefore we accept the null hypothesis H0 : µ = 55.

Example: Assume a random sample of size 11 is drawn from a normal distribution N(µ, σ²), with the same observations y1, ..., y11 as in the previous example. Test the null hypothesis H0 : µ = 55 versus H1 : µ ≠ 55. Since σ² is unknown, the sample mean is distributed as

Ȳ ∼ N(µ, σ²/n),

and therefore under H0 : µ = 55,

Ȳ ∼ N(55, σ²/n), or (Ȳ − 55)/√(σ²/n) ∼ N(0, 1);

however, this is not a pivotal test statistic, since it involves the unknown parameter σ².
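The arithmetic of the two examples can be checked with a short script. This is a sketch in Python (standard library only); the data are the eleven marks given above.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

y = [62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40]
n, mu0 = len(y), 55.0

# First example: sigma^2 = 400 known, so Ybar ~ N(mu, 400/11) and the
# standardized sample mean is a pivot under H0.
se = sqrt(400 / n)                    # ≈ 6.03
tau = (mean(y) - mu0) / se            # ≈ -1.10
c = NormalDist().inv_cdf(0.975)       # ≈ 1.96
print(round(tau, 2), abs(tau) <= c)   # inside [-1.96, 1.96], so accept H0

# Second example: sigma^2 unknown. The same standardization now involves the
# unknown sigma, so it is not a pivot; stdev(y) only estimates sigma.
print(round(stdev(y), 2))
```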