Chapter 4
Finite-Sample Properties of the LSE

Finite-sample theory: $n$ is assumed to be fixed, and a normal distribution is assumed.
Large-sample theory: $n$ is sent to $\infty$, and a general distribution is allowed.

4.1 Unbiasedness

Write
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon.$$
Then
$$E(b|X) = \beta + E\left[(X'X)^{-1}X'\varepsilon \mid X\right] = \beta + (X'X)^{-1}X'E(\varepsilon|X) = \beta.$$
Therefore
$$E(b) = E_X\{E[b|X]\} = E_X[\beta] = \beta;$$
that is, the distribution of the vector $b$ is centered at the true parameter $\beta$.

4.2 The variance of the LSE and the Gauss-Markov theorem

The OLS estimator of $\beta$ is $b = (X'X)^{-1}X'y$. Here $(X'X)^{-1}X'$ is a $K \times n$ matrix, so each element of $b$ can be written as a linear combination of $y_1, \dots, y_n$. We call $b$ a linear estimator for this reason.

The covariance matrix of $b$ is
$$\begin{aligned}
\mathrm{Var}(b|X) &= E\left[(b-\beta)(b-\beta)' \mid X\right] \\
&= E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X\right] \\
&= (X'X)^{-1}X'E(\varepsilon\varepsilon'|X)X(X'X)^{-1} \\
&= (X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1} \\
&= \sigma^2(X'X)^{-1}.
\end{aligned}$$

Consider an arbitrary linear estimator of $\beta$, $b_0 = Cy$, where $C$ is a $K \times n$ matrix. For $b_0$ to be unbiased, we should have
$$E(b_0|X) = E(CX\beta + C\varepsilon|X) = \beta.$$
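To see both results in action, here is a minimal simulation sketch; the dimensions, parameter values, and seed are illustrative choices, not from the text. It fixes a design matrix $X$, repeatedly draws normal errors, and compares the Monte Carlo mean and covariance of the OLS estimates with $\beta$ and $\sigma^2(X'X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, sigma2 = 200, 3, 4.0            # illustrative values
beta = np.array([1.0, -2.0, 0.5])

# Fix the design matrix X once; finite-sample theory conditions on X.
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
XtX_inv = np.linalg.inv(X.T @ X)

reps = 5000
b_draws = np.empty((reps, K))
for r in range(reps):
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)  # E(eps|X) = 0
    y = X @ beta + eps
    b_draws[r] = XtX_inv @ X.T @ y                   # b = (X'X)^{-1} X'y

print(b_draws.mean(axis=0))   # close to beta: unbiasedness
print(np.cov(b_draws.T))      # close to sigma2 * (X'X)^{-1}
print(sigma2 * XtX_inv)
```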
For this to hold, $CX = I$. The covariance matrix of $b_0$ is
$$\mathrm{Var}[b_0|X] = \sigma^2 CC'.$$
Now let $D = C - (X'X)^{-1}X'$. Since $CX = I$,
$$DX = CX - (X'X)^{-1}X'X = CX - I = 0.$$
Using this gives
$$\begin{aligned}
\mathrm{Var}[b_0|X] &= \sigma^2\left(D + (X'X)^{-1}X'\right)\left(D + (X'X)^{-1}X'\right)' \\
&= \sigma^2(X'X)^{-1} + \sigma^2 DD' \\
&= \mathrm{Var}[b|X] + \sigma^2 DD',
\end{aligned}$$
where the cross products vanish because $DX = 0$ implies $DX(X'X)^{-1} = 0$. Since $DD'$ is a nonnegative definite matrix,
$$\mathrm{Var}[b_0|X] \ge \mathrm{Var}[b|X]. \quad (*)$$
That is, for any vector $a$,
$$a'\,\mathrm{Var}[b_0|X]\,a \ge a'\,\mathrm{Var}[b|X]\,a.$$
This is the Gauss-Markov theorem given $X$. Since $(*)$ holds for every particular $X$,
$$\mathrm{Var}(b) \le \mathrm{Var}(b_0).$$
This is the unconditional version of the Gauss-Markov theorem. Note that
$$\mathrm{Var}[b] = E_X[\mathrm{Var}(b|X)] + \mathrm{Var}_X[E(b|X)] = E_X[\mathrm{Var}(b|X)] = E_X\left[\sigma^2(X'X)^{-1}\right] = \sigma^2 E_X\left[(X'X)^{-1}\right],$$
where $\mathrm{Var}_X[E(b|X)] = \mathrm{Var}_X[\beta] = 0$ because $E(b|X) = \beta$ does not vary with $X$.
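The theorem can be checked numerically. The following is a minimal sketch under illustrative values: it builds a competing unbiased estimator $C = (X'X)^{-1}X' + D$ with $D = GM$ for an arbitrary matrix $G$, which guarantees $DX = 0$ because $MX = 0$, where $M = I - X(X'X)^{-1}X'$ is the residual maker used in the next section. It then verifies that the variance gap equals $\sigma^2 DD'$ and is nonnegative definite.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K, sigma2 = 50, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                     # OLS weight matrix (X'X)^{-1} X'

# Any C = A + D with DX = 0 gives an unbiased linear estimator b0 = Cy.
M = np.eye(n) - X @ A                 # residual maker M = I - X(X'X)^{-1}X'
G = rng.normal(size=(K, n))           # arbitrary perturbation
D = G @ M                             # DX = 0 since MX = 0
C = A + D
assert np.allclose(C @ X, np.eye(K))  # unbiasedness condition CX = I

V_b = sigma2 * XtX_inv                # Var(b|X)
V_b0 = sigma2 * (C @ C.T)             # Var(b0|X)
gap = V_b0 - V_b
print(np.allclose(gap, sigma2 * (D @ D.T)))    # gap equals sigma2 * D D'
print(np.linalg.eigvalsh(gap).min() >= -1e-8)  # nonnegative definite
```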
4.3 Estimating the variance of the least squares estimator

Since $\sigma^2 = E(\varepsilon_i^2)$, a natural estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n e_i^2 = \frac{1}{n}e'e.$$
But this estimator is biased, as discussed now. Since
$$e = My = M(X\beta + \varepsilon) = M\varepsilon,$$
we have $e'e = \varepsilon'M\varepsilon$. Thus
$$\begin{aligned}
E[e'e|X] &= E[\varepsilon'M\varepsilon|X] = E[\mathrm{tr}(\varepsilon'M\varepsilon)|X] = E[\mathrm{tr}(M\varepsilon\varepsilon')|X] \\
&= \mathrm{tr}\left(M\,E(\varepsilon\varepsilon'|X)\right) = \mathrm{tr}(M\sigma^2 I) = \sigma^2\,\mathrm{tr}(M).
\end{aligned}$$
But
$$\mathrm{tr}(M) = \mathrm{tr}\left(I_n - X(X'X)^{-1}X'\right) = \mathrm{tr}(I_n) - \mathrm{tr}\left((X'X)^{-1}X'X\right) = \mathrm{tr}(I_n) - \mathrm{tr}(I_K) = n - K.$$
Therefore
$$E[e'e|X] = (n-K)\sigma^2,$$
and an unbiased estimator of $\sigma^2$ is
$$s^2 = \frac{e'e}{n-K},$$
not $\hat{\sigma}^2$. The estimator $s^2$ is also unbiased unconditionally, because
$$E[s^2] = E_X\{E[s^2|X]\} = E_X(\sigma^2) = \sigma^2.$$
Using $s^2$, we obtain an estimator of $\mathrm{Var}[b|X]$:
$$\widehat{\mathrm{Var}}[b|X] = s^2(X'X)^{-1}.$$
The standard error of the estimator $b_k$ is
$$\sqrt{s^2\left[(X'X)^{-1}\right]_{kk}}.$$
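The downward bias of $\hat{\sigma}^2$ and the unbiasedness of $s^2$ are easy to confirm by simulation. A minimal sketch, with all values illustrative; a small $n$ relative to $K$ is chosen to make the bias factor $(n-K)/n$ visible:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, sigma2 = 30, 3, 1.5             # small n makes the bias easy to see
beta = np.array([1.0, 0.5, -1.0])
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

reps = 20000
sig_hat2 = np.empty(reps)
s2 = np.empty(reps)
for r in range(reps):
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    sig_hat2[r] = e @ e / n           # biased down: mean is (n-K)/n * sigma2
    s2[r] = e @ e / (n - K)           # unbiased

print(sig_hat2.mean(), (n - K) / n * sigma2)  # these two agree
print(s2.mean(), sigma2)                      # and so do these
```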
4.4 Inference under a normality assumption

(i) t-test

Assume $\varepsilon \sim N(0, \sigma^2 I)$. Then
$$b|X = \beta + (X'X)^{-1}X'\varepsilon \,\big|\, X \sim N\left(\beta,\; \sigma^2(X'X)^{-1}X'X(X'X)^{-1}\right) = N\left(\beta,\; \sigma^2(X'X)^{-1}\right).$$
Recall that $A\varepsilon \sim N\left(0, A(\sigma^2 I)A'\right)$. Each element of $b|X$ is normally distributed:
$$b_k|X \sim N\left(\beta_k,\; \sigma^2\left[(X'X)^{-1}\right]_{kk}\right).$$
Consider the null hypothesis
$$H_0 : \beta_k = \beta_k^0.$$
The $t$-test for this null hypothesis is defined by
$$t_k = \frac{b_k - \beta_k^0}{\sqrt{s^2\left[(X'X)^{-1}\right]_{kk}}}.$$
Under the normality assumption,
$$\frac{b_k - \beta_k^0}{\sqrt{\sigma^2\left[(X'X)^{-1}\right]_{kk}}} \sim N(0, 1).$$
In addition,
$$\frac{(n-K)s^2}{\sigma^2} = \frac{e'e}{\sigma^2} = \left(\frac{\varepsilon}{\sigma}\right)' M \left(\frac{\varepsilon}{\sigma}\right) \sim \chi^2_{\mathrm{tr}(M)} = \chi^2_{n-K}.$$
Furthermore,
$$\frac{b - \beta}{\sigma} = (X'X)^{-1}X'\,\frac{\varepsilon}{\sigma}$$
is independent of $(n-K)s^2/\sigma^2$.
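In practice the $t$-statistic is computed directly from $b$, $s^2$, and $(X'X)^{-1}$. A minimal sketch using SciPy for the $t(n-K)$ tail probability; the model, values, and the choice $\beta_k^0 = 0$ are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, K = 100, 3
beta = np.array([1.0, 0.0, 2.0])      # second slope is truly zero
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ beta + rng.normal(size=n)     # sigma = 1

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)
se = np.sqrt(s2 * np.diag(XtX_inv))   # standard error of each b_k

k, beta0 = 1, 0.0                     # test H0: beta_k = 0
t_k = (b[k] - beta0) / se[k]
p = 2 * stats.t.sf(abs(t_k), df=n - K)  # two-sided p-value from t(n-K)
print(t_k, p)
```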
This follows because
$$\mathrm{Cov}(e, b|X) = E\left[e(b-\beta)' \mid X\right] = E\left[(I-P)\varepsilon\varepsilon'X(X'X)^{-1} \mid X\right] = \sigma^2(I-P)X(X'X)^{-1} = 0,$$
which implies $\mathrm{Cov}(e, b) = 0$ and, under normality, the independence of $e$ and $b$; since $s^2$ is a function of $e$, it is independent of $b$. (See also Theorem B-12.) Therefore,
$$t_k = \frac{\left(b_k - \beta_k^0\right)\big/\sqrt{\sigma^2\left[(X'X)^{-1}\right]_{kk}}}{\sqrt{\dfrac{(n-K)s^2}{\sigma^2}\Big/(n-K)}} = \frac{b_k - \beta_k^0}{\sqrt{s^2\left[(X'X)^{-1}\right]_{kk}}}$$
has Student's $t$-distribution with $n-K$ degrees of freedom. Recall that
$$\frac{Z}{\sqrt{\chi^2_m/m}} \sim t_m$$
when $Z \sim N(0,1)$ and the $\chi^2_m$ variable are independent. We deduce from the distribution of $t_k$ that
$$P\left(b_k - t_{\alpha/2}\,s_{b_k} \le \beta_k \le b_k + t_{\alpha/2}\,s_{b_k}\right) = 1 - \alpha,$$
where $s_{b_k} = \sqrt{s^2\left[(X'X)^{-1}\right]_{kk}}$ and $t_{\alpha/2}$ is the critical value from the $t$-distribution with $n-K$ degrees of freedom. The $100(1-\alpha)\%$ confidence interval for $\beta_k$ is
$$b_k - t_{\alpha/2}\,s_{b_k} \le \beta_k \le b_k + t_{\alpha/2}\,s_{b_k}.$$
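The coverage statement can be checked by simulation: across repeated samples, the interval should contain $\beta_k$ about $100(1-\alpha)\%$ of the time. A minimal sketch with illustrative values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, K, alpha = 50, 2, 0.05
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
XtX_inv = np.linalg.inv(X.T @ X)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)  # t_{alpha/2} critical value

reps = 10000
hits = 0
for _ in range(reps):
    y = X @ beta + rng.normal(size=n)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s_bk = np.sqrt(e @ e / (n - K) * XtX_inv[1, 1])
    lo, hi = b[1] - t_crit * s_bk, b[1] + t_crit * s_bk
    hits += (lo <= beta[1] <= hi)

print(hits / reps)   # close to 1 - alpha = 0.95
```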
(ii) F-test

Consider the null hypothesis
$$H_0 : R\beta = r,$$
where the $J \times K$ matrix $R$ has full row rank. The $F$-test for this null is defined as
$$F = \frac{(Rb - r)'\left[R(X'X)^{-1}R'\right]^{-1}(Rb - r)/J}{s^2}.$$
The null distribution of $F$ is $F(J, n-K)$.

Example 1. Let $K = 2$ and $H_0 : \beta_1 - \beta_2 = 0$. Taking $R = (1, -1)$ and $r = 0$, we have
$$F = (b_1 - b_2)\left[(1, -1)(X'X)^{-1}\begin{pmatrix}1\\-1\end{pmatrix}\right]^{-1}(b_1 - b_2)\big/s^2 \sim F(1, n-2).$$

The null distribution follows because
1. $Rb - r \sim N\left(0, \sigma^2 R(X'X)^{-1}R'\right)$, or equivalently $\left[R(X'X)^{-1}R'\right]^{-1/2}(Rb - r) \sim N(0, \sigma^2 I)$;
2. $\dfrac{(n-K)s^2}{\sigma^2} \sim \chi^2_{n-K}$;
3. $Rb$ and $s^2$ are independent.
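A minimal sketch of the $F$-test for a restriction like Example 1, with an illustrative data-generating process in which $H_0$ holds, so the statistic should behave like a draw from $F(1, n-2)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, K = 80, 2
beta = np.array([1.5, 1.5])           # beta_1 = beta_2, so H0 holds
X = rng.normal(size=(n, K))
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = e @ e / (n - K)

R = np.array([[1.0, -1.0]])           # H0: beta_1 - beta_2 = 0
r = np.array([0.0])
J = R.shape[0]
diff = R @ b - r
F = diff @ np.linalg.solve(R @ XtX_inv @ R.T, diff) / (J * s2)
p = stats.f.sf(F, J, n - K)           # p-value from F(J, n-K)
print(F, p)
```

With $J = 1$ the bracketed term is a scalar, so the quadratic form reduces to $(b_1 - b_2)^2$ divided by its estimated variance, and $F = t^2$ for the corresponding $t$-test of the single restriction.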