Chapter 3  Least Squares Methods for Estimating $\beta$

Methods for estimating $\beta$:
- Least squares estimation
- Maximum likelihood estimation
- Method of moments estimation
- Least absolute deviation estimation
- ...

3.1 Least squares estimation

The criterion of least squares estimation is
\[
\min_{b_0} \sum_{i=1}^{n} (y_i - X_i' b_0)^2
\quad \text{or} \quad
\min_{b_0} \, (y - X b_0)'(y - X b_0).
\]
Let the objective function be
\[
S(b_0) = (y - X b_0)'(y - X b_0)
       = y'y - b_0' X' y - y' X b_0 + b_0' X' X b_0
       = y'y - 2 y' X b_0 + b_0' X' X b_0.
\]
The first-order condition for the minimization of this function is
\[
\frac{\partial S(b_0)}{\partial b_0} = -2 X' y + 2 X' X b_0 = 0.
\]
The solution of this equation is the least squares estimate of the coefficient vector $\beta$:
\[
b = (X'X)^{-1} X' y.
\]
If $\operatorname{rank}(X) = K$, then $\operatorname{rank}(X'X) = K$. Thus, the inverse of $X'X$ exists.

Let $e = y - Xb$. We call $e$ the residual vector. We have
\begin{align*}
e &= y - Xb \tag{1} \\
  &= y - X(X'X)^{-1} X' y = \bigl(I - X(X'X)^{-1} X'\bigr) y = (I - P) y, \tag{2}
\end{align*}
where $P = X(X'X)^{-1} X'$. The matrix $P$ is called the projection matrix. We also let $I - P = M$. Then, we may write (2) as
\[
y = Xb + e = P y + M y.
\]
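As a numerical illustration (not part of the original notes), the following Python/NumPy sketch computes $b = (X'X)^{-1} X' y$ on simulated data and checks the decomposition $y = Py + My$; the sample size, coefficient values, and data-generating process are arbitrary assumptions made for the example.

    import numpy as np

    # Simulated data; dimensions and coefficient values are arbitrary assumptions.
    rng = np.random.default_rng(0)
    n, K = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])  # first column is a constant
    beta = np.array([1.0, 2.0, -0.5])                               # hypothetical true coefficients
    y = X @ beta + rng.normal(size=n)

    # Least squares estimate b = (X'X)^{-1} X'y, obtained by solving the normal equations.
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b                                                   # residual vector

    # Projection matrices P = X(X'X)^{-1}X' and M = I - P.
    P = X @ np.linalg.solve(X.T @ X, X.T)
    M = np.eye(n) - P

    print(np.allclose(P @ P, P))           # P is idempotent
    print(np.allclose(M @ y, e))           # e = My
    print(np.allclose(P @ y + M @ y, y))   # y = Py + My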
We often write $Py = \hat{y}$. This is the part of $y$ that is explained by $X$.

Properties of the matrices $P$ and $M$ are:
(i) $P' = P$, $P^2 = P$ (idempotent matrix)
(ii) $M' = M$, $M^2 = M$
(iii) $PX = X$, $MX = 0$
(iv) $PM = 0$

Using (1) and (iii), we have
\[
X' e = X' M y = 0.
\]
If the first column of $X$ is $\mathbf{1} = (1, \cdots, 1)'$, this relation implies
\[
\mathbf{1}' e = \sum_{i=1}^{n} e_i = 0.
\]
In addition, (iv) gives
\[
y'y = y' P' P y + y' M' M y = \hat{y}'\hat{y} + e'e.
\]

3.2 Partitioned regression and partial regression

Consider
\[
y = X\beta + \varepsilon = X_1 \beta_1 + X_2 \beta_2 + \varepsilon.
\]
The normal equations for $b_1$ and $b_2$ are
\[
\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
=
\begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}.
\]
The first part of these equations is
\[
(X_1'X_1) b_1 + (X_1'X_2) b_2 = X_1' y,
\]
which gives
\[
b_1 = (X_1'X_1)^{-1} X_1' y - (X_1'X_1)^{-1} X_1' X_2 b_2 = (X_1'X_1)^{-1} X_1' (y - X_2 b_2).
\]
Plug this into the second part of the normal equations. Then, we have
\begin{align*}
X_2'X_1 b_1 + X_2'X_2 b_2
&= X_2'X_1 (X_1'X_1)^{-1} X_1' y - X_2'X_1 (X_1'X_1)^{-1} X_1' X_2 b_2 + X_2'X_2 b_2 \\
&= X_2'X_1 (X_1'X_1)^{-1} X_1' y + X_2' (I - P_{X_1}) X_2 b_2 \\
&= X_2' y.
\end{align*}
Thus,
\[
b_2 = \bigl(X_2' (I - P_{X_1}) X_2\bigr)^{-1} X_2' (I - P_{X_1}) y.
\]
In the same manner,
\[
b_1 = \bigl(X_1' (I - P_{X_2}) X_1\bigr)^{-1} X_1' (I - P_{X_2}) y.
\]
Suppose that
\[
X_1 = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \mathbf{1}
\quad \text{and} \quad
X_2 = \underset{(n \times K_2)}{Z}.
\]
Then
\[
b_2 = \bigl(Z' (I - P_{\mathbf{1}}) Z\bigr)^{-1} Z' (I - P_{\mathbf{1}}) y.
\]
But
\[
(I - P_{\mathbf{1}}) Z = Z - \mathbf{1} (\mathbf{1}'\mathbf{1})^{-1} \mathbf{1}' Z,
\]
and
\[
\mathbf{1}'\mathbf{1} = n, \qquad
\mathbf{1}' Z = (1 \ \cdots \ 1)
\begin{pmatrix}
z_{11} & \cdots & z_{1K_2} \\
\vdots & & \vdots \\
z_{n1} & \cdots & z_{nK_2}
\end{pmatrix}
= \Bigl( \sum_{i=1}^{n} z_{i1} \ \cdots \ \sum_{i=1}^{n} z_{iK_2} \Bigr).
\]
Thus,
\[
(I - P_{\mathbf{1}}) Z
= Z - \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} (\bar{z}_1 \ \cdots \ \bar{z}_{K_2})
= \begin{pmatrix}
z_{11} - \bar{z}_1 & \cdots & z_{1K_2} - \bar{z}_{K_2} \\
z_{21} - \bar{z}_1 & \cdots & z_{2K_2} - \bar{z}_{K_2} \\
\vdots & & \vdots \\
z_{n1} - \bar{z}_1 & \cdots & z_{nK_2} - \bar{z}_{K_2}
\end{pmatrix}.
\]
In the same way,
\[
(I - P_{\mathbf{1}}) y = \begin{pmatrix} y_1 - \bar{y} \\ \vdots \\ y_n - \bar{y} \end{pmatrix}.
\]
These show that $b_2$ is equivalent to the OLS estimator of $\beta$ in the demeaned regression equation
\[
y_i - \bar{y} = \beta' (z_i - \bar{z}) + \varepsilon_i,
\]
where $z_i = (z_{i1}, \cdots, z_{iK_2})'$ and $\bar{z} = (\bar{z}_1, \cdots, \bar{z}_{K_2})'$. Whether we demean the data and run the regression without a constant, or keep a constant term in the model and run the regression on the original data, we obtain the same estimates of the slope coefficients.
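To make the equivalence concrete, here is a small Python/NumPy check (an illustrative sketch on simulated data, not part of the original notes; the sample size and coefficient values are arbitrary assumptions). It compares the slope estimates from a regression with a constant to those from regressing the demeaned $y$ on the demeaned $Z$.

    import numpy as np

    # Simulated data with an intercept; values are arbitrary assumptions for illustration.
    rng = np.random.default_rng(1)
    n, K2 = 200, 2
    Z = rng.normal(size=(n, K2))
    y = 0.5 + Z @ np.array([1.5, -2.0]) + rng.normal(size=n)

    # (a) Regression with a constant: X = [1, Z]; the slope coefficients are b[1:].
    X = np.column_stack([np.ones(n), Z])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    slopes_with_constant = b[1:]

    # (b) Demeaned regression: regress (y - ybar) on (Z - zbar) with no constant.
    Zd = Z - Z.mean(axis=0)
    yd = y - y.mean()
    b2 = np.linalg.solve(Zd.T @ Zd, Zd.T @ yd)

    print(np.allclose(slopes_with_constant, b2))   # True: the two slope estimates coincide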