Chapter 3  Least Squares Methods for Estimating β

Methods for estimating β:
- Least squares estimation
- Maximum likelihood estimation
- Method of moments estimation
- Least absolute deviation estimation
- ...

3.1 Least squares estimation

The criterion of least squares estimation is
\[
\min_{b_0} \sum_{i=1}^{n} (y_i - X_i'b_0)^2
\quad\text{or}\quad
\min_{b_0}\, (y - Xb_0)'(y - Xb_0).
\]
Let the objective function be
\[
S(b_0) = (y - Xb_0)'(y - Xb_0)
       = y'y - b_0'X'y - y'Xb_0 + b_0'X'Xb_0
       = y'y - 2y'Xb_0 + b_0'X'Xb_0.
\]
The first-order condition for the minimization of this function is
\[
\frac{\partial S(b_0)}{\partial b_0} = -2X'y + 2X'Xb_0 = 0.
\]
The solution of this equation is the least squares estimate of the coefficient vector β:
\[
b = (X'X)^{-1}X'y.
\]
If $\operatorname{rank}(X) = K$, then $\operatorname{rank}(X'X) = K$, so the inverse of $X'X$ exists.

Let $e = y - Xb$. We call this the residual vector. We have
\[
e = y - Xb \qquad (1)
\]
\[
  = y - X(X'X)^{-1}X'y = \bigl(I - X(X'X)^{-1}X'\bigr)y = (I - P)y, \qquad (2)
\]
where $P = X(X'X)^{-1}X'$. The matrix $P$ is called the projection matrix. We also let $I - P = M$. Then, using (2), we may write
\[
y = Xb + e = Py + My.
\]
We often write $Py = \hat{y}$. This is the part of $y$ that is explained by $X$.
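As a numerical illustration of these formulas, the following sketch computes $b = (X'X)^{-1}X'y$ and the residual vector on simulated data and cross-checks the result against a library least squares solver. The sample size, regressors, and coefficient values are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (arbitrary example values): n observations, K regressors
# including a constant column, with an assumed coefficient vector.
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])          # assumption for the simulation
y = X @ beta_true + rng.normal(size=n)

# Least squares estimate b = (X'X)^{-1} X'y, computed via the normal equations.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values y_hat = Xb = Py and residual vector e = y - Xb = My.
y_hat = X @ b
e = y - y_hat

# Cross-check against a library least squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_lstsq))                   # True
```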
Properties of the matrices P and M are:
(i) $P' = P$, $P^2 = P$ (idempotent matrix)
(ii) $M' = M$, $M^2 = M$
(iii) $PX = X$, $MX = 0$
(iv) $PM = 0$

Using (1) and (iii), we have
\[
X'e = X'My = 0.
\]
If the first column of $X$ is $1 = (1, \cdots, 1)'$, this relation implies
\[
1'e = \sum_{i=1}^{n} e_i = 0.
\]
In addition, since $y = Py + My$, (iv) implies that the cross terms in $y'y$ vanish, giving
\[
y'y = y'P'Py + y'M'My = \hat{y}'\hat{y} + e'e.
\]
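Continuing the previous sketch (it reuses X, y, y_hat, e, and n from that block), the following checks the properties of $P$ and $M$ and the decomposition $y'y = \hat{y}'\hat{y} + e'e$ numerically, up to floating-point tolerance.

```python
# Continues the previous sketch: X, y, y_hat, e, and n are as defined there.
P = X @ np.linalg.solve(X.T @ X, X.T)   # projection matrix P = X(X'X)^{-1}X'
M = np.eye(n) - P                       # annihilator M = I - P

print(np.allclose(P @ P, P))            # (i)   P is idempotent
print(np.allclose(M @ M, M))            # (ii)  M is idempotent
print(np.allclose(P @ X, X))            # (iii) PX = X
print(np.allclose(M @ X, 0))            # (iii) MX = 0
print(np.allclose(P @ M, 0))            # (iv)  PM = 0

print(np.allclose(X.T @ e, 0))          # X'e = 0, hence the residuals sum to zero
print(np.isclose(y @ y, y_hat @ y_hat + e @ e))   # y'y = y_hat'y_hat + e'e
```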
3.2 Partitioned regression and partial regression

Consider
\[
y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon.
\]
The normal equations for $b_1$ and $b_2$ are
\[
\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
=
\begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}.
\]
The first part of these equations is
\[
(X_1'X_1)b_1 + (X_1'X_2)b_2 = X_1'y,
\]
which gives
\[
b_1 = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2b_2 = (X_1'X_1)^{-1}X_1'(y - X_2b_2).
\]
Plug this into the second part of the normal equations. Then, we have
\[
X_2'X_1b_1 + X_2'X_2b_2
= X_2'X_1(X_1'X_1)^{-1}X_1'y - X_2'X_1(X_1'X_1)^{-1}X_1'X_2b_2 + X_2'X_2b_2
= X_2'X_1(X_1'X_1)^{-1}X_1'y + X_2'(I - P_{X_1})X_2b_2
= X_2'y.
\]
Rearranging gives $X_2'(I - P_{X_1})X_2\,b_2 = X_2'(I - P_{X_1})y$. Thus
\[
b_2 = \bigl(X_2'(I - P_{X_1})X_2\bigr)^{-1} X_2'(I - P_{X_1})y.
\]
In the same manner,
\[
b_1 = \bigl(X_1'(I - P_{X_2})X_1\bigr)^{-1} X_1'(I - P_{X_2})y.
\]
Suppose that $X_1 = 1 = (1, \cdots, 1)'$ and $X_2 = Z$ $(n \times K_2)$. Then
\[
b_2 = \bigl(Z'(I - P_1)Z\bigr)^{-1} Z'(I - P_1)y.
\]
But
\[
(I - P_1)Z = Z - 1(1'1)^{-1}1'Z,
\]
where
\[
1'1 = n, \qquad
1'Z = (1 \ \cdots \ 1)
\begin{pmatrix}
z_{11} & \cdots & z_{1K_2} \\
\vdots &        & \vdots \\
z_{n1} & \cdots & z_{nK_2}
\end{pmatrix}
= \Bigl( \sum_{i=1}^{n} z_{i1}, \ \cdots, \ \sum_{i=1}^{n} z_{iK_2} \Bigr).
\]
Thus,
\[
(I - P_1)Z
= Z - \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}
      (\bar{z}_1, \cdots, \bar{z}_{K_2})
= \begin{pmatrix}
z_{11} - \bar{z}_1 & \cdots & z_{1K_2} - \bar{z}_{K_2} \\
z_{21} - \bar{z}_1 & \cdots & z_{2K_2} - \bar{z}_{K_2} \\
\vdots             &        & \vdots \\
z_{n1} - \bar{z}_1 & \cdots & z_{nK_2} - \bar{z}_{K_2}
\end{pmatrix}.
\]
In the same way,
\[
(I - P_1)y = \begin{pmatrix} y_1 - \bar{y} \\ \vdots \\ y_n - \bar{y} \end{pmatrix}.
\]
These show that $b_2$ is equivalent to the OLS estimator of β in the demeaned regression equation
\[
y_i - \bar{y} = \beta'(z_i - \bar{z}) + \varepsilon_i, \qquad \bar{z} = (\bar{z}_1, \cdots, \bar{z}_{K_2})'.
\]
Whether we demean the data and then run the regression, or include a constant term in the model and run the regression, we get the same results.
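The sketch below illustrates this equivalence numerically on assumed simulated data: it compares the coefficients on $Z$ from the full regression with a constant, the partitioned (partialling-out) formula above, and the demeaned regression. The sample size and data-generating values are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (arbitrary example values): X1 is a constant column,
# X2 = Z holds two non-constant regressors.
n = 150
Z = rng.normal(size=(n, 2))
y = 0.5 + Z @ np.array([1.5, -2.0]) + rng.normal(size=n)   # assumed DGP

X1 = np.ones((n, 1))
X = np.column_stack([X1, Z])

def ols(X, y):
    """Least squares coefficients (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# (a) Full regression: the coefficients on Z are the last two entries of b.
b_full = ols(X, y)[1:]

# (b) Partitioned (partialling-out) estimate:
#     b2 = (Z'(I - P1)Z)^{-1} Z'(I - P1)y with P1 = X1(X1'X1)^{-1}X1'.
P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
M1 = np.eye(n) - P1
b_part = np.linalg.solve(Z.T @ M1 @ Z, Z.T @ M1 @ y)

# (c) Demeaned regression: regress (y - ybar) on (Z - Zbar), no constant.
Zd = Z - Z.mean(axis=0)
yd = y - y.mean()
b_demeaned = ols(Zd, yd)

print(np.allclose(b_full, b_part), np.allclose(b_full, b_demeaned))   # True True
```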
3.3 Goodness-of-fit measures

(i) $R^2$

Write $y = Xb + e = \hat{y} + e$. Let
\[
M^0 = I - 1(1'1)^{-1}1', \qquad 1 = (1, \cdots, 1)'.
\]
$M^0$ transforms observations into deviations from their sample means. Then
\[
M^0y = M^0Xb + M^0e,
\]
or, since $M^0e = e$ when $X$ contains a constant term,
\[
y - 1\bar{y} = (\hat{y} - 1\bar{y}) + e.
\]
The total sum of variation of $y_i$ is
\[
y'M^0y = b'X'M^0Xb + e'e,
\]
that is,
\[
\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} e_i^2.
\]
Note that the cross term vanishes:
\[
b'X'M^0e = b'X'M^0M\varepsilon
= b'X'\bigl(I - 1(1'1)^{-1}1'\bigr)M\varepsilon
= b'X'M\varepsilon - b'X'1(1'1)^{-1}1'M\varepsilon = 0
\]
because $X'M = 0$ and $1'M = 0$. Here $y'M^0y$ is the total sum of squares (SST), $b'X'M^0Xb$ is called the regression sum of squares (SSR), and $e'e$ the error sum of squares (SSE).
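As a quick numerical check, the sketch below continues the partitioned-regression example (reusing X, y, n, and the ols helper from that block) and verifies that SST = SSR + SSE when the regression contains a constant term.

```python
# Continues the previous sketch: X (with a constant column), y, n, and ols().
b = ols(X, y)
y_hat = X @ b
e = y - y_hat

# M0 = I - 1(1'1)^{-1}1' turns a vector into deviations from its sample mean.
one = np.ones((n, 1))
M0 = np.eye(n) - one @ one.T / n

SST = y @ M0 @ y              # total sum of squares,      sum_i (y_i - ybar)^2
SSR = y_hat @ M0 @ y_hat      # regression sum of squares, sum_i (yhat_i - ybar)^2
SSE = e @ e                   # error sum of squares,      sum_i e_i^2

print(np.isclose(SST, SSR + SSE))   # True, because X contains a constant term
```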
How well the regression line fits the data can be measured by
\[
R^2 = \frac{SSR}{SST} = \frac{b'X'M^0Xb}{y'M^0y} = 1 - \frac{e'e}{y'M^0y}.
\]
We call $R^2$ the coefficient of determination.

Remark 1. $0 \le R^2 \le 1$, where $R^2 = 0$ indicates no fit and $R^2 = 1$ a perfect fit.

Remark 2. Let $R^2_{Xz}$ denote the $R^2$ for the regression of $y$ on $X$ and an additional variable $z$, and $R^2_X$ the $R^2$ for the regression of $y$ on $X$ alone. Then
\[
R^2_{Xz} = R^2_X + \bigl(1 - R^2_X\bigr) r_{yz}^{*2},
\]
where
\[
r_{yz}^{*2} = \frac{(z_*'y_*)^2}{(z_*'z_*)(y_*'y_*)}, \qquad
z_* = (I - P_X)z, \quad y_* = (I - P_X)y.
\]
Hence $R^2$ never decreases as the number of regressors increases, whatever the quality of the additional regressors.

(ii) Theil's $\bar{R}^2$ (adjusted $R^2$)
\[
\bar{R}^2 = 1 - \frac{e'e/(n - K)}{y'M^0y/(n - 1)} = 1 - \frac{n - 1}{n - K}\bigl(1 - R^2\bigr).
\]
$\bar{R}^2$ will fall (rise) when a variable $x$ is deleted from the regression if the t-ratio associated with this variable is greater (less) than 1.

(iii) Information criteria
\[
AIC(K) = \ln\frac{e'e}{n} + \frac{2K}{n} \quad \text{(Akaike's information criterion)}
\]
\[
BIC(K) = \ln\frac{e'e}{n} + \frac{K\ln n}{n} \quad \text{(Bayesian information criterion)}
\]
\[
PC(K) = \frac{e'e}{n - K}\Bigl(1 + \frac{K}{n}\Bigr)
\]
The smaller the value of a criterion, the better the model.
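A minimal sketch of these fit measures as Python functions, assuming $e$ is the residual vector, $y$ the dependent variable, and $k$ the number of estimated coefficients; the function names are chosen for this illustration and follow the per-observation forms given above.

```python
import numpy as np

def r_squared(y, e):
    """R^2 = 1 - e'e / y'M0y, the coefficient of determination."""
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - (e @ e) / sst

def adjusted_r_squared(y, e, k):
    """Theil's adjusted R^2 with k estimated coefficients."""
    n = y.shape[0]
    return 1.0 - (n - 1) / (n - k) * (1.0 - r_squared(y, e))

def aic(e, k):
    """AIC(k) = ln(e'e / n) + 2k / n."""
    n = e.shape[0]
    return np.log(e @ e / n) + 2 * k / n

def bic(e, k):
    """BIC(k) = ln(e'e / n) + k ln(n) / n."""
    n = e.shape[0]
    return np.log(e @ e / n) + k * np.log(n) / n

def pc(e, k):
    """PC(k) = e'e / (n - k) * (1 + k / n)."""
    n = e.shape[0]
    return e @ e / (n - k) * (1 + k / n)
```

For the regression in the earlier sketches these would be evaluated as, for example, r_squared(y, e) and aic(e, K), with K equal to the number of columns of X.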