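Chapter 10
Generalized Least Squares Estimation

10.1 Model

\[
y = X\beta + \varepsilon, \qquad E[\varepsilon \mid X] = 0, \qquad E[\varepsilon\varepsilon' \mid X] = \sigma^2\Omega = \Sigma \quad (\Omega > 0).
\]

1. Heteroskedasticity

\[
\sigma^2\Omega = \sigma^2
\begin{pmatrix}
w_{11} & & & 0 \\
 & w_{22} & & \\
 & & \ddots & \\
0 & & & w_{nn}
\end{pmatrix}
=
\begin{pmatrix}
\sigma_1^2 & & 0 \\
 & \ddots & \\
0 & & \sigma_n^2
\end{pmatrix}
\]

2. Autocorrelation

\[
\sigma^2\Omega = \sigma^2
\begin{pmatrix}
1 & \rho_1 & \cdots & \rho_{n-1} \\
\rho_1 & 1 & \cdots & \rho_{n-2} \\
\vdots & & \ddots & \vdots \\
\rho_{n-1} & \cdots & \cdots & 1
\end{pmatrix}
\]

10.2 OLS and IV estimation

• OLS estimation

The OLS estimator can be written as

\[
b = \beta + (X'X)^{-1}X'\varepsilon.
\]

1. Unbiasedness

\[
E[b] = E_X\big[E[b \mid X]\big] = \beta.
\]

2. Variance-covariance matrix

\[
\begin{aligned}
Var[b \mid X] &= E\big[(b-\beta)(b-\beta)' \mid X\big] \\
&= E\big[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X\big] \\
&= (X'X)^{-1}X'(\sigma^2\Omega)X(X'X)^{-1}.
\end{aligned}
\]

The unconditional variance is $E_X\big[Var[b \mid X]\big]$. If $\varepsilon$ is normally distributed,

\[
b \mid X \sim N\big(\beta,\ \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}\big).
\]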
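3. Consistency

Suppose that

\[
\frac{X'X}{n} \xrightarrow{p} Q > 0, \qquad \frac{X'\Omega X}{n} \xrightarrow{p} P > 0.
\]

Then

\[
Var[b \mid X] = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\sigma^2\,\frac{X'\Omega X}{n}\left(\frac{X'X}{n}\right)^{-1} \xrightarrow{p} 0
\]

and $Var[b] \to 0$. Using this and Chebyshev's inequality, we have for any $\alpha \in R^k - \{0\}$ and $\epsilon > 0$

\[
P\big[\,|\alpha'(b-\beta)| > \epsilon\,\big] \le \frac{\alpha' E\big[(b-\beta)(b-\beta)'\big]\alpha}{\epsilon^2} = \frac{\alpha' Var(b)\,\alpha}{\epsilon^2} \to 0 \quad \text{as } n \to \infty,
\]

which implies $b \xrightarrow{p} \beta$.

4. Asymptotic distribution of b

Assume $(X_i, \varepsilon_i)$ is a sequence of independent observations with

\[
E(\varepsilon\varepsilon') = diag(\sigma_1^2, \cdots, \sigma_n^2) = \Sigma.
\]

In addition, assume for any $\lambda \in R^k$ and $\delta > 0$

\[
E\,|\lambda' X_i\varepsilon_i|^{2+\delta} \le B \quad \text{for all } i.
\]

Then we can apply the CLT for a sequence of independent random variables, which gives

\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\lambda' X_i\varepsilon_i \xrightarrow{d} N\left(0,\ \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} E\big[\lambda' X_i\varepsilon_i^2 X_i'\lambda\big]\right).
\]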
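But

\[
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} E\big[\lambda' X_i\varepsilon_i^2 X_i'\lambda\big]
&= \frac{1}{n}\sum_{i=1}^{n} E\Big[E\big[\lambda' X_i\varepsilon_i^2 X_i'\lambda \mid X\big]\Big] \\
&= \frac{1}{n}\sum_{i=1}^{n} \sigma_i^2\,\lambda' E(X_iX_i')\lambda \\
&= \frac{1}{n}\,\lambda'\left[\sum_{i=1}^{n}\sigma_i^2 E(X_iX_i')\right]\lambda,
\end{aligned}
\]

and, as $n \to \infty$, this has the same limit as

\[
plim\ \frac{1}{n}\,\lambda'\left[\sum_{i=1}^{n}\sigma_i^2 X_iX_i'\right]\lambda = plim\ \frac{1}{n}\,\lambda'(X'\Sigma X)\lambda.
\]

Thus

\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i\varepsilon_i \xrightarrow{d} N(0, P),
\]

where $P = plim\ \frac{1}{n}X'\Sigma X$, and we obtain

\[
\sqrt{n}\,(b-\beta) \xrightarrow{d} N\big(0,\ Q^{-1}PQ^{-1}\big).
\]

When the $\varepsilon_i$ are serially correlated, we need a different set of conditions and a different CLT to derive the asymptotic normality result. See White's "Asymptotic Theory for Econometricians" for this.

• IV estimation

Assume

\[
\frac{Z'Z}{n} \xrightarrow{p} Q_{ZZ}\ (> 0), \qquad
\frac{Z'X}{n} \xrightarrow{p} Q_{ZX}\ (\ne 0), \qquad
\frac{X'X}{n} \xrightarrow{p} Q_{XX}\ (> 0), \qquad
\frac{Z'\Sigma Z}{n} \xrightarrow{p} Q_{Z\Sigma Z}.
\]

$(Z_i', \varepsilon_i)'$ is a sequence of independent random vectors with

\[
E(\varepsilon\varepsilon' \mid X) = diag(\sigma_1^2, \cdots, \sigma_n^2) = \Sigma,
\]

\[
E\,|\lambda' Z_i\varepsilon_i|^{2+\delta} \le B \quad \text{for all } i
\]

for any $\lambda \in R^k$ and $\delta > 0$. Then, letting (with $Q_{XZ} = Q_{ZX}'$)

\[
Q_{XXZ} = \big(Q_{XZ}Q_{ZZ}^{-1}Q_{ZX}\big)^{-1}Q_{XZ}Q_{ZZ}^{-1},
\]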
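\[
\sqrt{n}\,(b_{IV}-\beta) \xrightarrow{d} N\big(0,\ Q_{XXZ}\,Q_{Z\Sigma Z}\,Q_{XXZ}'\big).
\]

When the $\varepsilon_i$ are serially correlated, as before, we need more assumptions and a different CLT.

10.3 Robust estimation of asymptotic covariance matrices

We can still use OLS for inference if its variance-covariance matrix

\[
(X'X)^{-1}X'\Sigma X(X'X)^{-1}
\]

can be estimated. Suppose that

\[
\Sigma = diag(\sigma_1^2, \cdots, \sigma_n^2).
\]

The individual variances $\sigma_1^2, \cdots, \sigma_n^2$ cannot be estimated consistently, since there are as many of them as observations. But what we need to estimate is $X'\Sigma X$, not $\Sigma$. We may write

\[
\frac{1}{n}X'\Sigma X = \frac{1}{n}\sum_{i=1}^{n}\sigma_i^2 X_iX_i'.
\]

This and $\frac{1}{n}\sum_{i=1}^{n}\varepsilon_i^2 X_iX_i'$ have the same probability limit by the LLN. We replace $\varepsilon_i^2$ with $e_i^2$, where $e_i = y_i - X_i'b$ is the OLS residual, and then have

\[
\frac{1}{n}\sum_{i=1}^{n} e_i^2 X_iX_i'
= \frac{1}{n}\sum_{i=1}^{n}\big(\varepsilon_i - X_i'(b-\beta)\big)^2 X_iX_i'
= \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i^2 X_iX_i' + o_p(1).
\]

(See White (1980, Econometrica) for details.) Thus $\frac{1}{n}\sum e_i^2 X_iX_i'$ consistently estimates $\frac{1}{n}X'\Sigma X$, and the estimated asymptotic variance-covariance matrix of $\sqrt{n}\,(b-\beta)$ is

\[
\left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2 X_iX_i'\right)\left(\frac{1}{n}X'X\right)^{-1} \xrightarrow{p} Q^{-1}PQ^{-1}.
\]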
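The robust covariance estimator above can be sketched numerically. The following is a minimal illustration, not a reference implementation: the function name `ols_white`, the simulated design, and the variance pattern Var(ε_i | x_i) = x_i² are all assumptions made for the example. The function returns the unscaled matrix V = (X′X)⁻¹ (Σ e_i² X_i X_i′) (X′X)⁻¹, so that n·V is the estimate of Q⁻¹PQ⁻¹.

```python
import numpy as np

def ols_white(X, y):
    """OLS coefficients and the heteroskedasticity-robust covariance of b.

    Hypothetical helper for illustration only.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b                           # OLS residuals e_i
    meat = (X * e[:, None] ** 2).T @ X      # sum_i e_i^2 X_i X_i'
    V = XtX_inv @ meat @ XtX_inv            # sandwich estimate of Var(b)
    return b, V

# Simulated data (assumed design): the error variance grows with x.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 3.0, size=n)
X = np.column_stack([np.ones(n), x])
eps = rng.normal(size=n) * x                # Var(eps_i | x_i) = x_i^2
y = X @ np.array([1.0, 2.0]) + eps

b, V = ols_white(X, y)
robust_se = np.sqrt(np.diag(V))             # robust standard errors of b
```

The middle term is computed with a single matrix product instead of an explicit loop over observations; both give the same sum of e_i² X_i X_i′.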
We can use this result for hypothesis testing. Suppose that the null hypothesis is $H_0 : R\beta = r$. Then the Wald test is defined by

\[
W = (Rb-r)'\left[R(X'X)^{-1}\left(\sum_{i=1}^{n} e_i^2 X_iX_i'\right)(X'X)^{-1}R'\right]^{-1}(Rb-r)
\]

(heteroskedasticity-robust Wald test), and as $n \to \infty$

\[
W \xrightarrow{d} \chi^2(J), \qquad J = rank(R).
\]

This follows because, under $H_0$,

\[
W = \sqrt{n}\,(Rb-r)'\left[R\left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} e_i^2 X_iX_i'\right)\left(\frac{X'X}{n}\right)^{-1}R'\right]^{-1}\sqrt{n}\,(Rb-r)
\xrightarrow{d} N(0, I_J)'\,N(0, I_J) = \chi^2(J).
\]

If the null hypothesis is $H_0 : \beta_k = \beta_k^0$, use the t-ratio

\[
t = \frac{b_k - \beta_k^0}{\sqrt{V_{kk}}},
\]

where

\[
V = (X'X)^{-1}\left(\sum_{i=1}^{n} e_i^2 X_iX_i'\right)(X'X)^{-1}.
\]

As $n \to \infty$,

\[
t \xrightarrow{d} N(0, 1).
\]

This is White's heteroskedasticity-robust t-ratio.

10.4 GLS

Since $\Omega > 0$, it can be factored as

\[
\Omega = C\Lambda C',
\]

where the columns of $C$ are the characteristic vectors of $\Omega$ and the characteristic roots of $\Omega$ are put in the diagonal matrix $\Lambda$.
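Let $P' = C\Lambda^{-1/2}$. Then

\[
\Omega^{-1} = C'^{-1}\Lambda^{-1}C^{-1} = C\Lambda^{-1}C' = C\Lambda^{-1/2}\Lambda^{-1/2}C' = P'P
\]

since $C' = C^{-1}$. Premultiplying the linear regression model by $P$, we obtain

\[
Py = PX\beta + P\varepsilon
\]

or

\[
y_* = X_*\beta + \varepsilon_*.
\]

Hence

\[
E(\varepsilon_*\varepsilon_*') = P\,E(\varepsilon\varepsilon')\,P' = \sigma^2 P\Omega P' = \sigma^2\Lambda^{-1/2}C'C\Lambda C'C\Lambda^{-1/2} = \sigma^2 I.
\]

The transformed model satisfies the conditions of the classical linear regression model. Hence

\[
\hat\beta_{GLS} = (X_*'X_*)^{-1}X_*'y_* = (X'P'PX)^{-1}X'P'Py = \big(X'\Omega^{-1}X\big)^{-1}X'\Omega^{-1}y.
\]

This estimator is called the generalized least squares estimator.

The properties of the GLS estimator

1. If $E[\varepsilon_* \mid X_*] = 0$, then $E\,\hat\beta_{GLS} = \beta$.
2. If $\frac{1}{n}X_*'X_* \xrightarrow{p} Q_*\ (> 0)$, then $\hat\beta_{GLS} \xrightarrow{p} \beta$.
3. $\sqrt{n}\,\big(\hat\beta_{GLS} - \beta\big) \xrightarrow{d} N\big(0,\ \sigma^2 Q_*^{-1}\big)$.

The GLS estimator $\hat\beta_{GLS}$ is the BLUE.

Example 1

\[
y_t = \beta' X_t + \varepsilon_t, \qquad
\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \qquad u_t \sim iid(0, \sigma^2), \quad |\rho| < 1.
\]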
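\[
E\varepsilon\varepsilon' = \frac{\sigma^2}{1-\rho^2}
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & & \cdots & \rho^{n-2} \\
\vdots & & \ddots & & \vdots \\
\rho^{n-1} & & \cdots & & 1
\end{pmatrix}
= \sigma^2\Omega.
\]

\[
\Omega^{-1} =
\begin{pmatrix}
1 & -\rho & & & 0 \\
-\rho & 1+\rho^2 & -\rho & & \\
 & \ddots & \ddots & \ddots & \\
 & & -\rho & 1+\rho^2 & -\rho \\
0 & & & -\rho & 1
\end{pmatrix}
\]

The transformation matrix is

\[
P =
\begin{pmatrix}
\sqrt{1-\rho^2} & & & 0 \\
-\rho & 1 & & \\
 & \ddots & \ddots & \\
0 & & -\rho & 1
\end{pmatrix}
\]

The transformed model is

\[
\begin{aligned}
\sqrt{1-\rho^2}\,y_1 &= \sqrt{1-\rho^2}\,X_1'\beta + \varepsilon_1^*, \\
y_t - \rho\,y_{t-1} &= (X_t - \rho X_{t-1})'\beta + u_t, \qquad t = 2, \cdots, n,
\end{aligned}
\]

where $\varepsilon_1^* = \sqrt{1-\rho^2}\,\varepsilon_1$. Since

1. $E\varepsilon_1^* = Eu_t = 0$,
2. $Var(\varepsilon_1^*) = (1-\rho^2)\,Var(\varepsilon_1) = (1-\rho^2)\cdot\dfrac{\sigma^2}{1-\rho^2} = \sigma^2$,
3. $E(\varepsilon_1^* u_t) = \sqrt{1-\rho^2}\,E(\varepsilon_1 u_t) = 0, \quad t = 2, \cdots, n$,

the error terms of the transformed model satisfy the conditions of the standard linear regression model.

The GLS estimator $\hat\beta_{GLS}$ depends on the unknown parameters associated with $\Omega$ and, therefore, cannot be used in practice. Suppose that $\hat\Omega \xrightarrow{p} \Omega$. Then the feasible GLS estimator is defined by

\[
\hat\beta_{FG} = \big(X'\hat\Omega^{-1}X\big)^{-1}X'\hat\Omega^{-1}y.
\]

If

\[
\frac{1}{n}X'\Omega^{-1}X - \frac{1}{n}X'\hat\Omega^{-1}X \xrightarrow{p} 0
\]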
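and

\[
\frac{1}{\sqrt{n}}X'\Omega^{-1}\varepsilon - \frac{1}{\sqrt{n}}X'\hat\Omega^{-1}\varepsilon \xrightarrow{p} 0,
\]

then $\hat\beta_{GLS}$ and $\hat\beta_{FG}$ have the same asymptotic distribution.

Example 2 AR(1) error

\[
y_t = \beta' X_t + \varepsilon_t, \qquad
\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \qquad u_t \sim iid(0, \sigma^2), \quad |\rho| < 1.
\]

1. Run OLS and get the residuals $\hat\varepsilon_t$.
2. Run an AR(1) regression using $\hat\varepsilon_t$. This gives $\hat\rho$.
3. Transform the model using $\hat\rho$ and run OLS.

10.5 Equivalence of GLS and OLS

Let $X'X$ and $\Sigma$ both be positive definite. Then the following statements are equivalent.

(A) $(X'X)^{-1}X'\Sigma X(X'X)^{-1} = (X'\Sigma^{-1}X)^{-1}$.
(B) $\Sigma X = XB$ for some nonsingular $B$.
(C) $(X'X)^{-1}X' = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$.

Example 3

\[
y_t = \beta_1 + \beta_2 t + \varepsilon_t, \qquad
\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \qquad u_t \sim iid(0, \sigma^2), \quad |\rho| < 1.
\]

\[
\Sigma = \frac{\sigma^2}{1-\rho^2}
\begin{pmatrix}
1 & \rho & \cdots & \rho^{n-1} \\
\rho & 1 & \cdots & \rho^{n-2} \\
\vdots & & \ddots & \vdots \\
\rho^{n-1} & & \cdots & 1
\end{pmatrix}
\]

Then $\Sigma X \simeq XA$ for some nonsingular $A$ (this is an exercise problem). Thus, OLS and GLS for this model are asymptotically equivalent.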
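The equivalence of (A), (B) and (C) in Section 10.5 can be checked numerically in a case where (B) holds exactly. The sketch below is an illustration under stated assumptions, not part of the notes: the covariance Σ = I + XAX′ with A = 0.1·I is a hypothetical choice made only because it gives ΣX = X(I + AX′X), so B = I + AX′X is nonsingular. The simulated data and all variable names are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# Hypothetical covariance chosen so that condition (B) holds exactly:
# Sigma = I + X A X'  implies  Sigma X = X (I + A X'X) = X B.
A = 0.1 * np.eye(k)
Sigma = np.eye(n) + X @ A @ X.T
B = np.eye(k) + A @ X.T @ X

XtX_inv = np.linalg.inv(X.T @ X)
Sigma_inv_X = np.linalg.solve(Sigma, X)      # Sigma^{-1} X without forming Sigma^{-1}

# (C): the OLS and GLS estimators coincide.
b_ols = XtX_inv @ X.T @ y
b_gls = np.linalg.solve(X.T @ Sigma_inv_X, Sigma_inv_X.T @ y)

# (A): the OLS sandwich variance equals the GLS variance.
sandwich = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
gls_var = np.linalg.inv(X.T @ Sigma_inv_X)

print(np.allclose(Sigma @ X, X @ B))         # condition (B)
print(np.allclose(b_ols, b_gls))             # condition (C)
print(np.allclose(sandwich, gls_var))        # condition (A)
```

In Example 3 the relation ΣX ≃ XA holds only approximately, so the corresponding comparison there would show OLS and GLS close to each other for large n rather than identical.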