Ch. 8 Nonspherical Disturbances

This chapter will assume that the full ideal conditions hold except that the covariance matrix of the disturbance is $E(\varepsilon\varepsilon') = \sigma^2\Omega$, where $\Omega$ is not the identity matrix. In particular, $\Omega$ may be nondiagonal and/or have unequal diagonal elements. Two cases we shall consider in detail are heteroscedasticity and autocorrelation.

Disturbances are heteroscedastic when they have different variances. Heteroscedasticity usually arises in cross-section data, where the scale of the dependent variable and the explanatory power of the model tend to vary across observations. The disturbances are still assumed to be uncorrelated across observations, so $\sigma^2\Omega$ would be
$$
\sigma^2\Omega = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_T^2 \end{bmatrix}.
$$

Autocorrelation is usually found in time-series data. Economic time series often display a "memory" in that variation around the regression function is not independent from one period to the next. Time-series data are usually homoscedastic, so $\sigma^2\Omega$ would be
$$
\sigma^2\Omega = \sigma^2 \begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{T-1} \\ \rho_1 & 1 & \cdots & \rho_{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{T-1} & \rho_{T-2} & \cdots & 1 \end{bmatrix}.
$$

In recent studies, panel data sets, consisting of cross sections observed at several points in time, have exhibited both characteristics. The next three chapters examine in detail specific types of generalized regression models.

Our earlier results for the classical model will have to be modified. We first consider the consequences of the more general model for the least squares estimators.
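For concreteness, a minimal numpy sketch constructing both forms of $\sigma^2\Omega$ for a small $T$; the variance and correlation values are purely illustrative, not implied by the text.

```python
import numpy as np

T = 5

# Heteroscedastic, uncorrelated case: sigma^2 * Omega is diagonal with
# unequal entries sigma_t^2 (values purely illustrative).
sigma_t2 = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
cov_het = np.diag(sigma_t2)

# Autocorrelated, homoscedastic case: unit diagonal, correlation rho_s on
# the s-th off-diagonal (values again illustrative).
sigma2 = 1.0
rho = np.array([0.6, 0.36, 0.22, 0.13])          # rho_1, ..., rho_{T-1}
cov_auto = np.eye(T)
for s, r in enumerate(rho, start=1):
    cov_auto += r * (np.eye(T, k=s) + np.eye(T, k=-s))
cov_auto *= sigma2

print(cov_het, cov_auto, sep="\n\n")
```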
1 Properties of the Least Squares Estimators

Theorem: The OLS estimator $\hat\beta$ is unbiased. Furthermore, if $\lim_{T\to\infty}(X'\Omega X/T)$ is finite, $\hat\beta$ is consistent.

Proof: $E(\hat\beta) = \beta + E[(X'X)^{-1}X'\varepsilon] = \beta$, which proves unbiasedness. Also,
$$
\operatorname{plim} \hat\beta = \beta + \lim_{T\to\infty}\left(\frac{X'X}{T}\right)^{-1} \operatorname{plim}\, \frac{X'\varepsilon}{T}.
$$
But $X'\varepsilon/T$ has zero mean and covariance matrix $\sigma^2 X'\Omega X/T^2$. If $\lim_{T\to\infty}(X'\Omega X/T)$ is finite, then
$$
\lim_{T\to\infty} \frac{\sigma^2}{T}\,\frac{X'\Omega X}{T} = 0.
$$
Hence $X'\varepsilon/T$ has zero mean and its covariance matrix vanishes asymptotically, which implies $\operatorname{plim}\, X'\varepsilon/T = 0$, and therefore $\operatorname{plim} \hat\beta = \beta$.

Theorem: The covariance matrix of $\hat\beta$ is $\sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}$.

Proof:
$$
E(\hat\beta-\beta)(\hat\beta-\beta)' = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] = \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}.
$$

Note that the covariance matrix of $\hat\beta$ is no longer equal to $\sigma^2(X'X)^{-1}$. It may be either "larger" or "smaller," in the sense that $(X'X)^{-1}X'\Omega X(X'X)^{-1} - (X'X)^{-1}$ can be either positive semidefinite, negative semidefinite, or neither.
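The sandwich form of the OLS covariance is easy to verify numerically. A minimal Monte Carlo sketch, assuming an arbitrary illustrative heteroscedastic $\Omega$ with $\operatorname{sd}(\varepsilon_t) = x_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma2 = 200, 1.0
X = np.column_stack([np.ones(T), rng.uniform(1.0, 3.0, T)])
beta = np.array([1.0, 0.5])
omega = np.diag(X[:, 1] ** 2)            # illustrative heteroscedastic Omega

XtX_inv = np.linalg.inv(X.T @ X)
sandwich = sigma2 * XtX_inv @ X.T @ omega @ X @ XtX_inv

# Monte Carlo: each row of ys is one draw of y = X beta + eps, sd(eps_t) = x_t.
ys = X @ beta + rng.standard_normal((5000, T)) * X[:, 1]
beta_hats = ys @ X @ XtX_inv             # each row is one OLS beta-hat
print(np.cov(beta_hats.T))               # close to `sandwich` ...
print(sandwich)                          # ... and not to sigma2 * XtX_inv
```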
Theorem: $s^2 = e'e/(T-k)$ is (in general) a biased and inconsistent estimator of $\sigma^2$.

Proof:
$$
E(e'e) = E(\varepsilon'M\varepsilon) = \operatorname{trace} E(M\varepsilon\varepsilon') = \sigma^2\operatorname{trace}(M\Omega) \neq \sigma^2(T-k).
$$
Also, since $E(s^2) \neq \sigma^2$, it is hard to see that it is a consistent estimator of $\sigma^2$ from convergence in mean square error.

2 Efficient Estimators

To begin, it is useful to consider cases in which $\Omega$ is a known, symmetric, positive definite matrix. This assumption will occasionally be true, but in most models $\Omega$ will contain unknown parameters that must also be estimated.

Example: Assume that $\sigma_t^2 = \sigma^2 x_t^2$; then
$$
\sigma^2\Omega = \begin{bmatrix} \sigma^2 x_1^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 x_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 x_T^2 \end{bmatrix} = \sigma^2 \begin{bmatrix} x_1^2 & 0 & \cdots & 0 \\ 0 & x_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_T^2 \end{bmatrix};
$$
therefore, we have a "known" $\Omega$.
2.1 Generalized Least Squares (GLS) Estimators

Since $\Omega$ is a symmetric positive definite matrix, it can be factored into
$$
\Omega^{-1} = C\Lambda^{-1}C' = C\Lambda^{-1/2}\Lambda^{-1/2}C' = P'P,
$$
where the columns of $C$ are the eigenvectors of $\Omega$, the eigenvalues of $\Omega$ are arrayed in the diagonal matrix $\Lambda$, and $P' = C\Lambda^{-1/2}$.
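As a quick numerical check of this factorization (a minimal sketch; the particular $\Omega$ is arbitrary), the code below builds $P$ from the eigendecomposition of $\Omega$ and verifies that $P'P = \Omega^{-1}$ and $P\Omega P' = I$; the latter identity is what makes the transformed disturbances spherical in the theorem that follows.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
omega = A @ A.T + 5.0 * np.eye(5)        # an arbitrary positive definite Omega

lam, C = np.linalg.eigh(omega)           # Omega = C diag(lam) C'
P = np.diag(lam ** -0.5) @ C.T           # P = Lambda^{-1/2} C', so P' = C Lambda^{-1/2}

assert np.allclose(P.T @ P, np.linalg.inv(omega))   # P'P = Omega^{-1}
assert np.allclose(P @ omega @ P.T, np.eye(5))      # P Omega P' = I
```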
Theorem: Suppose that the regression model $Y = X\beta + \varepsilon$ satisfies the ideal conditions except that $\Omega$ is not the identity matrix. Suppose that
$$
\lim_{T\to\infty} \frac{X'\Omega^{-1}X}{T}
$$
is finite and nonsingular. Then the transformed equation $PY = PX\beta + P\varepsilon$ satisfies the full ideal conditions.

Proof: Since $P$ is nonsingular and nonstochastic, $PX$ is nonstochastic and of full rank if $X$ is (Conditions 2 and 5). Also, for the consistency of the OLS estimators,
$$
\lim_{T\to\infty} \frac{(PX)'(PX)}{T} = \lim_{T\to\infty} \frac{X'\Omega^{-1}X}{T}
$$
is finite and nonsingular by assumption. Therefore the transformed regressor matrix satisfies the required conditions, and we need consider only the transformed disturbance $P\varepsilon$.

Clearly, $E(P\varepsilon) = 0$ (Condition 3). Also,
$$
E(P\varepsilon)(P\varepsilon)' = \sigma^2 P\Omega P' = \sigma^2(\Lambda^{-1/2}C')(C\Lambda C')(C\Lambda^{-1/2}) = \sigma^2\Lambda^{-1/2}\Lambda\Lambda^{-1/2} = \sigma^2 I \quad \text{(Condition 4)}.
$$
Finally, the normality (Condition 6) of $P\varepsilon$ follows immediately from the normality of $\varepsilon$.
Theorem: The BLUE of $\beta$ is just $\tilde\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$.

Proof: Since the transformed equation satisfies the full ideal conditions, the BLUE of $\beta$ is just
$$
\tilde\beta = [(PX)'(PX)]^{-1}(PX)'(PY) = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y.
$$

Indeed, since $\tilde\beta$ is the OLS estimator of $\beta$ in the transformed equation, and since the transformed equation satisfies the ideal conditions, $\tilde\beta$ has all the usual desirable properties: it is unbiased, BLUE, efficient, consistent, and asymptotically efficient. $\tilde\beta$ is the OLS estimator of the transformed equation, but it is a generalized least squares (GLS) estimator of the original regression model, which takes OLS as a special case when $\Omega = I$.
Theorem: The variance-covariance matrix of the GLS estimator $\tilde\beta$ is $\sigma^2(X'\Omega^{-1}X)^{-1}$.

Proof: Viewing $\tilde\beta$ as the OLS estimator in the transformed equation, it clearly has covariance matrix
$$
\sigma^2[(PX)'(PX)]^{-1} = \sigma^2(X'\Omega^{-1}X)^{-1}.
$$

Theorem: An unbiased, consistent, efficient, and asymptotically efficient estimator of $\sigma^2$ is
$$
\tilde s^2 = \frac{\tilde e\,'\Omega^{-1}\tilde e}{T-k}, \quad \text{where } \tilde e = Y - X\tilde\beta.
$$

Proof: Since the transformed equation satisfies the ideal conditions, the desired estimator of $\sigma^2$ is
$$
\frac{1}{T-k}(PY - PX\tilde\beta)'(PY - PX\tilde\beta) = \frac{1}{T-k}(Y - X\tilde\beta)'\Omega^{-1}(Y - X\tilde\beta).
$$
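Collecting the last three theorems, a minimal sketch of the GLS computations, assuming $\Omega$ is known (the function and variable names are illustrative, not from the text):

```python
import numpy as np

def gls(X, y, omega):
    """GLS for y = X beta + eps with E(eps eps') = sigma^2 * omega, omega known.

    Returns beta_tilde, its estimated covariance s2_tilde * (X' Omega^-1 X)^-1,
    and the unbiased variance estimate s2_tilde.
    """
    T, k = X.shape
    omega_inv = np.linalg.inv(omega)
    XtOiX_inv = np.linalg.inv(X.T @ omega_inv @ X)
    beta_tilde = XtOiX_inv @ X.T @ omega_inv @ y   # (X'O^-1 X)^-1 X'O^-1 y
    e_tilde = y - X @ beta_tilde
    s2_tilde = (e_tilde @ omega_inv @ e_tilde) / (T - k)
    return beta_tilde, s2_tilde * XtOiX_inv, s2_tilde
```

Equivalently, one could form $P$ as in the factorization of Section 2.1 and run OLS of $PY$ on $PX$; the two routes give identical numbers.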
Finally, for testing hypotheses we can apply the full set of results in Chapter 6 to the transformed equation. For testing the $m$ restrictions $R\beta = q$, the appropriate statistic (one of the four) is
$$
\frac{(R\tilde\beta - q)'[\tilde s^2 R[(PX)'(PX)]^{-1}R']^{-1}(R\tilde\beta - q)}{m} = \frac{(R\tilde\beta - q)'[\tilde s^2 R(X'\Omega^{-1}X)^{-1}R']^{-1}(R\tilde\beta - q)}{m} \sim F_{m,T-k}.
$$

Exercise: Derive the other three test statistics (in Chapter 6) of the $F$-ratio form to test the hypothesis $R\beta = q$ when $\Omega \neq I$.
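A hedged sketch of this test, reusing the formulas above; $R$ and $q$ encode whatever restrictions the hypothesis imposes:

```python
import numpy as np
from scipy import stats

def gls_f_test(X, y, omega, R, q):
    """F statistic and p-value for H0: R beta = q under known Omega."""
    T, k = X.shape
    m = R.shape[0]
    omega_inv = np.linalg.inv(omega)
    XtOiX_inv = np.linalg.inv(X.T @ omega_inv @ X)
    beta_tilde = XtOiX_inv @ X.T @ omega_inv @ y
    e = y - X @ beta_tilde
    s2 = (e @ omega_inv @ e) / (T - k)             # s-tilde squared
    d = R @ beta_tilde - q
    F = d @ np.linalg.solve(s2 * R @ XtOiX_inv @ R.T, d) / m
    return F, stats.f.sf(F, m, T - k)
```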
2.2 Maximum Likelihood Estimators

Assume that $\varepsilon \sim N(0, \sigma^2\Omega)$. If $X$ is not stochastic, then by results from "functions of random variables" ($n \Rightarrow n$ transformation) we have $Y \sim N(X\beta, \sigma^2\Omega)$. That is, the log-likelihood function is
$$
\ln f(\theta; Y) = -\frac{T}{2}\ln(2\pi) - \frac{1}{2}\ln|\sigma^2\Omega| - \frac{1}{2}(Y - X\beta)'(\sigma^2\Omega)^{-1}(Y - X\beta)
$$
$$
= -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2}\ln|\Omega| - \frac{1}{2\sigma^2}(Y - X\beta)'\Omega^{-1}(Y - X\beta),
$$
where $\theta = (\beta_1, \beta_2, \ldots, \beta_k, \sigma^2)'$, since by assumption $\Omega$ is known.

The necessary conditions for maximizing $L$ are
$$
\frac{\partial L}{\partial \beta} = \frac{1}{\sigma^2}X'\Omega^{-1}(Y - X\beta) = 0,
$$
$$
\frac{\partial L}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}(Y - X\beta)'\Omega^{-1}(Y - X\beta) = 0.
$$
The solutions are
$$
\hat\beta_{ML} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y, \qquad \hat\sigma^2_{ML} = \frac{1}{T}(Y - X\hat\beta_{ML})'\Omega^{-1}(Y - X\hat\beta_{ML}),
$$
which implies that with normally distributed disturbances, generalized least squares estimators are also MLE. As in the classical regression model, the MLE of $\sigma^2$ is biased. An unbiased estimator is
$$
\hat\sigma^2 = \frac{1}{T-k}(Y - X\hat\beta_{ML})'\Omega^{-1}(Y - X\hat\beta_{ML}).
$$
3 Estimation When Ω Is Unknown

If $\Omega$ contains unknown parameters that must be estimated, then GLS is not feasible. But with an unrestricted $\Omega$, there are $T(T+1)/2$ additional parameters in $\sigma^2\Omega$. This number is far too many to estimate with $T$ observations. Obviously, some structure must be imposed on the model if we are to proceed.

3.1 Feasible Generalized Least Squares

The typical problem involves a small set of parameters $\theta$ such that $\Omega = \Omega(\theta)$. For example, we may assume that the autocorrelated disturbances at the beginning of this chapter satisfy $\rho_s = \rho^s$, so that
$$
\sigma^2\Omega = \sigma^2\begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{T-1} \\ \rho_1 & 1 & \cdots & \rho_{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{T-1} & \rho_{T-2} & \cdots & 1 \end{bmatrix} = \sigma^2\begin{bmatrix} 1 & \rho & \cdots & \rho^{T-1} \\ \rho & 1 & \cdots & \rho^{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho^{T-1} & \rho^{T-2} & \cdots & 1 \end{bmatrix};
$$
then $\Omega$ has only one additional unknown parameter, $\rho$. A model of heteroscedasticity that also has only one new parameter, $\alpha$, is $\sigma_t^2 = \sigma^2 x_t^\alpha$.

Definition: If $\Omega$ depends on a finite number of parameters $\theta_1, \theta_2, \ldots, \theta_p$, and if $\hat\Omega$ depends on consistent estimators $\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_p$, then $\hat\Omega$ is called a consistent estimator of $\Omega$.

Definition: Let $\hat\Omega$ be a consistent estimator of $\Omega$. Then the feasible generalized least squares (FGLS) estimator of $\beta$ is
$$
\check\beta = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}Y.
$$
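A minimal FGLS sketch for the AR(1) pattern above, assuming $\rho$ is estimated by the first-order sample autocorrelation of the OLS residuals (one common consistent choice; the text does not prescribe a particular $\hat\theta$):

```python
import numpy as np

def fgls_ar1(X, y):
    """FGLS when Omega has (i, j) entry rho^{|i-j|}, with rho unknown."""
    T, k = X.shape
    # Step 1: OLS residuals from the untransformed equation.
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # Step 2: consistent estimate of rho (first-order residual autocorrelation).
    rho_hat = (e[1:] @ e[:-1]) / (e @ e)
    # Step 3: build Omega-hat and apply the GLS formula with it.
    idx = np.arange(T)
    omega_inv = np.linalg.inv(rho_hat ** np.abs(idx[:, None] - idx[None, :]))
    return np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)
```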
Conditions that imply that $\check\beta$ is asymptotically equivalent to $\tilde\beta$ are
$$
\lim_{T\to\infty}\left[\frac{1}{T}X'\hat\Omega^{-1}X - \frac{1}{T}X'\Omega^{-1}X\right] = 0
$$
and
$$
\lim_{T\to\infty}\left[\frac{1}{\sqrt{T}}X'\hat\Omega^{-1}\varepsilon - \frac{1}{\sqrt{T}}X'\Omega^{-1}\varepsilon\right] = 0.
$$

Theorem: An asymptotically efficient FGLS estimator does not require that we have an efficient estimator of $\theta$; only a consistent one is required to achieve full efficiency for the FGLS estimator.
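To see this equivalence at work numerically, one can compare $\check\beta$ (with $\rho$ estimated as in the sketch above) against the infeasible $\tilde\beta$ that uses the true $\rho$, for growing $T$. A hedged simulation sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
rho, beta = 0.7, np.array([1.0, 0.5])

for T in (50, 200, 1000):
    X = np.column_stack([np.ones(T), rng.standard_normal(T)])
    eps = np.zeros(T)
    for t in range(1, T):                 # AR(1) disturbances (eps_0 = 0 for simplicity)
        eps[t] = rho * eps[t - 1] + rng.standard_normal()
    y = X @ beta + eps

    idx = np.arange(T)
    toeplitz = lambda r: r ** np.abs(idx[:, None] - idx[None, :])

    # Infeasible GLS with the true rho.
    Oi = np.linalg.inv(toeplitz(rho))
    b_gls = np.linalg.solve(X.T @ Oi @ X, X.T @ Oi @ y)

    # FGLS with rho estimated from OLS residuals.
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    rho_hat = (e[1:] @ e[:-1]) / (e @ e)
    Oi_hat = np.linalg.inv(toeplitz(rho_hat))
    b_fgls = np.linalg.solve(X.T @ Oi_hat @ X, X.T @ Oi_hat @ y)

    print(T, np.max(np.abs(b_fgls - b_gls)))   # gap shrinks as T grows
```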