Chapter 5  Large-sample properties of the LSE

5.1 Stochastic convergence

Suppose that $\{X_n\}$ is a sequence of random variables with a corresponding sequence of distribution functions $\{F_n\}$. If $F_n(x) \to F(x)$ at every continuity point $x$ of $F$, $F_n$ is said to converge weakly to $F$, written $F_n \Rightarrow F$. In this case, $\{X_n\}$ is said to converge in distribution to $X$, where $X$ is a random variable with distribution function $F$, written $X_n \xrightarrow{d} X$.

If $X$ is a random variable and, for all $\varepsilon > 0$, $\lim_{n \to \infty} P(|X_n - X| < \varepsilon) = 1$, then $X_n$ is said to converge in probability to $X$, written $X_n \xrightarrow{P} X$. $X$ is known as the probability limit of $X_n$, written $X = \operatorname{plim} X_n$.

If $\lim_{n \to \infty} E(X_n - X)^2 = 0$, $X_n$ is said to converge in mean square to $X$, written $X_n \xrightarrow{m.s.} X$.

Some useful results regarding stochastic convergence are:

1. $X_n \xrightarrow{P} X$ and $g(\cdot)$ is a continuous function $\Rightarrow g(X_n) \xrightarrow{P} g(X)$.

Example 1  Let
$$X_n = \begin{cases} 1 & \text{with probability } \tfrac{1}{n}, \\ 0 & \text{with probability } 1 - \tfrac{1}{n}. \end{cases}$$
Obviously, $X_n \xrightarrow{P} 0$. Let $g(x) = x + 1$. Then $g(X_n) \xrightarrow{P} g(0) = 1$.

2. Suppose that $Y_n \xrightarrow{d} Y$ and $X_n \xrightarrow{P} c$ (a constant). Then

(a) $X_n + Y_n \xrightarrow{d} c + Y$;
(b) $X_n Y_n \xrightarrow{d} cY$;
(c) $Y_n / X_n \xrightarrow{d} Y / c$ when $c \neq 0$.

3. $X_n \xrightarrow{d} X$ and $g(\cdot)$ is continuous $\Rightarrow g(X_n) \xrightarrow{d} g(X)$. (This is called the continuous mapping theorem.)
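To see how fast the convergence in Example 1 happens, note that $P(|g(X_n) - 1| \ge \varepsilon) = P(X_n = 1) = 1/n \to 0$ for any $0 < \varepsilon \le 1$. The following minimal numpy sketch (the tolerance, sample sizes, and replication count are arbitrary choices, not part of the notes) checks this frequency by simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, reps = 0.1, 100_000  # tolerance and number of Monte Carlo draws

# Example 1: X_n = 1 with probability 1/n, 0 otherwise, and g(x) = x + 1.
# The empirical P(|g(X_n) - 1| >= eps) should track 1/n -> 0.
for n in [10, 100, 1_000, 10_000]:
    x_n = (rng.random(reps) < 1.0 / n).astype(float)  # draws of X_n
    g_x = x_n + 1.0
    print(n, np.mean(np.abs(g_x - 1.0) >= eps))
```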
Example 2  If $X_n \xrightarrow{d} N(0,1)$, then $X_n^2 \xrightarrow{d} \chi^2(1)$.

4. $X_n - Y_n \xrightarrow{P} 0$ and $X_n \xrightarrow{d} X$ $\Rightarrow$ $Y_n \xrightarrow{d} X$.

5. $X_n \xrightarrow{P} X$ $\Rightarrow$ $X_n \xrightarrow{d} X$. (The converse is not necessarily true.)

6. $X_n \xrightarrow{d} c$ (a constant) $\Rightarrow$ $X_n \xrightarrow{P} c$.

7. $X_n \xrightarrow{m.s.} X$ $\Rightarrow$ $X_n \xrightarrow{P} X$.

If for any $\varepsilon > 0$ there exists $B_\varepsilon < \infty$ such that $P(|n^{-r} X_n| > B_\varepsilon) < \varepsilon$ for all $n \ge 1$, write $X_n = O_p(n^r)$ ($n^{-r} X_n$ is stochastically bounded). If $\operatorname{plim} n^{-r} X_n = 0$, write $X_n = o_p(n^r)$.

The weak law of large numbers

1. Let $\{X_i, i \ge 1\}$ be a sequence of i.i.d. r.v.s with $E X_1 = m$. Then $\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{P} m$ as $n \to \infty$.

2. Let $\{X_i, i \ge 1\}$ be a sequence of independent r.v.s with $E X_i = m$. If $E|X_i|^{1+\delta} \le B < \infty$ (some $\delta > 0$) for all $i$, then $\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{P} m$ as $n \to \infty$.

Example 3  Let $\varepsilon_i \sim \text{iid}(0, \sigma^2)$. Then $\frac{1}{n}\sum_{i=1}^n \varepsilon_i \xrightarrow{P} E\varepsilon_1 = 0$.
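A minimal numpy sketch of Example 3 (the normal error distribution, the value of $\sigma$, and the sample sizes are arbitrary choices): the sample mean of i.i.d. mean-zero errors drifts toward zero as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0  # arbitrary error standard deviation

# Example 3: for eps_i ~ iid(0, sigma^2) the sample mean tends to E(eps_1) = 0.
for n in [10, 100, 10_000, 1_000_000]:
    eps = rng.normal(0.0, sigma, size=n)
    print(n, eps.mean())  # shrinks toward 0 as n grows
```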
The central limit theorem

1. Let $\{X_i, i \ge 1\}$ be a sequence of i.i.d. r.v.s with $E(X_1) = \mu$ and $Var(X_1) = \sigma^2$ $(0 < \sigma^2 < \infty)$. Then
$$\frac{\sum_{i=1}^n (X_i - \mu)}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1) \quad \text{as } n \to \infty.$$

2. Let $\{X_i, i \ge 1\}$ be a sequence of independent r.v.s with mean $\mu_i$ and variance $\sigma_i^2$, and let $\bar\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n \sigma_i^2$. If $\max_{1 \le i \le n} \left(E|X_i - \mu_i|^{2+\delta}\right)^{\frac{1}{2+\delta}} \le B < \infty$ for some $\delta > 0$ and $\bar\sigma_n^2 \ge c \; (> 0)$ for all $n$, then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{\bar\sigma_n \sqrt{n}} \xrightarrow{d} N(0,1).$$

Example 4  Let $X_i \sim \text{iid } B(1, p)$. Then $E X_1 = p$ and $Var(X_1) = p(1-p)$. Thus
$$\frac{\sum_{i=1}^n (X_i - p)}{\sqrt{p(1-p)}\,\sqrt{n}} \xrightarrow{d} N(0,1).$$

For vector sequences, we use the following result, known as the Cramér-Wold device. If $\{X_n\}$ is a sequence of random vectors, $X_n \xrightarrow{d} X$ iff $\lambda' X_n \xrightarrow{d} \lambda' X$ for any vector $\lambda$.

Example 5  Let $X_i$ ($m \times 1$) $\sim \text{iid}(0, \Sigma)$. Then
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \xrightarrow{d} N(0, \Sigma).$$
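A minimal numpy sketch of Example 4 (the values of $p$, $n$, and the replication count are arbitrary choices): across many replications, the standardized Bernoulli sum should have mean near 0, variance near 1, and normal-like tails.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 0.3, 500, 20_000  # arbitrary choices

# Example 4: Z = sum(X_i - p) / (sqrt(p(1 - p)) * sqrt(n)) is roughly N(0, 1).
x = rng.random((reps, n)) < p                        # Bernoulli(p) draws
z = (x.sum(axis=1) - n * p) / np.sqrt(p * (1 - p) * n)
print("mean (should be near 0)     :", z.mean())
print("variance (should be near 1) :", z.var())
print("P(Z > 2), normal value .0228:", np.mean(z > 2.0))
```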
5.2 Consistency of b

Assume

1. $\{(X_i, \varepsilon_i)\}$ is a sequence of independent observations.

2. $\frac{1}{n}\sum_{i=1}^n X_i X_i' = \frac{1}{n} X'X \xrightarrow{P} Q = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n E(X_i X_i')$ $(> 0)$.

3. For any $\lambda \in \mathbb{R}^k$ and some $\delta > 0$, $E|\lambda' X_i \varepsilon_i|^{2+\delta} \le B < \infty$ for all $i$.

The least squares estimator $b$ may be written as
$$b = \beta + \left(\frac{1}{n}\sum X_i X_i'\right)^{-1} \left(\frac{1}{n}\sum X_i \varepsilon_i\right).$$
Consider, for $\lambda \in \mathbb{R}^k$,
$$\frac{1}{n}\sum \lambda' X_i \varepsilon_i = \frac{1}{n}\sum w_i.$$
Then $\{w_i\}$ is an independent sequence with $E(w_i) = E[E(\lambda' X_i \varepsilon_i \mid X)] = 0$. In addition, by Lyapounov's inequality and assumption 3, $E(w_i^2) = E(\lambda' X_i \varepsilon_i)^2 \le C < \infty$, which implies $E|w_i|^{1+\delta} \le D < \infty$ for all $i$.

Lyapounov's inequality  For $0 < \alpha \le \beta$, $(E|X|^\alpha)^{1/\alpha} \le (E|X|^\beta)^{1/\beta}$.

Thus, by the WLLN for an independent sequence, $\frac{1}{n}\sum w_i \xrightarrow{P} 0$. Since this holds for any $\lambda$,
$$\frac{1}{n}\sum_{i=1}^n X_i \varepsilon_i \xrightarrow{P} 0,$$
and we have $b \xrightarrow{P} \beta + Q^{-1} \cdot 0 = \beta$.
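A minimal simulation of this consistency result (the design with an intercept and one standard normal regressor, and the values of $\beta$ and $\sigma$, are arbitrary choices): $b$ settles down around $\beta$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma = np.array([1.0, -2.0]), 1.5  # arbitrary true values

# b = (X'X)^{-1} X'y should approach beta as n grows.
for n in [50, 500, 5_000, 50_000]:
    X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + regressor
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, b)
```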
5.3 Asymptotic normality of the least squares estimator

Write $b - \beta = \left(\sum X_i X_i'\right)^{-1} \sum X_i \varepsilon_i$, or
$$\sqrt{n}(b - \beta) = \left(\frac{1}{n}\sum X_i X_i'\right)^{-1} \frac{1}{\sqrt{n}}\sum X_i \varepsilon_i.$$
Since $\frac{1}{n}\sum X_i X_i' \xrightarrow{P} Q$ by assumption, we need to show that $\frac{1}{\sqrt{n}}\sum X_i \varepsilon_i$ is normally distributed in the limit. Consider, for $\lambda \in \mathbb{R}^k$,
$$\frac{1}{\sqrt{n}}\sum \lambda' X_i \varepsilon_i = \frac{1}{\sqrt{n}}\sum w_i.$$
We wish to check the conditions of the CLT for a sequence of independent r.v.s:

1. $E(w_i) = 0$, as before.

2. $\left(E|w_i|^{2+\delta}\right)^{\frac{1}{2+\delta}} \le B^{\frac{1}{2+\delta}}$ for all $i$, and $\bar\sigma_n^2 = \frac{1}{n}\sum E(w_i^2) \le C$.

Thus
$$\frac{\sum w_i}{\bar\sigma_n \sqrt{n}} \xrightarrow{d} N(0,1).$$
Since
$$\begin{aligned}
\bar\sigma_n^2 &= \frac{1}{n}\sum E(w_i^2) = \frac{1}{n}\sum E(\lambda' X_i \varepsilon_i)^2 = \frac{1}{n}\sum E\left[E(\lambda' X_i \varepsilon_i^2 X_i'\lambda \mid X)\right] \\
&= \frac{1}{n}\sum E\left[\lambda' X_i\, E(\varepsilon_i^2 \mid X)\, X_i'\lambda\right] = \sigma^2 \lambda'\left(\frac{1}{n}\sum_{i=1}^n E(X_i X_i')\right)\lambda \to \sigma^2 \lambda' Q \lambda,
\end{aligned}$$
this result can be written as
$$\frac{1}{\sqrt{n}}\sum w_i \xrightarrow{d} N(0, \sigma^2 \lambda' Q \lambda),$$
which implies, by the Cramér-Wold device,
$$\frac{1}{\sqrt{n}}\sum X_i \varepsilon_i \xrightarrow{d} N(0, \sigma^2 Q).$$
Using this and the given assumption, we have
$$\sqrt{n}(b - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1}).$$
(Recall that $X_n Y_n \xrightarrow{d} cY$ if $X_n \xrightarrow{P} c$ (a constant) and $Y_n \xrightarrow{d} Y$.)

Some authors write this result as $b \simeq N\left(\beta, \frac{1}{n}\sigma^2 Q^{-1}\right)$. That is, $b$ is approximately normal with mean $\beta$ and variance-covariance matrix $\frac{1}{n}(\sigma^2 Q^{-1})$.
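A minimal Monte Carlo check of this result (the design, $\beta$, $\sigma$, $n$, and the replication count are arbitrary choices; with an intercept and a standard normal regressor, $Q = I_2$): the sample covariance of $\sqrt{n}(b - \beta)$ across replications should be close to $\sigma^2 Q^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma, n, reps = np.array([1.0, -2.0]), 1.5, 200, 10_000  # arbitrary

# With X_i = (1, z_i)', z_i ~ N(0,1): Q = E(X_i X_i') = I_2, so the limiting
# covariance of sqrt(n)(b - beta) is sigma^2 * I_2 = 2.25 * I_2.
draws = np.empty((reps, 2))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(0.0, sigma, size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    draws[r] = np.sqrt(n) * (b - beta)

print("simulated covariance:\n", np.cov(draws.T))
print("theoretical value   :", sigma**2, "* I_2")
```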
5.4 Consistency of s²

Write
$$s^2 = \frac{1}{n-K}\,\varepsilon' M \varepsilon = \frac{1}{n-K}\left[\varepsilon'\varepsilon - \varepsilon' X (X'X)^{-1} X' \varepsilon\right] = \frac{n}{n-K}\left[\frac{\varepsilon'\varepsilon}{n} - \frac{\varepsilon' X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}\right].$$
Because
$$\frac{\varepsilon'\varepsilon}{n} = \frac{1}{n}\sum_{i=1}^n \varepsilon_i^2 \xrightarrow{P} \sigma^2, \qquad \frac{X'\varepsilon}{n} = \frac{1}{n}\sum X_i \varepsilon_i \xrightarrow{P} 0, \qquad \frac{X'X}{n} = \frac{1}{n}\sum X_i X_i' \xrightarrow{P} Q,$$
and $\frac{n}{n-K} \to 1$, we have $s^2 \xrightarrow{P} \sigma^2$. That is, $\sigma^2$ is consistently estimated by $s^2$. Alternatively, we may use $\hat\sigma^2 = \frac{1}{n}\hat\varepsilon'\hat\varepsilon$. This is also consistent.

5.5 Asymptotic distribution of a function of b

Let $f(b)$ be a vector of $J$ continuous and continuously differentiable functions of $b$. We want to find the limiting distribution of $f(b)$. By the Taylor expansion,
$$f(b) = f(\beta) + \frac{\partial f(\beta)}{\partial \beta'}(b - \beta) + \text{remainder}.$$
$\frac{\partial f(\beta)}{\partial \beta'}$ is a matrix of the form
$$\begin{pmatrix} \frac{\partial f_1(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f_1(\beta)}{\partial \beta_k} \\ \vdots & & \vdots \\ \frac{\partial f_J(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f_J(\beta)}{\partial \beta_k} \end{pmatrix} = \Gamma.$$
The remainder term becomes negligible if $b \xrightarrow{P} \beta$. Thus
$$\sqrt{n}\left(f(b) - f(\beta)\right) \xrightarrow{d} N\left(0, \Gamma\,(\sigma^2 Q^{-1})\,\Gamma'\right).$$
That is, $f(b)$ also has a normal distribution in the limit.
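A minimal sketch of this delta-method result in the scalar case (one regressor without an intercept; the values of $\beta$, $\sigma$, and $n$ are arbitrary choices): with $f(b) = b^2$ we have $\Gamma = 2\beta$, and with $x_i \sim N(0,1)$, $Q = E(x_i^2) = 1$, so the limiting variance of $\sqrt{n}(f(b) - f(\beta))$ is $\Gamma \sigma^2 Q^{-1} \Gamma = 4\beta^2\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, sigma, n, reps = 2.0, 1.0, 400, 10_000  # arbitrary scalar setup

# f(b) = b^2, Gamma = 2*beta, Q = E(x_i^2) = 1:
# Var of sqrt(n)(b^2 - beta^2) should approach 4 * beta^2 * sigma^2 = 16.
stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(0.0, sigma, size=n)
    b = (x @ y) / (x @ x)          # least squares with a single regressor
    stats[r] = np.sqrt(n) * (b**2 - beta**2)

print("simulated variance :", stats.var())
print("delta-method value :", 4 * beta**2 * sigma**2)
```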
Example 6  Suppose $\sqrt{n}(b - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$. What is the limiting distribution of $b_1^2 + \cdots + b_k^2$? Here $f(b) = b_1^2 + \cdots + b_k^2$ and $\Gamma = [2\beta_1, \ldots, 2\beta_k]$. Thus
$$\sqrt{n}\left((b_1^2 + \cdots + b_k^2) - (\beta_1^2 + \cdots + \beta_k^2)\right) \xrightarrow{d} N\left(0, \sigma^2\, \Gamma Q^{-1} \Gamma'\right).$$

5.6 More general assumptions on the regressors

We have assumed that $(X_i)$ is a sequence of independent observations. This assumption may occasionally be violated in practice. For example, consider the autoregressive model of order $p$,
$$y_t = \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + \varepsilon_t,$$
where $\varepsilon_t \sim \text{iid}(0, \sigma^2)$. Here, the regressors are correlated over time. Still, consistency and asymptotic normality of the OLS estimator follow if we make a few extra assumptions. These should be dealt with in a more advanced course.

5.7 Instrumental variables estimation

We have assumed $E(\varepsilon_i \mid X) = 0$, which implies $E(\varepsilon_i X_i) = 0$. There are many examples of the violation of this assumption.

Example 7 (Simultaneous equations)  Let

$C_t$: consumption at time $t$
$Y_t$: income at time $t$
$I_t$: investment at time $t$

The Keynesian consumption function is $C_t = \alpha + \beta Y_t + \varepsilon_t$. But $Y_t = C_t + I_t$. Using these two equations, we have
$$Y_t = \alpha + \beta Y_t + \varepsilon_t + I_t \;\Rightarrow\; Y_t = \frac{1}{1-\beta}\left(\alpha + \varepsilon_t + I_t\right).$$
Thus $Y_t$ and $\varepsilon_t$ are correlated.
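A minimal simulation of the resulting bias (all parameter values and the exogenous investment process are arbitrary choices): regressing $C_t$ on $Y_t$ by OLS does not recover $\beta$; in this setup the slope converges instead to $\beta + (1-\beta)\sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + \sigma_I^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.0, 0.6                 # arbitrary "true" parameters
sig_e, sig_i, n = 1.0, 2.0, 200_000

# Y_t = (alpha + eps_t + I_t) / (1 - beta), so Y_t and eps_t are correlated.
eps = rng.normal(0.0, sig_e, size=n)
inv = rng.normal(5.0, sig_i, size=n)   # exogenous investment
Y = (alpha + eps + inv) / (1 - beta)
C = alpha + beta * Y + eps

X = np.column_stack([np.ones(n), Y])
b_ols = np.linalg.solve(X.T @ X, X.T @ C)
plim = beta + (1 - beta) * sig_e**2 / (sig_e**2 + sig_i**2)
print("OLS slope:", b_ols[1], " true beta:", beta, " plim:", plim)
```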
Example 8 (Autoregressive moving average model)  Consider the ARMA(1,1) model
$$y_t = \alpha y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}, \qquad \varepsilon_t \sim \text{iid}(0, \sigma^2), \quad |\alpha| < 1, \; |\theta| < 1.$$
Writing
$$y_t = \alpha y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$$
$$\alpha y_{t-1} = \alpha^2 y_{t-2} + \alpha \varepsilon_{t-1} + \alpha\theta \varepsilon_{t-2}$$
$$\vdots$$
and adding all of these equations, we obtain
$$y_t = \varepsilon_t + (\theta + \alpha)\varepsilon_{t-1} + \alpha(\theta + \alpha)\varepsilon_{t-2} + \cdots.$$
Thus $y_{t-1}$ and $\varepsilon_{t-1}$ are correlated, so the regressor $y_{t-1}$ is correlated with the error term $\varepsilon_t + \theta \varepsilon_{t-1}$.

Example 9 (Measurement error)  Let the true regression model be
$$y_i = \alpha + \beta x_i + \varepsilon_i.$$
Suppose that, due to measurement error, we observe
$$x_i^* = x_i + w_i, \qquad w_i \sim \text{iid}(0, \sigma_w^2),$$
instead of $x_i$. Then the regression model we use will be
$$y_i = \alpha + \beta(x_i^* - w_i) + \varepsilon_i = \alpha + \beta x_i^* + (\varepsilon_i - \beta w_i).$$
Obviously, $x_i^*$ and the error term are correlated.
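A minimal simulation of the resulting attenuation (the parameter values are arbitrary choices; the regressor and both error terms are drawn as independent normals): in this classical setup, the OLS slope converges to $\beta \sigma_x^2 / (\sigma_x^2 + \sigma_w^2)$ rather than to $\beta$.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.5, 2.0                        # arbitrary true parameters
sig_x, sig_w, sig_e, n = 1.0, 1.0, 1.0, 200_000

x = rng.normal(0.0, sig_x, size=n)            # true regressor
y = alpha + beta * x + rng.normal(0.0, sig_e, size=n)
x_star = x + rng.normal(0.0, sig_w, size=n)   # observed with error

X = np.column_stack([np.ones(n), x_star])
b = np.linalg.solve(X.T @ X, X.T @ y)
plim = beta * sig_x**2 / (sig_x**2 + sig_w**2)
print("OLS slope:", b[1], " true beta:", beta, " attenuated plim:", plim)
```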
Example 10 (Dynamic panel data model)  Panel data are a collection of time-series and cross-sectional observations.

Example 11  A collection of income surveys over a period of time, a collection of stock indices over a period of time, etc.

Let
$$y_{it} = \delta y_{i,t-1} + x_{it}'\beta + u_{it}, \qquad u_{it} = u_i + v_{it} \quad \text{(one-way error component model)},$$
where $u_i \sim \text{iid}(0, \sigma_u^2)$ and $v_{it} \sim \text{iid}(0, \sigma_v^2)$. $u_i$ is called the unobserved individual effect. Since $y_{i,t-1}$ is a function of $u_i$, it is correlated with $u_{it}$. Thus, the OLS estimator is inconsistent.

Suppose that a sequence of $K \times 1$ vectors $\{Z_i\}$ satisfies $\frac{1}{n}\sum Z_i \varepsilon_i \xrightarrow{P} 0$ and $\frac{1}{n}\sum Z_i X_i' \xrightarrow{P} Q_{ZX}$. Then
$$\frac{1}{n}\sum Z_i y_i = \left(\frac{1}{n}\sum Z_i X_i'\right)\beta + \frac{1}{n}\sum Z_i \varepsilon_i \xrightarrow{P} Q_{ZX}\beta.$$
Thus, as an estimator of $\beta$, we consider
$$b_{IV} = (Z'X)^{-1} Z'y.$$

Assume

1. $E(\varepsilon \mid X)$ may be nonzero; that is, the regressors may be endogenous.

2. $\frac{1}{n} Z'X \xrightarrow{P} Q_{ZX}$ with $\operatorname{rank}(Q_{ZX}) = K$.

3. $\frac{1}{n} Z'Z \to Q_{ZZ}$ $(> 0)$.

4. $E(\varepsilon \mid Z) = 0$.

5. $E(\varepsilon\varepsilon' \mid Z) = \sigma^2 I$.

6. $\{(Z_i, \varepsilon_i)\}$ is a sequence of independent observations.

7. For any $\lambda \in \mathbb{R}^K$ and some $\delta > 0$, $E|\lambda' Z_i \varepsilon_i|^{2+\delta} \le B < \infty$ for all $i$.
Write the IV estimator as
$$b_{IV} = \beta + (Z'X)^{-1} Z'\varepsilon.$$
Since $\frac{1}{n}Z'X \xrightarrow{P} Q_{ZX}$ and $\frac{1}{n}Z'\varepsilon \xrightarrow{P} 0$, $b_{IV} \xrightarrow{P} \beta$ as $n \to \infty$. In addition,
$$\frac{1}{\sqrt{n}} Z'\varepsilon \xrightarrow{d} N(0, \sigma^2 Q_{ZZ}),$$
which implies
$$\sqrt{n}(b_{IV} - \beta) \xrightarrow{d} N\left(0, \sigma^2 Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1}\right).$$
As for the OLS estimation, a natural estimator of $\sigma^2$ is
$$\hat\sigma^2 = \frac{1}{n}\sum \left(y_i - X_i' b_{IV}\right)^2.$$
We can show that $\hat\sigma^2 \xrightarrow{P} \sigma^2$ as $n \to \infty$.

So far, the number of instruments equals the number of regressors. What if the number of instruments exceeds the number of regressors? Then we use
$$b_{IV} = \left(X'Z(Z'Z)^{-1}Z'X\right)^{-1} X'Z(Z'Z)^{-1}Z'y.$$
This is equivalent to $(\hat X'\hat X)^{-1}\hat X' y$, where $\hat X = Z(Z'Z)^{-1}Z'X$ (the part of $X$ explained by $Z$). This estimator is called the two-stage least squares (2SLS) estimator. Its asymptotic properties are:

1. $b_{IV} \xrightarrow{P} \beta$.

2. $\sqrt{n}(b_{IV} - \beta) \xrightarrow{d} N\left(0, \sigma^2 \left(Q_{XZ} Q_{ZZ}^{-1} Q_{ZX}\right)^{-1}\right)$, where $Q_{XZ} = Q_{ZX}'$; when the number of instruments equals the number of regressors, this reduces to $\sigma^2 Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1}$.
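A minimal numpy sketch of the IV estimator in the measurement-error setting of Example 9 (a just-identified case; using an independent second noisy measurement as the instrument, with arbitrary parameter values): OLS is attenuated, while $b_{IV} = (Z'X)^{-1}Z'y$ recovers $\beta$, since the instrument is correlated with the mismeasured regressor but not with the composite error.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 0.5, 2.0, 200_000     # arbitrary

x = rng.normal(size=n)                 # true (unobserved) regressor
y = alpha + beta * x + rng.normal(size=n)
x_star = x + rng.normal(size=n)        # mismeasured regressor (endogenous)
z = x + rng.normal(size=n)             # independent second measurement:
                                       # correlated with x_star, uncorrelated
                                       # with the composite error eps - beta*w

X = np.column_stack([np.ones(n), x_star])
Z = np.column_stack([np.ones(n), z])
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)   # b_IV = (Z'X)^{-1} Z'y
print("OLS :", b_ols)   # slope attenuated toward 1.0
print("IV  :", b_iv)    # slope close to beta = 2.0
```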