Ch. 21 Univariate Unit Root Process

1 Introduction

Consider OLS estimation of an AR(1) process, $Y_t = \rho Y_{t-1} + u_t$, where $u_t \sim \text{i.i.d.}(0, \sigma^2)$ and $Y_0 = 0$. The OLS estimator of $\rho$ is given by

    $\hat{\rho}_T = \frac{\sum_{t=1}^{T} Y_{t-1} Y_t}{\sum_{t=1}^{T} Y_{t-1}^2} = \Big( \sum_{t=1}^{T} Y_{t-1}^2 \Big)^{-1} \Big( \sum_{t=1}^{T} Y_{t-1} Y_t \Big),$

and we also have

    $(\hat{\rho}_T - \rho) = \Big( \sum_{t=1}^{T} Y_{t-1}^2 \Big)^{-1} \Big( \sum_{t=1}^{T} Y_{t-1} u_t \Big).$    (1)

When the true value of $\rho$ is less than 1 in absolute value, $Y_t$ (and hence $Y_t^2$) is a covariance-stationary process. Applying the LLN for a covariance-stationary process (see 9.19 of Ch. 4) we have

    $\Big( \sum_{t=1}^{T} Y_{t-1}^2 \Big)/T \xrightarrow{p} E\Big[ \Big( \sum_{t=1}^{T} Y_{t-1}^2 \Big)/T \Big] = T \cdot \frac{\sigma^2}{1-\rho^2} \Big/ T = \sigma^2/(1-\rho^2).$    (2)

Since $Y_{t-1} u_t$ is a martingale difference sequence with variance $E(Y_{t-1} u_t)^2 = \sigma^2 \cdot \frac{\sigma^2}{1-\rho^2}$ and

    $\frac{1}{T} \sum_{t=1}^{T} \sigma^2 \frac{\sigma^2}{1-\rho^2} \to \sigma^2 \frac{\sigma^2}{1-\rho^2},$

applying the CLT for a martingale difference sequence to the second term on the right-hand side of (1) we have

    $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} Y_{t-1} u_t \xrightarrow{L} N\Big( 0, \sigma^2 \frac{\sigma^2}{1-\rho^2} \Big).$    (3)
Substituting (2) and (3) into (1) we have

    $\sqrt{T}(\hat{\rho}_T - \rho) = \Big[ \Big( \sum_{t=1}^{T} Y_{t-1}^2 \Big)/T \Big]^{-1} \cdot \frac{1}{\sqrt{T}} \Big( \sum_{t=1}^{T} Y_{t-1} u_t \Big)$    (4)

    $\xrightarrow{L} \Big[ \frac{\sigma^2}{1-\rho^2} \Big]^{-1} N\Big( 0, \sigma^2 \frac{\sigma^2}{1-\rho^2} \Big)$    (5)

    $\equiv N(0, 1-\rho^2).$    (6)

Result (6) is not valid for the case $\rho = 1$, however. To see this, recall that the variance of $Y_t$ when $\rho = 1$ is $t\sigma^2$; the LLN as in (2) is then not valid, since

    $E\Big[ \Big( \sum_{t=1}^{T} Y_{t-1}^2 \Big)/T \Big] = \sigma^2 \sum_{t=1}^{T} (t-1)/T = \sigma^2 (T-1)/2 \to \infty.$    (7)

A similar argument shows that the CLT does not apply to $\frac{1}{\sqrt{T}} \sum_{t=1}^{T} Y_{t-1} u_t$. (Instead, $T^{-1} \sum_{t=1}^{T} Y_{t-1} u_t$ converges.) To obtain the limiting distribution of $(\hat{\rho}_T - \rho)$ in the unit root case, it turns out, as we shall prove in the following, that we have to multiply $(\hat{\rho}_T - \rho)$ by $T$ rather than by $\sqrt{T}$:

    $T(\hat{\rho}_T - \rho) = \Big[ \Big( \sum_{t=1}^{T} Y_{t-1}^2 \Big)/T^2 \Big]^{-1} \Big[ T^{-1} \sum_{t=1}^{T} Y_{t-1} u_t \Big].$    (8)

Thus the unit root coefficient converges at a faster rate ($T$) than a coefficient in a stationary regression (which converges at rate $\sqrt{T}$).
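A quick Monte Carlo check of these two convergence rates (a Python sketch; the sample size, seed, and normal errors are illustrative choices, not from the text): for $|\rho| < 1$ the OLS error shrinks like $1/\sqrt{T}$, while under a unit root it shrinks like $1/T$.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_ar1(y):
    # OLS slope from regressing Y_t on Y_{t-1} with no constant, as in (1)
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

def simulate_ar1(rho, T, rng):
    # Y_0 = 0, u_t i.i.d. N(0, 1)
    u = rng.standard_normal(T)
    y = np.empty(T)
    y[0] = u[0]
    for t in range(1, T):
        y[t] = rho * y[t - 1] + u[t]
    return y

T = 10_000
rho_hat_stat = ols_ar1(simulate_ar1(0.5, T, rng))  # stationary case
rho_hat_unit = ols_ar1(simulate_ar1(1.0, T, rng))  # unit-root case

# The unit-root estimate typically sits much closer to the truth at the
# same T, reflecting the faster rate T (rather than sqrt(T)) in (8).
print(abs(rho_hat_stat - 0.5), abs(rho_hat_unit - 1.0))
```

At $T = 10{,}000$ the stationary error is of order $10^{-2}$ while the unit-root error is of order $10^{-4}$, consistent with the two rates.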
2 Unit Root Asymptotic Theories

In this section, we develop tools to handle the asymptotics of unit root processes.

2.1 Random Walks and the Wiener Process

Consider a random walk, $Y_t = Y_{t-1} + \varepsilon_t$, where $Y_0 = 0$ and $\varepsilon_t$ is i.i.d. with mean zero and $Var(\varepsilon_t) = \sigma^2 < \infty$. By repeated substitution we have

    $Y_t = Y_{t-1} + \varepsilon_t = Y_{t-2} + \varepsilon_{t-1} + \varepsilon_t = \cdots = Y_0 + \sum_{s=1}^{t} \varepsilon_s = \sum_{s=1}^{t} \varepsilon_s.$

Before we can study the behavior of estimators based on random walks, we must understand in more detail the behavior of the random walk process itself. Thus, for the random walk $\{Y_t\}$, we can write

    $Y_T = \sum_{t=1}^{T} \varepsilon_t.$

Rescaling, we have

    $T^{-1/2} Y_T / \sigma = T^{-1/2} \sum_{t=1}^{T} \varepsilon_t / \sigma.$

(It is important to note that $\sigma^2$ here should be read as $Var(T^{-1/2} \sum_{t=1}^{T} \varepsilon_t) = E[T^{-1} (\sum \varepsilon_t)^2] = \frac{T \cdot \sigma^2}{T} = \sigma^2$.) According to the Lindeberg-Lévy CLT, we have

    $T^{-1/2} Y_T / \sigma \xrightarrow{L} N(0, 1).$

More generally, we can construct a variable $Y_T(r)$ from the partial sum of $\varepsilon_t$:

    $Y_T(r) = \sum_{t=1}^{[Tr]^*} \varepsilon_t,$
where $0 \le r \le 1$ and $[Tr]^*$ denotes the largest integer that is less than or equal to $Tr$. Applying the same rescaling, we define

    $W_T(r) \equiv T^{-1/2} Y_T(r) / \sigma$    (9)

    $= T^{-1/2} \sum_{t=1}^{[Tr]^*} \varepsilon_t / \sigma.$    (10)

Now

    $W_T(r) = T^{-1/2} ([Tr]^*)^{1/2} \Big\{ ([Tr]^*)^{-1/2} \sum_{t=1}^{[Tr]^*} \varepsilon_t / \sigma \Big\},$

and for a given $r$, the term in the braces $\{\cdot\}$ again obeys the CLT and converges in distribution to $N(0, 1)$, whereas $T^{-1/2} ([Tr]^*)^{1/2}$ converges to $r^{1/2}$. It follows from standard arguments that $W_T(r)$ converges in distribution to $N(0, r)$.

We have written $W_T(r)$ so that it is clear that $W_T$ can be considered to be a function of $r$. Also, because $W_T(r)$ depends on the $\varepsilon_t$'s, it is random. Therefore, we can think of $W_T(r)$ as defining a random function of $r$, which we write $W_T(\cdot)$. Just as the CLT provides conditions ensuring that the rescaled random walk $T^{-1/2} Y_T / \sigma$ (which we can now write as $W_T(1)$) converges, as $T$ becomes large, to a well-defined limiting random variable (the standard normal), the functional central limit theorem (FCLT) provides conditions ensuring that the random function $W_T(\cdot)$ converges, as $T$ becomes large, to a well-defined limiting random function, say $W(\cdot)$. The word "Functional" in Functional Central Limit Theorem appears because this limit is a function of $r$.

Some further properties of the random walk, suitably rescaled, are given in the following.

Proposition:
If $Y_t$ is a random walk, then $Y_{t_4} - Y_{t_3}$ is independent of $Y_{t_2} - Y_{t_1}$ for all $t_1 < t_2 < t_3 < t_4$. Consequently, $W_T(r_4) - W_T(r_3)$ is independent of $W_T(r_2) - W_T(r_1)$ for all $[T \cdot r_i]^* = t_i$, $i = 1, \ldots, 4$.
Proof:
Note that

    $Y_{t_4} - Y_{t_3} = \varepsilon_{t_4} + \varepsilon_{t_4-1} + \cdots + \varepsilon_{t_3+1},$
    $Y_{t_2} - Y_{t_1} = \varepsilon_{t_2} + \varepsilon_{t_2-1} + \cdots + \varepsilon_{t_1+1}.$

Since $(\varepsilon_{t_2}, \varepsilon_{t_2-1}, \ldots, \varepsilon_{t_1+1})$ is independent of $(\varepsilon_{t_4}, \varepsilon_{t_4-1}, \ldots, \varepsilon_{t_3+1})$, it follows that $Y_{t_4} - Y_{t_3}$ and $Y_{t_2} - Y_{t_1}$ are independent. Consequently,

    $W_T(r_4) - W_T(r_3) = T^{-1/2} (\varepsilon_{t_4} + \varepsilon_{t_4-1} + \cdots + \varepsilon_{t_3+1}) / \sigma$

is independent of

    $W_T(r_2) - W_T(r_1) = T^{-1/2} (\varepsilon_{t_2} + \varepsilon_{t_2-1} + \cdots + \varepsilon_{t_1+1}) / \sigma.$

Proposition:
For given $0 \le a < b \le 1$, $W_T(b) - W_T(a) \xrightarrow{L} N(0, b-a)$ as $T \to \infty$.

Proof:
By definition,

    $W_T(b) - W_T(a) = T^{-1/2} \sum_{t=[Ta]^*+1}^{[Tb]^*} \varepsilon_t / \sigma$

    $= T^{-1/2} ([Tb]^* - [Ta]^*)^{1/2} \times ([Tb]^* - [Ta]^*)^{-1/2} \sum_{t=[Ta]^*+1}^{[Tb]^*} \varepsilon_t / \sigma.$

The last term $([Tb]^* - [Ta]^*)^{-1/2} \sum_{t=[Ta]^*+1}^{[Tb]^*} \varepsilon_t / \sigma \xrightarrow{L} N(0, 1)$ by the CLT, and $T^{-1/2} ([Tb]^* - [Ta]^*)^{1/2} = (([Tb]^* - [Ta]^*)/T)^{1/2} \to (b-a)^{1/2}$ as $T \to \infty$. Hence $W_T(b) - W_T(a) \xrightarrow{L} N(0, b-a)$.
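Both propositions are easy to verify numerically. The following Python sketch (with illustrative choices of $T$, the interval endpoints, the seed, and normal increments, none of which come from the text) checks that across simulated paths the increment $W_T(b) - W_T(a)$ has variance close to $b - a$, and that non-overlapping increments are essentially uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
T, R = 1_000, 3_000  # path length and number of replications

# Rescaled partial-sum paths with sigma = 1: column t-1 holds W_T(t/T)
paths = np.cumsum(rng.standard_normal((R, T)), axis=1) / np.sqrt(T)

def increment(paths, a, b):
    # W_T(b) - W_T(a); the partial sum with [Tr]* terms sits at index [Tr]*-1
    return paths[:, int(T * b) - 1] - paths[:, int(T * a) - 1]

inc1 = increment(paths, 0.2, 0.7)  # variance should be near b - a = 0.5
inc2 = increment(paths, 0.7, 0.9)  # non-overlapping with inc1

print(inc1.var(), np.corrcoef(inc1, inc2)[0, 1])
```

The sample variance lands near 0.5 and the sample correlation near 0, matching the independent-increment structure that defines the Wiener process below.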
In words, the random walk has independent increments, and those increments have a limiting normal distribution with a variance reflecting the length of the interval $(b-a)$ over which the increment is taken. It should not be surprising, therefore, that the limit of the sequence of functions $W_T(\cdot)$ constructed from the random walk preserves these properties, in an appropriate sense. In fact, these properties form the basis of the definition of the Wiener process.

Definition:
Let $(S, \mathcal{F}, P)$ be a complete probability space. Then $W : S \times [0,1] \to \mathbb{R}^1$ is a standard Wiener process if for each $r \in [0,1]$, $W(\cdot, r)$ is $\mathcal{F}$-measurable, and in addition:
(1) The process starts at zero: $P[W(\cdot, 0) = 0] = 1$.
(2) The increments are independent: if $0 \le a_0 \le a_1 \le \cdots \le a_k \le 1$, then $W(\cdot, a_i) - W(\cdot, a_{i-1})$ is independent of $W(\cdot, a_j) - W(\cdot, a_{j-1})$, $j = 1, \ldots, k$, $j \ne i$, for all $i = 1, \ldots, k$.
(3) The increments are normally distributed: for $0 \le a \le b \le 1$, the increment $W(\cdot, b) - W(\cdot, a)$ is distributed as $N(0, b-a)$.

In the definition, we have written $W(\cdot, a)$ for explicitness; whenever convenient, however, we will write $W(a)$ instead of $W(\cdot, a)$, analogous to our notation elsewhere. The Wiener process is also called a Brownian motion; it is named in honor of Norbert Wiener (1924), who provided the mathematical foundation for the theory of the random motions observed in 1827 by the nineteenth-century botanist Robert Brown.

2.2 Functional Central Limit Theorems

We earlier defined convergence in law for random variables, and now we need to extend the definition to cover random functions. Let $S(\cdot)$ represent a continuous-time stochastic process, with $S(r)$ representing its value at some date $r$ for $r \in [0,1]$. Suppose, further, that any given realization $S(\cdot)$ is a continuous function of $r$ with probability 1. For $\{S_T(\cdot)\}_{T=1}^{\infty}$ a sequence of such continuous functions,
we say that the sequence of probability measures induced by $\{S_T(\cdot)\}_{T=1}^{\infty}$ weakly converges to the probability measure induced by $S(\cdot)$, denoted by $S_T(\cdot) \Longrightarrow S(\cdot)$, if all of the following hold:
(1) For any finite collection of $k$ particular dates $0 \le r_1 < r_2 < \cdots < r_k \le 1$, the vector $(S_T(r_1), S_T(r_2), \ldots, S_T(r_k))'$ converges in distribution to $(S(r_1), S(r_2), \ldots, S(r_k))'$;
(2) For each $\epsilon > 0$, the probability that $S_T(r_1)$ differs from $S_T(r_2)$ by more than $\epsilon$ for any dates $r_1$ and $r_2$ within $\delta$ of each other goes to zero uniformly in $T$ as $\delta \to 0$;
(3) $P\{|S_T(0)| > \lambda\} \to 0$ uniformly in $T$ as $\lambda \to \infty$.

This definition applies to sequences of continuous functions, though the function in (9) is a discontinuous step function. Fortunately, the discontinuities occur at a countable set of points. Formally, $S_T(\cdot)$ can be replaced with a similar continuous function, interpolating between the steps.

The Functional Central Limit Theorem (FCLT) provides conditions under which $W_T$ converges to the standard Wiener process $W$. The simplest FCLT is a generalization of the Lindeberg-Lévy CLT, known as Donsker's theorem.

Theorem (Donsker):
Let $\varepsilon_t$ be a sequence of i.i.d. random scalars with mean zero. If $\sigma^2 \equiv Var(\varepsilon_t) < \infty$, $\sigma^2 \ne 0$, then $W_T \Longrightarrow W$.

Because pointwise convergence in distribution $W_T(\cdot, r) \xrightarrow{L} W(\cdot, r)$ for each $r \in [0,1]$ is necessary (but not sufficient) for weak convergence $W_T \Longrightarrow W$, the Lindeberg-Lévy CLT ($W_T(\cdot, 1) \xrightarrow{L} W(\cdot, 1)$) follows immediately from Donsker's theorem. Donsker's theorem is strictly stronger than Lindeberg-Lévy, however: both use identical assumptions, but Donsker's theorem delivers a much stronger
conclusion. Donsker called his result an invariance principle; consequently, the FCLT is often referred to as an invariance principle.

So far, we have assumed that the sequence $\varepsilon_t$ used to construct $W_T$ is i.i.d. Nevertheless, just as we can obtain central limit theorems when $\varepsilon_t$ is not necessarily i.i.d., versions of the FCLT hold for each CLT previously given in Chapter 4.

Theorem (Continuous Mapping Theorem):
If $S_T(\cdot) \Longrightarrow S(\cdot)$ and $g(\cdot)$ is a continuous functional, then $g(S_T(\cdot)) \Longrightarrow g(S(\cdot))$.

In the above theorem, continuity of a functional $g(\cdot)$ means that for any $\varsigma > 0$, there exists a $\delta > 0$ such that if $h(r)$ and $k(r)$ are any continuous bounded functions on $[0,1]$, $h : [0,1] \to \mathbb{R}^1$ and $k : [0,1] \to \mathbb{R}^1$, with $|h(r) - k(r)| < \delta$ for all $r \in [0,1]$, then $|g(h(\cdot)) - g(k(\cdot))| < \varsigma$.
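As an illustration of the continuous mapping theorem (a Python sketch; the choice of functional, seed, and sample sizes are our own and not from the text), take $g(S) = \sup_r S(r)$, which is continuous in the sup norm on $C[0,1]$. By the CMT, $\sup_r W_T(r) \Longrightarrow \sup_r W(r)$, and by the reflection principle the latter is distributed as $|N(0,1)|$, with mean $\sqrt{2/\pi} \approx 0.80$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, R = 500, 4_000

# Rescaled random-walk paths W_T on the grid t/T (sigma = 1)
paths = np.cumsum(rng.standard_normal((R, T)), axis=1) / np.sqrt(T)

# g(W_T) = sup over the path; include W_T(0) = 0 at the left endpoint
sup_draws = np.maximum(paths.max(axis=1), 0.0)

# sup of a Wiener path over [0,1] is |N(0,1)|, with mean sqrt(2/pi) ~ 0.798
print(sup_draws.mean(), np.sqrt(2.0 / np.pi))
```

The simulated mean is slightly below $\sqrt{2/\pi}$ at finite $T$ (the discrete path misses the true supremum), but the gap shrinks as $T$ grows, as the CMT predicts.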
3 Regression with a Unit Root

3.1 Dickey-Fuller Test, $Y_t$ is an AR(1) Process

Consider the following simple AR(1) process with a unit root:

    $Y_t = \beta Y_{t-1} + u_t,$    (11)
    $\beta = 1,$    (12)

where $Y_0 = 0$ and $u_t$ is i.i.d. with mean zero and variance $\sigma^2$. We consider the three least-squares regressions

    $Y_t = \breve{\beta} Y_{t-1} + \breve{u}_t,$    (13)

    $Y_t = \hat{\alpha} + \hat{\beta} Y_{t-1} + \hat{u}_t,$    (14)

and

    $Y_t = \tilde{\alpha} + \tilde{\beta} Y_{t-1} + \tilde{\delta} t + \tilde{u}_t,$    (15)

where $\breve{\beta}$, $(\hat{\alpha}, \hat{\beta})$, and $(\tilde{\alpha}, \tilde{\beta}, \tilde{\delta})$ are the conventional least-squares regression coefficients. Dickey and Fuller (1979) were concerned with the limiting distributions of the estimators in (13), (14), and (15) ($\breve{\beta}$, $(\hat{\alpha}, \hat{\beta})$, and $(\tilde{\alpha}, \tilde{\beta}, \tilde{\delta})$) under the null hypothesis that the data are generated by (11) and (12).

We first provide the following asymptotic results for the sample moments, which are useful in deriving the asymptotics of the OLS estimators.

Lemma:
Let $u_t$ be an i.i.d. sequence with mean zero and variance $\sigma^2$, and let

    $Y_t = u_1 + u_2 + \cdots + u_t$ for $t = 1, 2, \ldots, T,$    (16)

with $Y_0 = 0$. Then
(a) $T^{-\frac{1}{2}} \sum_{t=1}^{T} u_t \xrightarrow{L} \sigma W(1)$,
(b) $T^{-2} \sum_{t=1}^{T} Y_{t-1}^2 \xrightarrow{L} \sigma^2 \int_0^1 [W(r)]^2 \, dr$,
(c) $T^{-\frac{3}{2}} \sum_{t=1}^{T} Y_{t-1} \xrightarrow{L} \sigma \int_0^1 W(r) \, dr$,
(d) $T^{-1} \sum_{t=1}^{T} Y_{t-1} u_t \xrightarrow{L} \frac{1}{2} \sigma^2 [W(1)^2 - 1]$,
(e) $T^{-\frac{3}{2}} \sum_{t=1}^{T} t u_t \xrightarrow{L} \sigma [W(1) - \int_0^1 W(r) \, dr]$,
(f) $T^{-\frac{5}{2}} \sum_{t=1}^{T} t Y_{t-1} \xrightarrow{L} \sigma \int_0^1 r W(r) \, dr$,
(g) $T^{-3} \sum_{t=1}^{T} t Y_{t-1}^2 \xrightarrow{L} \sigma^2 \int_0^1 r [W(r)]^2 \, dr$.

A joint weak convergence of the sample moments given above to their respective limits is easily established and will be used below.

Proof:
(a) is a straightforward result of Donsker's theorem with $r = 1$.

(b) First rewrite $T^{-2} \sum_{t=1}^{T} Y_{t-1}^2$ in terms of $W_T(r_{t-1}) \equiv T^{-1/2} Y_{t-1} / \sigma = T^{-1/2} \sum_{s=1}^{t-1} u_s / \sigma$, where $r_{t-1} = (t-1)/T$, so that $T^{-2} \sum_{t=1}^{T} Y_{t-1}^2 = \sigma^2 T^{-1} \sum_{t=1}^{T} W_T(r_{t-1})^2$. Because $W_T(r)$ is constant for $(t-1)/T \le r < t/T$, we have

    $T^{-1} \sum_{t=1}^{T} W_T(r_{t-1})^2 = \sum_{t=1}^{T} \int_{(t-1)/T}^{t/T} W_T(r)^2 \, dr = \int_0^1 W_T(r)^2 \, dr.$

The continuous mapping theorem applies to $h(W_T) = \int_0^1 W_T(r)^2 \, dr$. It follows that $h(W_T) \Longrightarrow h(W)$, so that $T^{-2} \sum_{t=1}^{T} Y_{t-1}^2 \Longrightarrow \sigma^2 \int_0^1 W(r)^2 \, dr$, as claimed.

(c) The proof of item (c) is analogous to that of (b). First rewrite $T^{-3/2} \sum_{t=1}^{T} Y_{t-1}$ in terms of $W_T(r_{t-1}) \equiv T^{-1/2} Y_{t-1} / \sigma = T^{-1/2} \sum_{s=1}^{t-1} u_s / \sigma$, where $r_{t-1} = (t-1)/T$, so that $T^{-3/2} \sum_{t=1}^{T} Y_{t-1} = \sigma T^{-1} \sum_{t=1}^{T} W_T(r_{t-1})$. Because $W_T(r)$ is constant for