Ch. 22 Unit Root in Vector Time Series

1 Multivariate Wiener Processes and the Multivariate FCLT

Section 2.1 of Chapter 21 described univariate standard Brownian motion $W(r)$ as a scalar continuous-time process ($W : r \in [0,1] \rightarrow \mathbb{R}^1$). The variable $W(r)$ has a $N(0, r)$ distribution across realizations, and for any given realization, $W(r)$ is a continuous function of the date $r$ with independent increments. If a set of $k$ such independent processes, denoted $W_1(r), W_2(r), \ldots, W_k(r)$, is collected in a $(k \times 1)$ vector $w(r)$, the result is $k$-dimensional standard Brownian motion.

Definition 1: A $k$-dimensional standard Brownian motion $w(\cdot)$ is a continuous-time process associating each date $r \in [0,1]$ with the $(k \times 1)$ vector $w(r)$ satisfying the following:

(a) $w(0) = 0$;

(b) for any dates $0 \le r_1 < r_2 < \cdots < r_k \le 1$, the changes $[w(r_2) - w(r_1)], [w(r_3) - w(r_2)], \ldots, [w(r_k) - w(r_{k-1})]$ are independent multivariate Gaussian, with $[w(s) - w(v)] \sim N(0, (s - v) I_k)$ for $s > v$;

(c) for any given realization, $w(r)$ is continuous in $r$ with probability 1.

Analogous to the univariate case, we can define a multivariate random walk as follows.

Definition: Let the $(k \times 1)$ random vector $y_t$ follow
$$y_t = y_{t-1} + \varepsilon_t, \qquad t = 1, 2, \ldots,$$
where $y_0 = 0$ and $\{\varepsilon_t\}$ is a sequence of i.i.d. random vectors such that $E(\varepsilon_t) = 0$ and $E(\varepsilon_t \varepsilon_t') = \Omega$, a finite positive definite matrix. Then $y_t$ is a multivariate ($k$-dimensional) random walk. We form the rescaled partial sums as
$$w_T(r) \equiv \Omega^{-1/2}\, T^{-1/2} \sum_{t=1}^{[Tr]} \varepsilon_t.$$
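As a quick illustration of the rescaled partial sums just defined, the following minimal Python sketch simulates a $k$-dimensional random walk and evaluates $w_T(r)$ on the grid $r = 1/T, 2/T, \ldots, 1$. The variable names, the particular $\Omega$, and the use of a symmetric square root for $\Omega^{-1/2}$ are illustrative assumptions, not part of the text.

```python
import numpy as np

def rescaled_partial_sums(eps, Omega):
    """Rescaled partial-sum process w_T(r) = Omega^{-1/2} T^{-1/2} sum_{t<=[Tr]} eps_t,
    evaluated at r = 1/T, 2/T, ..., 1 (symmetric square root used for Omega^{-1/2})."""
    T = eps.shape[0]
    vals, vecs = np.linalg.eigh(Omega)
    Omega_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return (eps @ Omega_inv_sqrt.T).cumsum(axis=0) / np.sqrt(T)

# Example: a k = 2 random walk with correlated innovations
rng = np.random.default_rng(0)
T = 1000
Omega = np.array([[1.0, 0.5], [0.5, 2.0]])
eps = rng.multivariate_normal(np.zeros(2), Omega, size=T)
y = eps.cumsum(axis=0)                      # random walk: y_t = y_{t-1} + eps_t
wT = rescaled_partial_sums(eps, Omega)
print(wT[-1])                               # w_T(1); approximately N(0, I_2)
```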
The components of $w_T(r)$ are the individual partial sums
$$W_{Tj}(r) = T^{-1/2} \sum_{t=1}^{[Tr]} \tilde{\varepsilon}_{tj}, \qquad j = 1, 2, \ldots, k,$$
where $\tilde{\varepsilon}_{tj}$ is the $j$th element of $\Omega^{-1/2}\varepsilon_t$.

The Functional Central Limit Theorem (FCLT) provides conditions under which $w_T(r)$ converges to the multivariate standard Wiener process $w(r)$. The simplest multivariate FCLT is the multivariate version of Donsker's theorem.

Theorem 1 (Multivariate Donsker): Let $\varepsilon_t$ be a sequence of i.i.d. random vectors such that $E(\varepsilon_t) = 0$ and $E(\varepsilon_t\varepsilon_t') = \Omega$, a finite positive definite matrix. Then $w_T(\cdot) \Rightarrow w(\cdot)$.

Quite general multivariate FCLTs are available. For example, we may apply the FCLT to serially dependent vector processes using a generalization of (70) and Theorem 12 of Chapter 21.

Theorem 2 (FCLT when $u_t$ is a vector MA($\infty$) process): Let
$$u_t = \sum_{s=0}^{\infty} \Psi_s \varepsilon_{t-s},$$
where $\varepsilon_t$ is a $k$-dimensional i.i.d. random vector with variance-covariance matrix $\Omega$ and, with $\psi_{ij}^{(s)}$ denoting the row $i$, column $j$ element of $\Psi_s$,
$$\sum_{s=0}^{\infty} s \cdot |\psi_{ij}^{(s)}| < \infty \qquad \text{for each } i, j = 1, 2, \ldots, k.$$
Then $w_T(\cdot) \Rightarrow w(\cdot)$, where
$$w_T(r) \equiv \Omega^{-1/2}\,\Psi(1)^{-1}\, T^{-1/2} \sum_{t=1}^{[Tr]} u_t.$$

Proof: Use the multivariate Beveridge-Nelson decomposition to show that the long-run variance matrix of $u_t$ is
$$\lim_{T\to\infty} T^{-1} E\left[\left(\sum_{t=1}^{T} u_t\right)\left(\sum_{t=1}^{T} u_t\right)'\right] = \Psi(1)\,\Omega\,\Psi(1)',$$
and then proceed as in Theorem 12 of Chapter 21.
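A minimal numerical check of the long-run variance that drives Theorem 2: for a hypothetical VMA(1) process $u_t = \varepsilon_t + \Psi_1\varepsilon_{t-1}$, the covariance matrix of $T^{-1/2}\sum_{t=1}^T u_t$ across Monte Carlo replications should be close to $\Psi(1)\Omega\Psi(1)'$ with $\Psi(1) = I_k + \Psi_1$. The specific matrices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
k, T, reps = 2, 500, 2000
Psi1 = np.array([[0.5, 0.2], [0.0, 0.3]])
Omega = np.array([[1.0, 0.4], [0.4, 1.0]])
Psi_of_1 = np.eye(k) + Psi1

sums = np.empty((reps, k))
for i in range(reps):
    eps = rng.multivariate_normal(np.zeros(k), Omega, size=T + 1)
    u = eps[1:] + eps[:-1] @ Psi1.T          # u_t = eps_t + Psi_1 eps_{t-1}
    sums[i] = u.sum(axis=0) / np.sqrt(T)     # T^{-1/2} sum_{t=1}^{T} u_t

print("simulated long-run variance:\n", np.cov(sums.T))
print("Psi(1) Omega Psi(1)':\n", Psi_of_1 @ Omega @ Psi_of_1.T)
```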
2 Vector Autoregressions Containing Unit Roots

Let $y_t$ be a $(k \times 1)$ vector autoregressive process ($VAR(p)$), i.e.
$$[I_k - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p]\, y_t = c + \varepsilon_t. \qquad (1)$$
The scalar algebra in (33) of Chapter 21 works perfectly well for matrices, establishing that for any values of $\Phi_1, \Phi_2, \ldots, \Phi_p$, the following polynomials are equivalent:
$$[I_k - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p] = (I_k - \rho L) - (\xi_1 L + \xi_2 L^2 + \cdots + \xi_{p-1} L^{p-1})(1 - L),$$
where
$$\rho \equiv \Phi_1 + \Phi_2 + \cdots + \Phi_p \qquad (2)$$
$$\xi_s \equiv -[\Phi_{s+1} + \Phi_{s+2} + \cdots + \Phi_p] \qquad \text{for } s = 1, 2, \ldots, p-1. \qquad (3)$$
It follows that any $VAR(p)$ process (1) can always be written in the form
$$(I_k - \rho L)\, y_t - (\xi_1 L + \xi_2 L^2 + \cdots + \xi_{p-1} L^{p-1})(1 - L)\, y_t = c + \varepsilon_t$$
or
$$y_t = \xi_1 \Delta y_{t-1} + \xi_2 \Delta y_{t-2} + \cdots + \xi_{p-1} \Delta y_{t-p+1} + c + \rho y_{t-1} + \varepsilon_t. \qquad (4)$$
There are two senses in which a $VAR$ process may contain unit roots. First, the first difference of $y_t$ follows a $VAR(p-1)$ process,
$$\Delta y_t = \xi_1 \Delta y_{t-1} + \xi_2 \Delta y_{t-2} + \cdots + \xi_{p-1} \Delta y_{t-p+1} + c + \varepsilon_t,$$
which requires from (4) that
$$\rho = I_k$$
or, from (2), that
$$\Phi_1 + \Phi_2 + \cdots + \Phi_p = I_k. \qquad (5)$$
Second, recalling from (8) of Chapter 18, a $VAR(p)$ such as (1) is said to contain at least one unit root ($z = 1$) if the following determinant is zero:
$$|I_k - \Phi_1 - \Phi_2 - \cdots - \Phi_p| = 0. \qquad (6)$$
Note that (5) implies (6), but (6) does not imply (5). Vector autoregressions for which (6) holds but (5) does not will be considered in Chapter 23.
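The following sketch, with hypothetical coefficient matrices not taken from the text, implements the mapping (2)-(3) from the $\Phi$ matrices to $(\rho, \xi_1, \ldots, \xi_{p-1})$ and checks the two unit-root conditions (5) and (6).

```python
import numpy as np

def var_to_adf_form(Phi):
    """Map VAR(p) matrices [Phi_1, ..., Phi_p] to (rho, xi_1, ..., xi_{p-1})
    of equations (2)-(3)."""
    p = len(Phi)
    rho = sum(Phi)                               # rho = Phi_1 + ... + Phi_p
    xi = [-sum(Phi[s:]) for s in range(1, p)]    # xi_s = -(Phi_{s+1} + ... + Phi_p)
    return rho, xi

def unit_root_checks(Phi, tol=1e-10):
    """Condition (5): Phi_1 + ... + Phi_p = I_k.  Condition (6): |I_k - sum Phi_s| = 0."""
    k = Phi[0].shape[0]
    S = sum(Phi)
    return np.allclose(S, np.eye(k), atol=tol), abs(np.linalg.det(np.eye(k) - S)) < tol

# Example 1: rho = I_2, so (5) holds and Delta y_t follows a VAR(1) as in the text
Phi_a = [np.array([[0.6, 0.1], [0.0, 0.4]]), np.array([[0.4, -0.1], [0.0, 0.6]])]
print(var_to_adf_form(Phi_a)[0], unit_root_checks(Phi_a))    # rho = I_2, (True, True)

# Example 2: a single unit root, so (6) holds but (5) does not
Phi_b = [np.array([[1.0, 0.0], [0.0, 0.5]])]
print(unit_root_checks(Phi_b))                               # (False, True)
```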
3 Spurious Regression

3.1 Asymptotics for Spurious Regression

Consider a regression of the form
$$y_t = x_t'\beta + u_t, \qquad (7)$$
in which elements of $y_t$ and $x_t$ might be nonstationary. If there does not exist any population value of $\beta$ for which the disturbance $u_t = y_t - x_t'\beta$ is $I(0)$, then OLS is quite likely to produce spurious results. In the extreme case in which $Y_t$ and $x_t$ are independent random walks, as we shall see, the OLS estimator $\hat{\beta}_T$ is not consistent for $\beta = 0$ but instead converges to a particular random variable. Because there is truly no relation between $Y_t$ and $x_t$, and because $\hat{\beta}_T$ is incapable of revealing this, we call this a case of "spurious regression". The phenomenon was first considered by Yule (1926); the dangers of spurious regression were forcefully brought to economists' attention by the Monte Carlo studies of Granger and Newbold (1974) and later explained theoretically by Phillips (1986).

Theorem 3 (Spurious regression, two independent random walks): Let $X_t$ and $Y_t$ be independent random walks, $X_t = X_{t-1} + \eta_t$ and $Y_t = Y_{t-1} + \zeta_t$, where $\{\eta_t\}$ and $\{\zeta_t\}$ are i.i.d. sequences independent of each other. Consider the regression of $Y_t$ on $X_t$, written formally as $Y_t = X_t\beta + u_t$, where $\beta = 0$ and $u_t = Y_t$, reflecting the lack of any relation between $Y_t$ and $X_t$. Then the OLS estimator of $\beta$ satisfies
$$\hat{\beta}_T \xrightarrow{L} (\sigma_2/\sigma_1)\left[\int_0^1 [W_1(r)]^2\,dr\right]^{-1}\int_0^1 W_1(r)W_2(r)\,dr,$$
where $\sigma_1^2 = E(\eta_t^2)$ and $\sigma_2^2 = E(\zeta_t^2)$.
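Before turning to the proof, a minimal Monte Carlo sketch of Theorem 3 (all settings are illustrative assumptions): the OLS slope from regressing one random walk on another, independent, random walk has a spread that does not shrink as $T$ grows, in contrast with the usual $\sqrt{T}$-consistent case.

```python
import numpy as np

rng = np.random.default_rng(3)
reps = 2000
for T in (100, 400, 1600):
    betas = np.empty(reps)
    for i in range(reps):
        X = np.cumsum(rng.standard_normal(T))
        Y = np.cumsum(rng.standard_normal(T))
        betas[i] = (X @ Y) / (X @ X)        # OLS slope in Y_t = X_t * beta + u_t
    print(T, betas.std())                   # roughly constant, not shrinking to 0
```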
Proof: To proceed, define $r_{t-1} = (t-1)/T$ as before and write
$$W_{1T}(r_{t-1}) = T^{-1/2}\sum_{s=1}^{t-1}\eta_s/\sigma_1 = T^{-1/2}X_{t-1}/\sigma_1, \qquad W_{2T}(r_{t-1}) = T^{-1/2}\sum_{s=1}^{t-1}\zeta_s/\sigma_2 = T^{-1/2}Y_{t-1}/\sigma_2,$$
or
$$T^{-1/2}X_{t-1} = \sigma_1 W_{1T}(r_{t-1}) \qquad (8)$$
and
$$T^{-1/2}Y_{t-1} = \sigma_2 W_{2T}(r_{t-1}), \qquad (9)$$
where $\sigma_1^2 \equiv \lim_{T\to\infty}\mathrm{Var}\left(T^{-1/2}\sum_{t=1}^{T}\eta_t\right)$ and $\sigma_2^2 \equiv \lim_{T\to\infty}\mathrm{Var}\left(T^{-1/2}\sum_{t=1}^{T}\zeta_t\right)$.

From Donsker's theorem and the continuous mapping theorem we have
$$T^{-2}\sum_{t=1}^{T}X_{t-1}^2 \Rightarrow \sigma_1^2\int_0^1 [W_1(r)]^2\,dr \qquad \text{and} \qquad T^{-2}\sum_{t=1}^{T}Y_{t-1}^2 \Rightarrow \sigma_2^2\int_0^1 [W_2(r)]^2\,dr.$$
The multivariate version of Donsker's theorem states that
$$\begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}^{-1/2} T^{-1/2}\sum_{t=1}^{[Tr]}\begin{bmatrix}\eta_t\\ \zeta_t\end{bmatrix} \Rightarrow \begin{bmatrix}W_1(r)\\ W_2(r)\end{bmatrix}
\qquad \text{or} \qquad
\begin{bmatrix}T^{-1/2}X_{[Tr]}\\ T^{-1/2}Y_{[Tr]}\end{bmatrix} \Rightarrow \begin{bmatrix}\sigma_1 W_1(r)\\ \sigma_2 W_2(r)\end{bmatrix}.$$
From (8) and (9) we have
$$T^{-1}\cdot T^{-1}\sum_{t=1}^{T}X_{t-1}Y_{t-1} = T^{-1}\sum_{t=1}^{T}\sigma_1 W_{1T}(r_{t-1})\,\sigma_2 W_{2T}(r_{t-1})
= \sigma_1\sigma_2\, T^{-1}\sum_{t=1}^{T}W_{1T}(r_{t-1})W_{2T}(r_{t-1})$$
$$= \sigma_1\sigma_2\sum_{t=1}^{T}\int_{(t-1)/T}^{t/T} W_{1T}(r)W_{2T}(r)\,dr
= \sigma_1\sigma_2\int_0^1 W_{1T}(r)W_{2T}(r)\,dr
\Rightarrow \sigma_1\sigma_2\int_0^1 W_1(r)W_2(r)\,dr,$$
where we have used the fact that $W_{1T}(r)$ and $W_{2T}(r)$ are constant for $(t-1)/T \le r < t/T$ and applied the continuous mapping theorem to the mapping
$$(x, y) \mapsto \int_0^1 x(a)y(a)\,da.$$
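A quick numerical check of the step-function argument just used (illustrative values only): because $W_{1T}$ and $W_{2T}$ are constant on each interval $[(t-1)/T,\, t/T)$, the integral $\sigma_1\sigma_2\int_0^1 W_{1T}(r)W_{2T}(r)\,dr$ coincides exactly with $T^{-2}\sum_{t=1}^{T}X_{t-1}Y_{t-1}$.

```python
import numpy as np

rng = np.random.default_rng(2)
T, sigma1, sigma2 = 400, 1.0, 2.0
X = np.concatenate(([0.0], np.cumsum(sigma1 * rng.standard_normal(T))))
Y = np.concatenate(([0.0], np.cumsum(sigma2 * rng.standard_normal(T))))

lhs = (X[:-1] * Y[:-1]).sum() / T**2                 # T^{-2} sum_{t=1}^{T} X_{t-1} Y_{t-1}
W1T, W2T = X[:-1] / (sigma1 * np.sqrt(T)), Y[:-1] / (sigma2 * np.sqrt(T))
rhs = sigma1 * sigma2 * np.mean(W1T * W2T)           # step-function integral over [0, 1]
print(lhs, rhs)                                      # identical up to rounding
```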
Hence, working for convenience with $\hat{\beta}_{T-1}$ (the estimator based on observations through date $T-1$, which has the same limit) instead of $\hat{\beta}_T$, we have
$$\hat{\beta}_{T-1} = \left(T^{-2}\sum_{t=1}^{T}X_{t-1}^2\right)^{-1}\left(T^{-2}\sum_{t=1}^{T}X_{t-1}Y_{t-1}\right)
\Rightarrow \left[\sigma_1^2\int_0^1 [W_1(r)]^2\,dr\right]^{-1}\sigma_1\sigma_2\int_0^1 W_1(r)W_2(r)\,dr$$
$$= (\sigma_2/\sigma_1)\left[\int_0^1 [W_1(r)]^2\,dr\right]^{-1}\int_0^1 W_1(r)W_2(r)\,dr. \quad \text{Q.E.D.} \qquad (10)$$

The spurious regression problem becomes clear upon inspection of (10). The true value of the derivative of $Y_t$ with respect to $X_t$ is zero, because the errors generating the $X_t$ and $Y_t$ series are independent. Yet $\hat{\beta}_T$ fails to converge in probability to zero and instead has a nondegenerate limiting distribution.

Using similar techniques, Phillips (1986) shows that $T^{-1/2}t_{\hat{\beta}_T}$ has a nondegenerate limiting distribution, or in other words that the $t$-statistic for $\hat{\beta}_T$ diverges. Hence as $T \to \infty$, the probability that a significant $t$-value arises in a regression such as (7) approaches one, leading to spurious inference about the existence of a relationship between $X_t$ and $Y_t$.
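A Monte Carlo sketch of this divergence (hypothetical design with standard normal innovations): the rejection frequency of $|t| > 2$ in the spurious levels regression, fitted with an intercept, rises toward one as $T$ grows.

```python
import numpy as np

rng = np.random.default_rng(4)
reps = 1000
for T in (50, 200, 800):
    rejections = 0
    for _ in range(reps):
        X = np.cumsum(rng.standard_normal(T))
        Y = np.cumsum(rng.standard_normal(T))
        Z = np.column_stack([np.ones(T), X])
        coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        resid = Y - Z @ coef
        s2 = resid @ resid / (T - 2)
        se_beta = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
        rejections += abs(coef[1] / se_beta) > 2
    print(T, rejections / reps)              # climbs toward 1.0
```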
The spurious regression problem arises not only with independent random walks; it appears even among non-cointegrated $I(1)$ processes in general.

Theorem 4 (Spurious regression, non-cointegrated $I(1)$ processes, Hamilton's parametric method): Consider a $(k \times 1)$ vector $y_t$ whose first difference is described by
$$(1 - L)y_t = \Psi(L)\varepsilon_t = \sum_{s=0}^{\infty}\Psi_s\varepsilon_{t-s},$$
for $\varepsilon_t$ an i.i.d. vector with mean zero, variance $E(\varepsilon_t\varepsilon_t') = \Omega = PP'$, and finite fourth moments, and where $\{s\cdot\Psi_s\}_{s=0}^{\infty}$ is absolutely summable. Let $g = k - 1$ and $\Lambda = \Psi(1)P$. Partition $y_t$ as $y_t = (Y_{1t}, y_{2t}')'$, and partition $\Lambda\Lambda'$ as
$$\Lambda\Lambda' = \begin{bmatrix}\Sigma_{11} & \Sigma_{21}'\\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},$$
where $\Sigma_{11}$ is $(1 \times 1)$ and $\Sigma_{22}$ is $(g \times g)$. Suppose that $\Lambda\Lambda'$ is nonsingular, and define
$$(\sigma_1^*)^2 \equiv \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}.$$
Let $L_{22}$ denote the Cholesky factor of $\Sigma_{22}^{-1}$, and consider the consequences of an OLS regression of the first variable on the others and a constant,
$$Y_{1t} = \hat{\alpha}_T + y_{2t}'\hat{\beta}_T + \hat{u}_t, \qquad (11)$$
and any null hypothesis of the form $H_0: R\beta = q$, where $R$ is a known $(r \times g)$ matrix representing $r$ separate hypotheses involving $\beta$ and $q$ is a known $(r \times 1)$ vector. Then the following hold.

(a) The OLS estimates $\hat{\alpha}_T$ and $\hat{\beta}_T$ are characterized by
$$\begin{bmatrix}T^{-1/2}\hat{\alpha}_T\\ \hat{\beta}_T - \Sigma_{22}^{-1}\Sigma_{21}\end{bmatrix} \xrightarrow{L} \begin{bmatrix}\sigma_1^* h_1\\ \sigma_1^* L_{22} h_2\end{bmatrix},$$
where
$$\begin{bmatrix}h_1\\ h_2\end{bmatrix} \equiv \begin{bmatrix}1 & \int_0^1 [w_2(r)]'\,dr\\ \int_0^1 w_2(r)\,dr & \int_0^1 [w_2(r)][w_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix}\int_0^1 W_1(r)\,dr\\ \int_0^1 w_2(r)W_1(r)\,dr\end{bmatrix},$$
$W_1(r)$ denotes scalar standard Brownian motion, and $w_2(r)$ denotes $g$-dimensional standard Brownian motion independent of $W_1(r)$.

(b) The sum of squared errors $SSE_T$ from OLS estimation of (11) satisfies $T^{-2}\cdot SSE_T \xrightarrow{L} (\sigma_1^*)^2\cdot H$, where
$$H \equiv \int_0^1 [W_1(r)]^2\,dr - \left[\int_0^1 W_1(r)\,dr \;\;\; \int_0^1 [W_1(r)][w_2(r)]'\,dr\right]\begin{bmatrix}1 & \int_0^1 [w_2(r)]'\,dr\\ \int_0^1 w_2(r)\,dr & \int_0^1 [w_2(r)][w_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix}\int_0^1 W_1(r)\,dr\\ \int_0^1 w_2(r)W_1(r)\,dr\end{bmatrix}.$$

(c) The OLS $F$ test of $H_0$ satisfies
$$T^{-1}F_T \xrightarrow{L} \left(\sigma_1^* R^* h_2 - q^*\right)'\left\{(\sigma_1^*)^2\, H\, [\,0 \;\; R^*\,]\begin{bmatrix}1 & \int_0^1 [w_2(r)]'\,dr\\ \int_0^1 w_2(r)\,dr & \int_0^1 [w_2(r)][w_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix}0'\\ R^{*\prime}\end{bmatrix}\right\}^{-1}\left(\sigma_1^* R^* h_2 - q^*\right) \div r,$$
where
$$R^* \equiv RL_{22}, \qquad q^* \equiv q - R\Sigma_{22}^{-1}\Sigma_{21}.$$
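A small sketch of the population quantities appearing in Theorem 4, for a hypothetical $\Psi(1)$ and $\Omega$ (all numbers are assumptions for illustration): form $\Lambda = \Psi(1)P$, partition $\Lambda\Lambda'$, and compute $(\sigma_1^*)^2$ and the Cholesky factor $L_{22}$ of $\Sigma_{22}^{-1}$.

```python
import numpy as np

Psi_of_1 = np.array([[1.0, 0.3, 0.0],       # assumed Psi(1); must be nonsingular
                     [0.2, 1.0, 0.1],
                     [0.0, 0.4, 1.0]])
Omega = np.array([[1.0, 0.2, 0.0],
                  [0.2, 1.0, 0.3],
                  [0.0, 0.3, 1.0]])
P = np.linalg.cholesky(Omega)                # PP' = Omega
Lam = Psi_of_1 @ P
LL = Lam @ Lam.T                             # Lambda Lambda'

Sigma11 = LL[0, 0]
Sigma21 = LL[1:, [0]]
Sigma22 = LL[1:, 1:]
sigma1_star_sq = Sigma11 - (Sigma21.T @ np.linalg.solve(Sigma22, Sigma21)).item()
L22 = np.linalg.cholesky(np.linalg.inv(Sigma22))
print("(sigma_1^*)^2 =", sigma1_star_sq)
print("L22 L22' equals Sigma22^{-1}:", np.allclose(L22 @ L22.T, np.linalg.inv(Sigma22)))
```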
Result (a) implies that neither estimator is consistent. The estimator of the constant, $\hat{\alpha}_T$, actually diverges and must be divided by $T^{1/2}$ to obtain a random variable with a well-specified distribution; $\hat{\alpha}_T$ itself is likely to get farther and farther from the true value of zero as the sample size $T$ increases. Things do not get better when we look at $\hat{\beta}_T$: different arbitrarily large samples will produce randomly differing estimates $\hat{\beta}_T$. The usual situation, in which $\hat{\beta}_T \xrightarrow{p} 0$ and must be multiplied by some increasing function of $T$ to obtain a nondegenerate asymptotic distribution, does not occur here.

Result (b) implies that the usual OLS estimator of the variance of $u_t$,
$$s_T^2 = (T - k)^{-1}SSE_T,$$
again diverges as $T \to \infty$. To obtain an estimator that does not grow with the sample size, the sum of squared errors has to be divided by $T^2$ rather than $T$. In this respect, the residuals $\hat{u}_t$ from a spurious regression behave like a unit root process: if $\xi_t$ is a scalar $I(1)$ series, then $T^{-1}\sum\xi_t^2$ diverges while $T^{-2}\sum\xi_t^2$ converges.

Result (c) means that any OLS $t$ or $F$ test based on the spurious regression also diverges; the OLS $F$ statistic must be divided by $T$ to obtain a variable that does not grow with the sample size. Since an $F$ test of a single restriction is the square of the corresponding $t$ test, any $t$ statistic would have to be divided by $T^{1/2}$ to obtain a convergent variable.
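A minimal simulation sketch of results (b) and (c) (illustrative design): for the spurious levels regression of one independent random walk on another, $SSE_T/T$ grows with $T$ while $SSE_T/T^2$ stays of constant order, and the $F$ statistic for $H_0: \beta = 0$ grows roughly linearly in $T$, so $F_T/T$ stays of constant order.

```python
import numpy as np

rng = np.random.default_rng(5)
for T in (100, 400, 1600, 6400):
    X = np.cumsum(rng.standard_normal(T))
    Y = np.cumsum(rng.standard_normal(T))
    Z = np.column_stack([np.ones(T), X])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ coef
    sse = resid @ resid
    s2 = sse / (T - 2)
    F = coef[1] ** 2 / (s2 * np.linalg.inv(Z.T @ Z)[1, 1])   # F = t^2 for one restriction
    print(T, sse / T, sse / T ** 2, F / T)
```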
Thus, as the sample size becomes large, it becomes increasingly likely that the absolute value of an OLS $t$ statistic will exceed any arbitrary finite value (such as the usual critical value of $t = 2$). For example, in the regression (11), it would appear that $Y_{1t}$ and $y_{2t}$ are significantly related, whereas in reality they are completely independent.

Should we be totally pessimistic about regressions involving unit root processes, given the above results? There is, in fact, one case of major importance in which the correlation properties of $Y_{1t}$ and $y_{2t}$ do interfere with these qualitative results. The conditions of Theorem 4 require that $\Lambda\Lambda'$ be nonsingular. From the facts that rank$(\Lambda\Lambda')$ = rank$(\Lambda)$, $\Lambda = \Psi(1)P$, and $P$ is nonsingular, this requires that $\Psi(1)$ be nonsingular, or that the determinant $|\Psi(1)| \neq 0$. If we allow $\Psi(1)$ to be singular, then the asymptotic theory of this theorem no longer holds as stated. The condition that $\Psi(1)$ is singular is a necessary condition for $Y_{1t}$ and $y_{2t}$ to be cointegrated in the sense of Engle and Granger (1987). See Chapter 23 for details.

3.2 Cures for Spurious Regression

Many researchers recommend routinely differencing apparently nonstationary variables before estimating a regression (for example, Gordon (1984)):
$$\Delta Y_{1t} = a + \Delta y_{2t}'b + v_t,$$
which is believed to avoid the spurious regression problem as well as the nonstandard distributions for certain hypotheses associated with the levels regression (11). While this is the ideal cure for the problem discussed in this section, there are two different situations in which it might be inappropriate.

First, if economic theory specifies a linear relation between $Y_{1t}$ and $y_{2t}$ in levels, as in (11), then the parameters have their own economic interpretation; for example, $\partial C_t/\partial Y_t = \beta$ is the marginal propensity to consume, which must be positive under normal conditions. In a regression on differenced data, however, the parameters have a different economic interpretation, e.g. $\partial\Delta C_t/\partial\Delta Y_t = b$, which may be positive or negative even though $\partial C_t/\partial Y_t = \beta$ must be positive. Thus, differencing the data