Ch. 15 Forecasting

Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series. For the present we proceed as if the model were known exactly.

Forecasting is an important concept in time series analysis. In a regression model we usually begin from an existing economic theory and estimate its parameters; the estimated coefficients already have a role to play, such as confirming or refuting the theory, so whether to forecast from the estimated model depends on the researcher's own interest. The estimated coefficients of a time series model, by contrast, carry no direct economic meaning. An important role of time series analysis is therefore to forecast precisely from such a purely mechanical model.

1 Principle of Forecasting

1.1 Forecasts Based on Conditional Expectations

Suppose we are interested in forecasting the value of a variable $Y_{t+1}$ based on a set of variables $\mathbf{x}_t$ observed at date $t$. For example, we might want to forecast $Y_{t+1}$ based on its $m$ most recent values. In this case, $\mathbf{x}_t = [Y_t, Y_{t-1}, \ldots, Y_{t-m+1}]'$.

Let $Y^{*}_{t+1|t}$ denote a forecast of $Y_{t+1}$ based on $\mathbf{x}_t$ (a function of $\mathbf{x}_t$, depending on how it is realized). To evaluate the usefulness of this forecast, we need to specify a loss function. A quadratic loss function means choosing the forecast $Y^{*}_{t+1|t}$ so as to minimize

$$MSE(Y^{*}_{t+1|t}) = E(Y_{t+1} - Y^{*}_{t+1|t})^2,$$

which is known as the mean squared error.

Theorem: The forecast $Y^{*}_{t+1|t}$ with the smallest mean squared error is the expectation of $Y_{t+1}$ conditional on $\mathbf{x}_t$:

$$Y^{*}_{t+1|t} = E(Y_{t+1}|\mathbf{x}_t).$$
Proof: Let $g(\mathbf{x}_t)$ be a forecasting function of $Y_{t+1}$ other than the conditional expectation $E(Y_{t+1}|\mathbf{x}_t)$. The MSE associated with $g(\mathbf{x}_t)$ would be

$$\begin{aligned}
E[Y_{t+1} - g(\mathbf{x}_t)]^2 &= E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t) + E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2 \\
&= E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2 \\
&\quad + 2E\big\{[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)][E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]\big\} \\
&\quad + E\big\{[E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2\big\}.
\end{aligned}$$

Denote $\eta_{t+1} \equiv [Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)][E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]$. Conditional on $\mathbf{x}_t$, the second factor is known, so

$$E(\eta_{t+1}|\mathbf{x}_t) = [E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)] \times E\big([Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]\,\big|\,\mathbf{x}_t\big) = [E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)] \times 0 = 0.$$

By the law of iterated expectations, it follows that

$$E(\eta_{t+1}) = E_{\mathbf{x}_t}\big(E[\eta_{t+1}|\mathbf{x}_t]\big) = 0.$$

Therefore we have

$$E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2 + E\big\{[E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2\big\}. \tag{1}$$

The second term on the right-hand side of (1) cannot be made smaller than zero, and the first term does not depend on $g(\mathbf{x}_t)$. The function $g(\mathbf{x}_t)$ that makes the mean squared error (1) as small as possible is therefore the one that sets the second term in (1) to zero:

$$g(\mathbf{x}_t) = E(Y_{t+1}|\mathbf{x}_t).$$

The MSE of this optimal forecast is

$$E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2.$$
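The theorem is easy to check numerically. The sketch below is a minimal Monte Carlo in Python under an assumed data-generating process $Y_{t+1} = x_t^2 + \varepsilon_{t+1}$; the process, sample size, and rival forecast rule are all illustrative assumptions, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed DGP: Y_{t+1} = x_t^2 + eps, eps ~ N(0, 1), so E(Y_{t+1}|x_t) = x_t^2.
x = rng.normal(size=n)
y_next = x**2 + rng.normal(size=n)

cond_mean = x**2        # the conditional expectation
rival = np.ones(n)      # a rival rule g(x_t): the unconditional mean E[Y] = 1

print(np.mean((y_next - cond_mean)**2))  # ~ 1.0  (= Var(eps))
print(np.mean((y_next - rival)**2))      # ~ 3.0  (= Var(eps) + Var(x^2))
```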
1.2 Forecasts Based on Linear Projection

Suppose we now restrict attention to the class of forecasts in which $Y^{*}_{t+1|t}$ is a linear function of $\mathbf{x}_t$:

$$Y^{*}_{t+1|t} = \boldsymbol{\alpha}'\mathbf{x}_t.$$

Definition: The forecast $\boldsymbol{\alpha}'\mathbf{x}_t$ is called the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$ if the forecast error $(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)$ is uncorrelated with $\mathbf{x}_t$:

$$E[(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)\mathbf{x}_t'] = \mathbf{0}'. \tag{2}$$

Theorem: The linear projection produces the smallest mean squared error among the class of linear forecasting rules.

Proof: Let $\mathbf{g}'\mathbf{x}_t$ be any arbitrary linear forecasting function of $Y_{t+1}$. The MSE associated with $\mathbf{g}'\mathbf{x}_t$ would be

$$\begin{aligned}
E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 &= E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t + \boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2 \\
&= E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2 \\
&\quad + 2E\big\{[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]\big\} \\
&\quad + E[\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2.
\end{aligned}$$

Denote $\eta_{t+1} \equiv [Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]$. Using $\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t = \mathbf{x}_t'[\boldsymbol{\alpha} - \mathbf{g}]$ and the orthogonality condition (2), we have

$$E(\eta_{t+1}) = \big(E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]\mathbf{x}_t'\big)[\boldsymbol{\alpha} - \mathbf{g}] = \mathbf{0}'[\boldsymbol{\alpha} - \mathbf{g}] = 0.$$

Therefore we have

$$E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2 + E[\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2. \tag{3}$$

The second term on the right-hand side of (3) cannot be made smaller than zero, and the first term does not depend on $\mathbf{g}'\mathbf{x}_t$. The function $\mathbf{g}'\mathbf{x}_t$ that makes
the mean squared error (3) as small as possible is the function that sets the second term in (3) to zero:

$$\mathbf{g}'\mathbf{x}_t = \boldsymbol{\alpha}'\mathbf{x}_t.$$

The MSE of this optimal forecast is

$$E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2.$$

Since $\boldsymbol{\alpha}'\mathbf{x}_t$ is a linear projection of $Y_{t+1}$ on $\mathbf{x}_t$, we will use the notation

$$\hat{P}(Y_{t+1}|\mathbf{x}_t) = \boldsymbol{\alpha}'\mathbf{x}_t$$

to indicate the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$. Notice that

$$MSE[\hat{P}(Y_{t+1}|\mathbf{x}_t)] \geq MSE[E(Y_{t+1}|\mathbf{x}_t)],$$

since the conditional expectation offers the best possible forecast.

For most applications a constant term will be included in the projection. We will use the symbol $\hat{E}$ to indicate a linear projection on a vector of random variables $\mathbf{x}_t$ along with a constant term:

$$\hat{E}(Y_{t+1}|\mathbf{x}_t) \equiv \hat{P}(Y_{t+1}|1, \mathbf{x}_t).$$
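Condition (2) pins down the projection coefficient as $\boldsymbol{\alpha} = [E(\mathbf{x}_t\mathbf{x}_t')]^{-1}E(\mathbf{x}_t Y_{t+1})$, which can be estimated by sample moments. A minimal sketch in Python; the nonlinear DGP is an illustrative assumption chosen so that the projection differs from the conditional mean.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50_000

# Assumed DGP: Y_{t+1} = x_t^2 + 0.5 x_t + eps, so E(Y|x) is nonlinear in x.
x = rng.normal(size=T)
y_next = x**2 + 0.5 * x + rng.normal(size=T)

# Projection on (1, x): alpha = E[x x']^{-1} E[x Y], via sample moments.
X = np.column_stack([np.ones(T), x])
alpha = np.linalg.solve(X.T @ X / T, X.T @ y_next / T)
print(alpha)            # ~ [1.0, 0.5]: the intercept absorbs E[x^2] = 1

# Orthogonality condition (2): forecast error uncorrelated with the regressors.
err = y_next - X @ alpha
print(X.T @ err / T)    # ~ [0, 0]
```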
2 Forecasts Based on an Infinite Number of Observations

Recall that a general stationary and invertible ARMA(p, q) process is written in this form:

$$\phi(L)(Y_t - \mu) = \theta(L)\varepsilon_t, \tag{4}$$

where $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \ldots - \phi_p L^p$, $\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \ldots + \theta_q L^q$, and all the roots of $\phi(L) = 0$ and $\theta(L) = 0$ lie outside the unit circle.

2.1 Forecasting Based on Lagged $\varepsilon$'s (MA(∞) Form)

Consider the MA(∞) form of (4):

$$Y_t - \mu = \varphi(L)\varepsilon_t \tag{5}$$

with $\varepsilon_t$ white noise and

$$\varphi(L) = \theta(L)\phi^{-1}(L) = \sum_{j=0}^{\infty}\varphi_j L^j, \qquad \varphi_0 = 1, \qquad \sum_{j=0}^{\infty}|\varphi_j| < \infty.$$

Suppose that we have an infinite number of observations on $\varepsilon$ through date $t$, that is $\{\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\}$, and further know the values of $\mu$ and $\{\varphi_1, \varphi_2, \ldots\}$. Say we want to forecast the value of $Y_{t+s}$, $s$ periods from now. Note that (5) implies

$$Y_{t+s} = \mu + \varepsilon_{t+s} + \varphi_1\varepsilon_{t+s-1} + \ldots + \varphi_{s-1}\varepsilon_{t+1} + \varphi_s\varepsilon_t + \varphi_{s+1}\varepsilon_{t-1} + \ldots.$$

The best linear forecast takes the form

$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \mu + \varphi_s\varepsilon_t + \varphi_{s+1}\varepsilon_{t-1} + \ldots \tag{6}$$
$$= [\mu, \varphi_s, \varphi_{s+1}, \ldots][1, \varepsilon_t, \varepsilon_{t-1}, \ldots]' \tag{7}$$
$$= \boldsymbol{\alpha}'\mathbf{x}_t. \tag{8}$$
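The MA(∞) weights $\varphi_j$ needed in (6) follow from matching coefficients in $\phi(L)\varphi(L) = \theta(L)$, which gives the recursion $\varphi_j = \theta_j + \sum_{i=1}^{\min(j,p)}\phi_i\varphi_{j-i}$ (with $\theta_j = 0$ for $j > q$). A minimal sketch in Python; the function name and the ARMA(1,1) parameters are illustrative assumptions.

```python
import numpy as np

def ma_inf_weights(phi: np.ndarray, theta: np.ndarray, n: int) -> np.ndarray:
    """First n weights of varphi(L) = theta(L)/phi(L), from phi(L)varphi(L) = theta(L)."""
    psi = np.zeros(n)
    psi[0] = 1.0                                   # varphi_0 = 1
    for j in range(1, n):
        acc = theta[j - 1] if j - 1 < len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            acc += phi[i - 1] * psi[j - i]
        psi[j] = acc
    return psi

# Assumed ARMA(1,1): (1 - 0.5L)(Y_t - mu) = (1 + 0.4L) eps_t
print(ma_inf_weights(np.array([0.5]), np.array([0.4]), 5))
# [1.0, 0.9, 0.45, 0.225, 0.1125]; varphi_1 = phi_1 + theta_1, then varphi_j = phi_1 varphi_{j-1}
```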
The error associated with this forecast is uncorrelated with $\mathbf{x}_t = [1, \varepsilon_t, \varepsilon_{t-1}, \ldots]'$:

$$E\big\{\mathbf{x}_t \cdot [Y_{t+s} - \hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots)]\big\} = E\big\{[1, \varepsilon_t, \varepsilon_{t-1}, \ldots]'\,(\varepsilon_{t+s} + \varphi_1\varepsilon_{t+s-1} + \ldots + \varphi_{s-1}\varepsilon_{t+1})\big\} = \mathbf{0}.$$

The mean squared error associated with this forecast is

$$E[Y_{t+s} - \hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots)]^2 = (1 + \varphi_1^2 + \varphi_2^2 + \ldots + \varphi_{s-1}^2)\sigma^2.$$

Example: For an MA(q) process, the optimal linear forecast is

$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \begin{cases} \mu + \theta_s\varepsilon_t + \theta_{s+1}\varepsilon_{t-1} + \ldots + \theta_q\varepsilon_{t-q+s} & \text{for } s = 1, 2, \ldots, q, \\ \mu & \text{for } s = q+1, q+2, \ldots. \end{cases}$$

The MSE is

$$\begin{cases} \sigma^2 & \text{for } s = 1, \\ (1 + \theta_1^2 + \theta_2^2 + \ldots + \theta_{s-1}^2)\sigma^2 & \text{for } s = 2, 3, \ldots, q, \\ (1 + \theta_1^2 + \theta_2^2 + \ldots + \theta_q^2)\sigma^2 & \text{for } s = q+1, q+2, \ldots. \end{cases}$$

The MSE increases with the forecast horizon $s$ up until $s = q$. If we try to forecast an MA(q) farther than $q$ periods into the future, the forecast is simply the unconditional mean of the series ($E(Y_{t+s}) = \mu$) and the MSE is the unconditional variance of the series ($\mathrm{Var}(Y_{t+s}) = (1 + \theta_1^2 + \theta_2^2 + \ldots + \theta_q^2)\sigma^2$).
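This horizon pattern is easy to verify by simulation. The sketch below is a minimal Monte Carlo in Python with an assumed MA(2) (so $q = 2$; all parameter values are illustrative); it reproduces the MSEs $\sigma^2$, $(1+\theta_1^2)\sigma^2$, and $(1+\theta_1^2+\theta_2^2)\sigma^2$ for $s = 1, 2, 3$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta1, theta2, sigma, mu = 0.6, 0.3, 1.0, 0.0   # assumed MA(2) parameters
N = 200_000
eps = rng.normal(scale=sigma, size=N)

# Y at date t (= index i + 2): Y_t = mu + eps_t + theta1*eps_{t-1} + theta2*eps_{t-2}
Y = mu + eps[2:] + theta1 * eps[1:-1] + theta2 * eps[:-2]

t = np.arange(2, N - 3)                            # dates with enough past and future
f1 = mu + theta1 * eps[t] + theta2 * eps[t - 1]    # s = 1 forecast
f2 = mu + theta2 * eps[t]                          # s = 2 forecast
f3 = np.full(t.shape, mu)                          # s = 3: unconditional mean

for s, f in [(1, f1), (2, f2), (3, f3)]:
    print(s, np.mean((Y[t + s - 2] - f) ** 2))
# Theory: 1.00, 1 + 0.36 = 1.36, 1 + 0.36 + 0.09 = 1.45
```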
A compact lag operator expression for the forecast in (6) is sometimes used. Rewrite $Y_{t+s}$ as in (5) as

$$Y_{t+s} = \mu + \varphi(L)\varepsilon_{t+s} = \mu + \varphi(L)L^{-s}\varepsilon_t.$$

Consider the polynomial $\varphi(L)$ divided by $L^s$:

$$\frac{\varphi(L)}{L^s} = L^{-s} + \varphi_1 L^{1-s} + \varphi_2 L^{2-s} + \ldots + \varphi_{s-1}L^{-1} + \varphi_s L^0 + \varphi_{s+1}L^1 + \varphi_{s+2}L^2 + \ldots.$$

The annihilation operator replaces negative powers of $L$ by zero; for example,

$$\left[\frac{\varphi(L)}{L^s}\right]_+ = \varphi_s L^0 + \varphi_{s+1}L^1 + \varphi_{s+2}L^2 + \ldots.$$

Therefore the optimal forecast (6) can be written in lag operator notation as

$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+\varepsilon_t. \tag{9}$$

2.2 Forecasting Based on Lagged $Y$'s

The previous forecasts were based on the assumption that $\varepsilon_t$ is observed directly. In the usual forecasting situation, we actually have observations on lagged $Y$'s, not lagged $\varepsilon$'s. Suppose that the general ARMA(p, q) process has an AR(∞) representation given by

$$\eta(L)(Y_t - \mu) = \varepsilon_t \tag{10}$$

with $\varepsilon_t$ white noise and

$$\eta(L) = \theta^{-1}(L)\phi(L) = \sum_{j=0}^{\infty}\eta_j L^j = \varphi^{-1}(L), \qquad \eta_0 = 1, \qquad \sum_{j=0}^{\infty}|\eta_j| < \infty.$$

Under these conditions, we can substitute (10) into (9) to obtain the forecast of $Y_{t+s}$ as a function of lagged $Y$'s:

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+\eta(L)(Y_t - \mu), \tag{11}$$

or

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \left[\frac{\varphi(L)}{L^s}\right]_+\frac{1}{\varphi(L)}(Y_t - \mu). \tag{12}$$

Equation (12) is known as the Wiener-Kolmogorov prediction formula.
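Computationally, the annihilation operator is trivial on a truncated coefficient sequence: dividing by $L^s$ shifts every index down by $s$, and $[\cdot]_+$ simply drops the first $s$ coefficients. A minimal sketch in Python (the function name and the AR(1) weights are illustrative assumptions):

```python
import numpy as np

def annihilate(coefs: np.ndarray, s: int) -> np.ndarray:
    """[phi(L)/L^s]_+ on a truncated coefficient array: dividing by L^s shifts
    every power of L down by s; the annihilation operator drops the s terms
    that now carry negative powers, i.e. the first s entries."""
    return coefs[s:]

# Assumed AR(1) MA-infinity weights varphi_j = 0.8^j, truncated at 20 terms.
coefs = 0.8 ** np.arange(20)
print(annihilate(coefs, 3)[:4])   # [0.8^3, 0.8^4, 0.8^5, 0.8^6]
```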
2.2.1 Forecasting an AR(1) Process

1. Using the Wiener-Kolmogorov prediction formula:

For the covariance-stationary AR(1) process, we have

$$\varphi(L) = \frac{1}{1 - \phi L} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \ldots$$

and

$$\left[\frac{\varphi(L)}{L^s}\right]_+ = \phi^s + \phi^{s+1}L + \phi^{s+2}L^2 + \ldots = \frac{\phi^s}{1 - \phi L}.$$

The optimal linear $s$-period-ahead forecast for a stationary AR(1) process is therefore

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \frac{\phi^s}{1 - \phi L}(1 - \phi L)(Y_t - \mu) = \mu + \phi^s(Y_t - \mu).$$

The forecast decays geometrically from $Y_t$ toward $\mu$ as the forecast horizon $s$ increases.

2. Using recursive substitution and the lag operator:

The AR(1) process can be represented as (using (1.1.9) on p. 3 of Hamilton)

$$Y_{t+s} - \mu = \phi^s(Y_t - \mu) + \phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \ldots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s}.$$

Setting $E(\varepsilon_{t+h}) = 0$ for $h = 1, 2, \ldots, s$, the optimal linear $s$-period-ahead forecast for a stationary AR(1) process is again

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \phi^s(Y_t - \mu).$$
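As a small sanity check of this closed form, here is a minimal sketch in Python (the function name and parameter values are illustrative assumptions):

```python
import numpy as np

def ar1_forecast(y_t: float, mu: float, phi: float, s: int) -> float:
    """s-period-ahead AR(1) forecast: mu + phi^s (Y_t - mu)."""
    return mu + phi**s * (y_t - mu)

# Assumed values: mu = 2, phi = 0.9, current observation Y_t = 5.
print([round(ar1_forecast(5.0, 2.0, 0.9, s), 3) for s in range(1, 6)])
# [4.7, 4.43, 4.187, 3.968, 3.771] -- decaying geometrically toward mu = 2
```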
The MSE of this forecast is

$$E(\phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \ldots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s})^2 = (1 + \phi^2 + \phi^4 + \ldots + \phi^{2(s-1)})\sigma^2.$$

Notice that this grows with $s$ and asymptotically approaches $\sigma^2/(1 - \phi^2)$, the unconditional variance of $Y$.

2.2.2 Forecasting an AR(p) Process

1. Using recursive substitution and the lag operator:

Following (11) in Chapter 13, the value of $Y$ at $t + s$ of an AR(p) process can be represented as

$$Y_{t+s} - \mu = f_{11}^{s}(Y_t - \mu) + f_{12}^{s}(Y_{t-1} - \mu) + \ldots + f_{1p}^{s}(Y_{t-p+1} - \mu) + f_{11}^{s-1}\varepsilon_{t+1} + f_{11}^{s-2}\varepsilon_{t+2} + \ldots + f_{11}^{1}\varepsilon_{t+s-1} + \varepsilon_{t+s},$$

where $f_{1i}^{j}$ denotes the $(1, i)$ element of $\mathbf{F}^j$ (so $f_{11}^{j}$ is its $(1,1)$ element) and

$$\mathbf{F} \equiv \begin{bmatrix}
\phi_1 & \phi_2 & \phi_3 & \cdots & \phi_{p-1} & \phi_p \\
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}.$$

Setting $E(\varepsilon_{t+h}) = 0$ for $h = 1, 2, \ldots, s$, the optimal linear $s$-period-ahead forecast for a stationary AR(p) process is therefore

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + f_{11}^{s}(Y_t - \mu) + f_{12}^{s}(Y_{t-1} - \mu) + \ldots + f_{1p}^{s}(Y_{t-p+1} - \mu).$$

The associated forecast error is

$$Y_{t+s} - \hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = f_{11}^{s-1}\varepsilon_{t+1} + f_{11}^{s-2}\varepsilon_{t+2} + \ldots + f_{11}^{1}\varepsilon_{t+s-1} + \varepsilon_{t+s}.$$

It is important to note that to forecast an AR(p) process, the optimal $s$-period-ahead linear forecast based on an infinite number of observations $\{Y_t, Y_{t-1}, \ldots\}$ in fact makes use of only the $p$ most recent values $\{Y_t, Y_{t-1}, \ldots, Y_{t-p+1}\}$.
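The companion-matrix forecast is straightforward to implement. A minimal sketch in Python; the function name `arp_forecast` and the AR(2) numbers are illustrative assumptions.

```python
import numpy as np

def arp_forecast(y_recent: np.ndarray, mu: float, phis: np.ndarray, s: int) -> float:
    """s-period-ahead AR(p) forecast: mu plus the first row of F^s applied to
    the p most recent deviations from mu (most recent observation first)."""
    p = len(phis)
    F = np.zeros((p, p))
    F[0, :] = phis                   # first row: [phi_1, ..., phi_p]
    F[1:, :-1] = np.eye(p - 1)       # shifted identity below
    Fs = np.linalg.matrix_power(F, s)
    return mu + Fs[0, :] @ (y_recent - mu)

# Assumed AR(2) with mu = 2, phis = (0.5, 0.3); y_recent = [Y_t, Y_{t-1}].
y_recent = np.array([4.0, 3.0])
print(arp_forecast(y_recent, 2.0, np.array([0.5, 0.3]), 1))   # 2 + 0.5*2 + 0.3*1 = 3.3
print(arp_forecast(y_recent, 2.0, np.array([0.5, 0.3]), 2))   # 2 + 0.55*2 + 0.15*1 = 3.25
```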
2.2.3 Forecasting an MA(1) Process

Consider an invertible MA(1) process:

$$Y_t - \mu = (1 + \theta L)\varepsilon_t \quad \text{with } |\theta| < 1.$$

1. Applying the Wiener-Kolmogorov formula, we have

$$\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta L}{L^s}\right]_+\frac{1}{1 + \theta L}(Y_t - \mu).$$

To forecast an MA(1) process one period ahead ($s = 1$),

$$\left[\frac{1 + \theta L}{L^1}\right]_+ = \theta,$$

and so

$$\hat{Y}_{t+1|t} = \mu + \frac{\theta}{1 + \theta L}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \ldots.$$

To forecast an MA(1) process $s = 2, 3, \ldots$ periods into the future,

$$\left[\frac{1 + \theta L}{L^s}\right]_+ = 0 \quad \text{for } s = 2, 3, \ldots,$$

and so

$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, \ldots.$$

2. From recursive substitution:

An MA(1) process at period $t + 1$ is

$$Y_{t+1} - \mu = \varepsilon_{t+1} + \theta\varepsilon_t.$$

At period $t$, $E(\varepsilon_{t+s}) = 0$ for $s = 1, 2, \ldots$. The optimal linear one-period-ahead forecast for a stationary MA(1) process is therefore

$$\hat{Y}_{t+1|t} = \mu + \theta\varepsilon_t = \mu + \theta(1 + \theta L)^{-1}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \ldots.$$
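With only a finite stretch of data, the infinite expansion above is implemented in practice by recovering the residuals recursively from $\varepsilon_t = (Y_t - \mu) - \theta\varepsilon_{t-1}$, starting from $\varepsilon_0 = 0$; since $|\theta| < 1$, the effect of that initialization dies out. A minimal sketch in Python (the function name and parameter values are illustrative assumptions):

```python
import numpy as np

def ma1_one_step_forecast(y: np.ndarray, mu: float, theta: float) -> float:
    """One-step MA(1) forecast mu + theta*eps_t, with eps_t rebuilt recursively;
    this truncates the infinite expansion in the text at the start of the sample."""
    eps = 0.0
    for y_t in y:
        eps = (y_t - mu) - theta * eps
    return mu + theta * eps

# Assumed MA(1) with mu = 1, theta = 0.5; compare against the infeasible forecast
# built from the true (unobserved) innovations.
rng = np.random.default_rng(3)
e = rng.normal(size=500)
y = 1.0 + e[1:] + 0.5 * e[:-1]
print(ma1_one_step_forecast(y, 1.0, 0.5))   # ~ 1 + 0.5*e[-1]
print(1.0 + 0.5 * e[-1])
```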