Ch. 16 Stochastic Model Building

Unlike a linear regression model, which usually rests on a theoretical model found somewhere in the economic literature, the time series analysis of a stochastic process requires the ability to relate a stationary ARMA model to real data. This is usually best achieved by a three-stage iterative procedure based on identification, estimation, and diagnostic checking, as suggested by Box and Jenkins (1976).

1 Model Identification

By identification we mean the use of the data, and of any information on how the series was generated, to suggest a subclass of parsimonious models worth entertaining. We usually transform the data, if necessary, so that the assumption of covariance stationarity is a reasonable one. At this stage we then make an initial guess of small values of p and q for an ARMA(p, q) model that might describe the transformed data.

1.1 Identifying the Degree of Differencing

Trend stationary or difference stationary? See Ch. 19.

1.2 Use of the Autocorrelation and Partial Autocorrelation Function in Identification

1.2.1 Autocorrelation

Recall that if the data really follow an MA(q) process, then the (population) autocorrelation $r_j (= \gamma_j/\gamma_0)$ will be zero for $j > q$. By contrast, if the data follow an AR(p) process, then $r_j$ will gradually decay toward zero as a mixture of exponentials or damped sinusoids. One guide for distinguishing between MA and AR representations, then, is the decay pattern of $r_j$. It is useful to have a rough check on whether $r_j$ is effectively zero beyond a certain lag.
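As a quick illustration of these two decay patterns, the Python sketch below tabulates the population autocorrelations of an MA(1) and an AR(1) process. The parameter values (0.8 in both cases) are assumed purely for illustration and are not part of the original notes.

```python
import numpy as np

phi, theta = 0.8, 0.8          # illustrative AR(1) and MA(1) parameters (assumed)
lags = np.arange(1, 7)

# MA(1): r_1 = theta/(1 + theta^2) and r_j = 0 for j > 1 -- cuts off after lag q = 1
r_ma = np.r_[theta / (1 + theta**2), np.zeros(len(lags) - 1)]

# AR(1): r_j = phi**j -- decays geometrically toward zero, never cutting off
r_ar = phi ** lags

for j, (ma_j, ar_j) in enumerate(zip(r_ma, r_ar), start=1):
    print(f"lag {j}:  MA(1) r_j = {ma_j:5.2f}   AR(1) r_j = {ar_j:5.2f}")
```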
A natural estimate of the population autocorrelation $r_j$ is provided by the corresponding sample moment (remember that at this stage you still have no "model" to estimate, so it is natural to use a moment estimator):
\[
\hat{r}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0}, \quad \text{where} \quad
\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}(Y_t - \bar{Y})(Y_{t-j} - \bar{Y}) \ \ \text{for } j = 0, 1, 2, ..., T-1, \qquad
\bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t .
\]
If the data were really generated by a Gaussian MA(q) process, then the variance of the estimated autocorrelation $\hat{r}_j$ can be approximated by (see Box et al. (1994), p. 33)
\[
Var(\hat{r}_j) \cong \frac{1}{T}\Big(1 + 2\sum_{i=1}^{q} r_i^2\Big) \quad \text{for } j = q+1, q+2, ... \tag{1}
\]
To use (1) in practice, the estimated autocorrelations $\hat{r}_j$ ($j = 1, 2, ..., q$) are substituted for the theoretical autocorrelations $r_j$, and when this is done we shall refer to the square root of (1) as the large-lag standard error. In particular, if we suspect that the data were generated by Gaussian white noise, then $\hat{r}_j \sim N(0, 1/T)$ for $j \neq 0$; that is, $\hat{r}_j$ should lie between $\pm 2/\sqrt{T}$ about 95% of the time.
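The following Python sketch shows one way to compute the moment estimators $\hat{r}_j$ and the large-lag standard error in (1). The function names and the simulated stand-in series are illustrative only; in practice you would pass in your own (suitably transformed) data.

```python
import numpy as np

def sample_acf(y, nlags):
    """Moment estimators r_j = gamma_j/gamma_0, gamma_j = (1/T) sum_{t>j}(Y_t - Ybar)(Y_{t-j} - Ybar)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    gamma0 = d @ d / T
    return np.array([(d[j:] @ d[:-j]) / T / gamma0 for j in range(1, nlags + 1)])

def large_lag_se(r_hat, q, T):
    """Square root of (1): large-lag standard error of r_j for j > q under an MA(q) null."""
    return np.sqrt((1.0 + 2.0 * np.sum(np.asarray(r_hat)[:q] ** 2)) / T)

rng = np.random.default_rng(42)
y = rng.standard_normal(200)          # stand-in series; replace with your own data
r = sample_acf(y, nlags=10)
print("sample autocorrelations:", np.round(r, 2))
print("white-noise 95% band:  +/-", round(2 / np.sqrt(len(y)), 3))   # q = 0 case of (1)
print("large-lag s.e. for j > 1:", round(large_lag_se(r, q=1, T=len(y)), 3))
```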
Example:
The following estimated autocorrelations were obtained from a time series of length $T = 200$ observations, generated from a stochastic process for which it was known that $r_1 = -0.4$ and $r_j = 0$ for $j \geq 2$:
\[
\hat{r}_1 = -0.38,\ \hat{r}_2 = -0.08,\ \hat{r}_3 = 0.11,\ \hat{r}_4 = -0.08,\ \hat{r}_5 = 0.02,\ \hat{r}_6 = 0.00,\ \hat{r}_7 = 0.00,\ \hat{r}_8 = 0.00,\ \hat{r}_9 = 0.07,\ \hat{r}_{10} = -0.08.
\]
On the assumption that the series is completely random, $H_0: q = 0$, (1) yields, for all lags,
\[
Var(\hat{r}_1) \cong \frac{1}{T} = \frac{1}{200} = 0.005.
\]
Under the null hypothesis, $\hat{r}_1 \sim N(0, 0.005)$, so the 95% confidence interval is
\[
-2 < \frac{\hat{r}_1}{\sqrt{0.005}} < 2 \ \equiv\ -0.14 < \hat{r}_1 < 0.14.
\]
Since the estimated value $\hat{r}_1 = -0.38$ lies outside this interval, the hypothesis that $q = 0$ is rejected.

It might be reasonable to ask next whether the series is compatible with the hypothesis that $q = 1$. Using (1) with $q = 1$, the estimated large-lag variance under this assumption is
\[
Var(\hat{r}_2) \cong \frac{1}{200}\big[1 + 2(-0.38)^2\big] = 0.0064.
\]
Under the null hypothesis, $\hat{r}_2 \sim N(0, 0.0064)$, so the 95% confidence interval is
\[
-2 < \frac{\hat{r}_2}{\sqrt{0.0064}} < 2 \ \equiv\ -0.16 < \hat{r}_2 < 0.16.
\]
Since the estimated value $\hat{r}_2 = -0.08$ lies within this interval, the hypothesis that $q = 1$ is not rejected.
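The arithmetic of this example can be reproduced with a few lines of Python; the sketch below simply restates the two hypothesis tests using the reported $\hat{r}_j$ values.

```python
import numpy as np

T = 200
r_hat = np.array([-0.38, -0.08, 0.11, -0.08, 0.02, 0.00, 0.00, 0.00, 0.07, -0.08])

# H0: q = 0 (pure white noise).  Var(r_j) = 1/T = 0.005, so the 95% band is +/- 0.14.
band0 = 2 * np.sqrt(1 / T)
print(f"q = 0: |r_1| = {abs(r_hat[0]):.2f}  vs band +/- {band0:.2f}  -> reject")

# H0: q = 1.  Var(r_j) = (1 + 2 r_1^2)/T = 0.0064 for j > 1, so the band is +/- 0.16.
band1 = 2 * np.sqrt((1 + 2 * r_hat[0] ** 2) / T)
print(f"q = 1: |r_2| = {abs(r_hat[1]):.2f}  vs band +/- {band1:.2f}  -> do not reject")
```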
1.2.2 Partial Autocorrelation Function

Another useful measure is the partial autocorrelation, which exploits the fact that whereas an AR(p) process has an autocorrelation function that is infinite in extent, it can by its very nature be described in terms of p nonzero functions of the autocorrelations. The mth population partial autocorrelation (denoted $\alpha_m^{(m)}$) is defined as the last coefficient in a linear projection of $Y$ on its $m$ most recent values:
\[
\hat{Y}_{t+1|t} - \mu = \alpha_1^{(m)}(Y_t - \mu) + \alpha_2^{(m)}(Y_{t-1} - \mu) + ... + \alpha_m^{(m)}(Y_{t-m+1} - \mu). \tag{2}
\]
We saw in (15) of Chapter 15 that the vector $\boldsymbol{\alpha}^{(m)}$ can be calculated from
\[
\begin{bmatrix} \alpha_1^{(m)} \\ \alpha_2^{(m)} \\ \vdots \\ \alpha_m^{(m)} \end{bmatrix}
=
\begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0
\end{bmatrix}^{-1}
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_m \end{bmatrix}.
\]
Recall that if the data were really generated by an AR(p) process, only the p most recent values of $Y$ would be useful for forecasting. In that case, the projection coefficients on $Y$'s more than p periods in the past are equal to zero:
\[
\alpha_m^{(m)} = 0 \quad \text{for } m = p+1, p+2, ...
\]
By contrast, if the data really were generated by an MA(q) process with $q \geq 1$, then the partial autocorrelation $\alpha_m^{(m)}$ asymptotically approaches zero instead of cutting off abruptly.

Since the forecast error $\varepsilon_{t+1}$ is uncorrelated with the right-hand-side variables in (2), we can rewrite (2) as
\[
Y_{t+1} - \mu = \alpha_1^{(m)}(Y_t - \mu) + \alpha_2^{(m)}(Y_{t-1} - \mu) + ... + \alpha_m^{(m)}(Y_{t-m+1} - \mu) + \varepsilon_{t+1}, \quad t \in T,
\]
or
\[
Y_t - \mu = \alpha_1^{(m)}(Y_{t-1} - \mu) + \alpha_2^{(m)}(Y_{t-2} - \mu) + ... + \alpha_m^{(m)}(Y_{t-m} - \mu) + \varepsilon_t, \quad t \in T. \tag{3}
\]
The reason why the quantity $\alpha_m^{(m)}$ defined through (2) is called the partial autocorrelation of the process $\{Y_t\}$ at lag $m$ is clear from (3): it is actually equal to the partial correlation between the variables $Y_t$ and $Y_{t-m}$ adjusted for the intermediate variables $Y_{t-1}, Y_{t-2}, ..., Y_{t-m+1}$. That is, $\alpha_m^{(m)}$ measures the correlation between $Y_t$ and $Y_{t-m}$ after adjusting for the effects of $Y_{t-1}, Y_{t-2}, ..., Y_{t-m+1}$ (or the correlation between $Y_t$ and $Y_{t-m}$ not accounted for by $Y_{t-1}, Y_{t-2}, ..., Y_{t-m+1}$). See the counterpart result for the sample on p. 6 of Chapter 6.
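A minimal numpy sketch of this calculation is given below: it solves the displayed system for $\boldsymbol{\alpha}^{(m)}$ from a vector of autocovariances and returns the last element. The AR(1) autocovariances used in the illustration (with $\phi = 0.8$) are assumed values, not part of the notes; they show the cutoff of $\alpha_m^{(m)}$ after lag $p = 1$.

```python
import numpy as np

def partial_autocorr(gamma, m):
    """Last element of alpha^(m), obtained by solving the m x m system displayed above."""
    idx = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    Gamma = np.asarray(gamma)[idx]            # Toeplitz matrix with entries gamma_{|i-j|}
    alpha = np.linalg.solve(Gamma, np.asarray(gamma)[1:m + 1])
    return alpha[-1]

# Illustration with AR(1) autocovariances, phi = 0.8 (assumed value):
# gamma_j is proportional to phi**j, and the scale factor cancels in the solve.
phi = 0.8
gamma = phi ** np.arange(0, 11)
for m in (1, 2, 3, 4):
    print(f"alpha_{m}^({m}) = {partial_autocorr(gamma, m): .4f}")
```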
A natural estimate of the mth partial autocorrelation is the last coefficient in an OLS regression of $Y$ on a constant and its $m$ most recent values:
\[
Y_t = \hat{c} + \hat{\alpha}_1^{(m)} Y_{t-1} + \hat{\alpha}_2^{(m)} Y_{t-2} + ... + \hat{\alpha}_m^{(m)} Y_{t-m} + \hat{e}_t, \tag{4}
\]
where $\hat{e}_t$ denotes the OLS regression residual. If the data were really generated by an AR(p) process, then the sample estimate $\hat{\alpha}_m^{(m)}$ would have a variance around the true value (0) that can be approximated by (see Box et al. 1994, p. 68)
\[
Var(\hat{\alpha}_m^{(m)}) \cong \frac{1}{T} \quad \text{for } m = p+1, p+2, ...
\]

1.3 Use of Model Selection Criteria

Another approach to model selection is the use of information criteria such as the AIC proposed by Akaike (1974) or the BIC of Schwarz (1978). In this approach, a range of potential ARMA models is estimated by the maximum likelihood methods to be discussed in Chapter 17, and for each, a criterion such as the AIC (normalized by the sample size T), given by
\[
AIC_{p,q} = \frac{-2\ln(\text{maximized likelihood}) + 2m}{T} \approx \ln(\hat{\sigma}^2) + \frac{2m}{T},
\]
or the related BIC, given by
\[
BIC_{p,q} = \ln(\hat{\sigma}^2) + \frac{m\ln(T)}{T},
\]
is evaluated, where $\hat{\sigma}^2$ denotes the maximum likelihood estimate of $\sigma^2$, and $m = p + q + 1$ denotes the number of parameters estimated in the model, including a constant term. In the criteria above, the first term essentially corresponds to minus 2/T times the log of the maximized likelihood, while the second term is a "penalty factor" for the inclusion of additional parameters in the model. In the information criteria approach, models that yield a minimum value of the criterion are preferred, and the AIC or BIC values are compared across the various models as the basis for model selection.
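A hedged sketch of this grid search is shown below, assuming the statsmodels package is available for maximum likelihood estimation of ARMA models. Note that statsmodels reports the unnormalized criteria (roughly $-2\ln L + 2m$ and $-2\ln L + m\ln T$); since every candidate model is fit to the same sample, dividing by T would not change the ranking. The simulated MA(1) series and the grid bounds are illustrative choices.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA   # assumed available

def ic_table(y, max_p, max_q):
    """Fit ARMA(p, q) by maximum likelihood for each (p, q) and collect AIC / BIC."""
    rows = []
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            try:
                res = ARIMA(y, order=(p, 0, q)).fit()
                rows.append((p, q, res.aic, res.bic))
            except Exception:                    # skip specifications that fail to converge
                rows.append((p, q, np.nan, np.nan))
    return rows

rng = np.random.default_rng(0)
e = rng.standard_normal(301)
y = e[1:] - 0.4 * e[:-1]                         # simulated MA(1) series (illustrative)
for p, q, aic, bic in ic_table(y, max_p=2, max_q=2):
    print(f"ARMA({p},{q}):  AIC = {aic:9.2f}   BIC = {bic:9.2f}")
```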
However, one immediate disadvantage of this approach is that several models may have to be estimated by MLE, which is computationally time consuming and expensive. For this reason, Hannan and Rissanen (1982) propose an alternative model selection procedure.

2 Model Estimation

By estimation we mean efficient use of the data to make inferences about the parameters, conditional on the adequacy of the model entertained. See Chapter 17 for details.

3 Model Diagnostic Checking

By diagnostic checking we mean checking the fitted model in its relation to the data with the intent of revealing model inadequacies and so achieving model improvement.

Suppose that, using a particular time series, the model has been identified and the parameters estimated using the methods described in Chapter 17. The question remains (unlike in regression analysis, where an economic or finance model is provided by the theoretical literature) of deciding whether this model is adequate. If there should be evidence of serious inadequacy, we shall need to know how the model should be modified. By analogy with familiar procedures outside time series analysis, such as the scrutiny of residuals in the analysis of variance, these procedures are called diagnostic checks.

3.1 Diagnostic Checks Applied to Residuals

It cannot be too strongly emphasized that visual inspection of a plot of the residuals is an indispensable first step in the checking process.
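For instance, a minimal matplotlib sketch of such a residual plot (with placeholder residuals standing in for those of the fitted model) might look as follows.

```python
import numpy as np
import matplotlib.pyplot as plt

# 'resid' stands in for the residuals of the fitted ARMA model
resid = np.random.default_rng(1).standard_normal(200)

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(resid, lw=0.8)
ax.axhline(0.0, color="black", lw=0.8)
ax.set_xlabel("t")
ax.set_ylabel("residual")
ax.set_title("Residuals of the fitted model")
plt.show()
```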
3.1.1 Autocorrelation Check

Suppose we have identified and fitted a model $\phi(L)Y_t = \theta(L)\varepsilon_t$, with MLE estimates $(\hat{\phi}, \hat{\theta})$ obtained for the parameters. Then we shall refer to the quantities
\[
\hat{\varepsilon}_t = \hat{\theta}^{-1}(L)\hat{\phi}(L)Y_t
\]
as the residuals. The residuals are computed recursively from $\hat{\theta}(L)\hat{\varepsilon}_t = \hat{\phi}(L)Y_t$ as
\[
\hat{\varepsilon}_t = Y_t - \sum_{j=1}^{p} \hat{\phi}_j Y_{t-j} + \sum_{j=1}^{q} \hat{\theta}_j \hat{\varepsilon}_{t-j}, \quad t = 1, 2, ..., T,
\]
using either zero initial values (the conditional method) or back-forecasted initial values (the exact method) for the initial $\hat{\varepsilon}$'s and $Y$'s.

Now it is possible to show that, if the model is adequate,
\[
\hat{\varepsilon}_t = \varepsilon_t + O\Big(\frac{1}{\sqrt{T}}\Big)
\]
(read as "big O of $T^{-1/2}$": the term has to be multiplied by $T^{1/2}$ to be bounded, that is, it converges to zero itself). As the series length increases, the $\hat{\varepsilon}_t$'s become close to the white noise $\varepsilon_t$'s. Therefore, one might expect that study of the $\hat{\varepsilon}_t$'s could indicate the existence and nature of model inadequacy. In particular, recognizable patterns in the estimated autocorrelation function of the $\hat{\varepsilon}_t$'s, $\hat{r}_j(\hat{\varepsilon})$, used together with (1), could point to appropriate modifications of the model.
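A sketch of the conditional (zero starting values) recursion is given below for illustration; the estimates $\hat{\phi} = 0.5$, $\hat{\theta} = 0.3$ and the stand-in series are assumed values, and a nonzero sample mean should be removed from $Y_t$ before applying the recursion.

```python
import numpy as np

def arma_residuals(y, phi, theta):
    """Conditional residuals of phi(L) Y_t = theta(L) eps_t with zero starting values.

    With phi(L) = 1 - phi_1 L - ... - phi_p L^p and theta(L) = 1 - theta_1 L - ... - theta_q L^q,
    the recursion is  eps_t = Y_t - sum_j phi_j Y_{t-j} + sum_j theta_j eps_{t-j}  (as above).
    """
    y = np.asarray(y, dtype=float)
    p, q = len(phi), len(theta)
    eps = np.zeros(len(y))
    for t in range(len(y)):
        ar = sum(phi[j] * y[t - 1 - j] for j in range(p) if t - 1 - j >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps[t] = y[t] - ar + ma
    return eps

# Illustrative use with assumed estimates phi_hat = 0.5, theta_hat = 0.3:
rng = np.random.default_rng(3)
y = rng.standard_normal(100)                     # stand-in for the (mean-adjusted) series
e_hat = arma_residuals(y, phi=[0.5], theta=[0.3])
print(np.round(e_hat[:5], 3))
```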
3.1.2 Portmanteau Lack-of-Fit Test

Rather than considering the $\hat{r}_j(\hat{\varepsilon})$'s individually, an indication is often needed of whether, say, the first 20 autocorrelations of the $\hat{\varepsilon}_t$'s, taken as a whole, indicate inadequacy of the model. Suppose we have the first $k$ autocorrelations $\hat{r}_j(\hat{\varepsilon})$, $j = 1, 2, ..., k$, from any ARMA(p, q) process, where $k$ is chosen sufficiently large that the weights $\varphi_j$ in the model written in the form $Y_t = \phi(L)^{-1}\theta(L)\varepsilon_t = \varphi(L)\varepsilon_t$ are negligibly small after $j = k$. Then it is possible to show that, if the model is appropriate, the Box-Pierce (1970) Q statistic
\[
Q = T\sum_{j=1}^{k} \hat{r}_j^2(\hat{\varepsilon})
\]
is approximately distributed as $\chi^2_{k-p-q}$. On the other hand, if the model is inappropriate, the average value of Q will be inflated. A refinement that appears to have better finite-sample properties is the Ljung-Box (1978) statistic:
\[
Q' = T(T+2)\sum_{j=1}^{k} \frac{\hat{r}_j^2(\hat{\varepsilon})}{T - j}.
\]
The limiting distribution of $Q'$ is the same as that of $Q$.

Exercise:
Build up a stochastic model for the data set I give to you using the Box-Jenkins procedure.
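As a starting point for the exercise, the sketch below computes both portmanteau statistics from a residual series and reports an approximate p-value from the $\chi^2_{k-p-q}$ distribution (scipy is assumed to be available); the placeholder residuals and the choice $k = 20$ are illustrative.

```python
import numpy as np
from scipy import stats                      # assumed available for the chi-square cdf

def sample_acf(e, nlags):
    e = np.asarray(e, dtype=float)
    T = len(e)
    d = e - e.mean()
    gamma0 = d @ d / T
    return np.array([(d[j:] @ d[:-j]) / T / gamma0 for j in range(1, nlags + 1)])

def portmanteau(e_hat, k, n_params):
    """Box-Pierce Q and Ljung-Box Q' from the first k residual autocorrelations."""
    T = len(e_hat)
    r = sample_acf(e_hat, k)
    Q = T * np.sum(r ** 2)
    Q_prime = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, k + 1)))
    df = k - n_params                        # k - p - q degrees of freedom
    p_value = 1 - stats.chi2.cdf(Q_prime, df)
    return Q, Q_prime, p_value

# e_hat: residuals of the fitted ARMA(p, q); here white noise stands in, with p + q = 2.
rng = np.random.default_rng(7)
e_hat = rng.standard_normal(200)
Q, Q_prime, p_value = portmanteau(e_hat, k=20, n_params=2)
print(f"Q = {Q:.2f}, Q' = {Q_prime:.2f}, p-value = {p_value:.3f}")
```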