Ch. 17 Maximum Likelihood Estimation

The identification process having led to a tentative formulation for the model, we then need to obtain efficient estimates of the parameters. After the parameters have been estimated, the fitted model will be subjected to diagnostic checks. This chapter contains a general account of the likelihood method for estimating the parameters of the stochastic model. Consider an ARMA model (from model identification) of the form

\[ Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}, \]

with $\varepsilon_t$ white noise:

\[ E(\varepsilon_t) = 0, \qquad E(\varepsilon_t \varepsilon_\tau) = \begin{cases} \sigma^2 & \text{for } t = \tau \\ 0 & \text{otherwise.} \end{cases} \]

This chapter explores how to estimate the values of $(c, \phi_1, \dots, \phi_p, \theta_1, \dots, \theta_q, \sigma^2)$ on the basis of observations on $Y$. The primary principle on which estimation will be based is maximum likelihood estimation. Let $\theta = (c, \phi_1, \dots, \phi_p, \theta_1, \dots, \theta_q, \sigma^2)'$ denote the vector of population parameters. Suppose we have observed a sample of size $T$, $(y_1, y_2, \dots, y_T)$. The approach will be to calculate the joint probability density

\[ f_{Y_T, Y_{T-1}, \dots, Y_1}(y_T, y_{T-1}, \dots, y_1; \theta), \tag{1} \]

which might loosely be viewed as the probability of having observed this particular sample. The maximum likelihood estimate (MLE) of $\theta$ is the value for which this sample is most likely to have been observed; that is, it is the value of $\theta$ that maximizes (1). This approach requires specifying a particular distribution for the white noise process $\varepsilon_t$. Typically we will assume that $\varepsilon_t$ is Gaussian white noise:

\[ \varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2). \]
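For concreteness, the following minimal sketch (Python with NumPy; the helper name `simulate_arma` and the parameter values are our own illustrative choices, not part of the text) generates a sample from the Gaussian ARMA model just written down. Samples of this kind are what the estimation methods below are applied to.

```python
import numpy as np

def simulate_arma(c, phi, theta, sigma2, T, burn=500, seed=0):
    """Simulate a Gaussian ARMA(p, q) series
    Y_t = c + phi_1 Y_{t-1} + ... + phi_p Y_{t-p}
            + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    eps = rng.normal(0.0, np.sqrt(sigma2), T + burn)
    y = np.zeros(T + burn)
    for t in range(max(p, q), T + burn):
        y[t] = (c
                + np.dot(phi, y[t - p:t][::-1])        # AR part: phi_1 y_{t-1} + ... + phi_p y_{t-p}
                + eps[t]
                + np.dot(theta, eps[t - q:t][::-1]))   # MA part: theta_1 eps_{t-1} + ...
    return y[burn:]                                    # discard the burn-in period

# e.g. an ARMA(1,1) sample of size T = 200
y = simulate_arma(c=1.0, phi=[0.6], theta=[0.3], sigma2=1.0, T=200)
```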
1 MLE of a Gaussian AR(1) Process

1.1 Evaluating the Likelihood Function Using (Scalar) Conditional Density

A stationary Gaussian AR(1) process takes the form

\[ Y_t = c + \phi Y_{t-1} + \varepsilon_t, \tag{2} \]

with $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$ and $|\phi| < 1$ (how do you know at this stage?). For this case, $\theta = (c, \phi, \sigma^2)'$.

Consider the p.d.f. of $Y_1$, the first observation in the sample. This is a random variable with mean and variance

\[ E(Y_1) = \mu = \frac{c}{1 - \phi} \qquad \text{and} \qquad \operatorname{Var}(Y_1) = \frac{\sigma^2}{1 - \phi^2}. \]

Since $\{\varepsilon_t\}_{t=-\infty}^{\infty}$ is Gaussian, $Y_1$ is also Gaussian. Hence,

\[ f_{Y_1}(y_1; \theta) = f_{Y_1}(y_1; c, \phi, \sigma^2) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2/(1-\phi^2)}} \exp\left[ -\frac{1}{2} \cdot \frac{\{y_1 - [c/(1-\phi)]\}^2}{\sigma^2/(1-\phi^2)} \right]. \]

Next consider the distribution of the second observation $Y_2$ conditional on observing $Y_1 = y_1$. From (2),

\[ Y_2 = c + \phi Y_1 + \varepsilon_2. \tag{3} \]

Conditioning on $Y_1 = y_1$ means treating the random variable $Y_1$ as if it were the deterministic constant $y_1$. In that case, (3) gives $Y_2$ as the constant $(c + \phi y_1)$ plus the $N(0, \sigma^2)$ variable $\varepsilon_2$. Hence,

\[ (Y_2 \mid Y_1 = y_1) \sim N(c + \phi y_1,\, \sigma^2), \]

meaning that

\[ f_{Y_2|Y_1}(y_2|y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2} \cdot \frac{(y_2 - c - \phi y_1)^2}{\sigma^2} \right]. \]

The joint density of observations 1 and 2 is then just

\[ f_{Y_2, Y_1}(y_2, y_1; \theta) = f_{Y_2|Y_1}(y_2|y_1; \theta)\, f_{Y_1}(y_1; \theta). \]
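As a small numerical illustration of this factorization, the sketch below (using SciPy; the parameter and data values are arbitrary) checks that the product $f_{Y_2|Y_1} f_{Y_1}$ equals the bivariate normal density of $(Y_1, Y_2)$, whose covariance $\operatorname{Cov}(Y_1, Y_2) = \phi\,\sigma^2/(1-\phi^2)$ follows from the AR(1) autocovariances.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Illustrative AR(1) parameter values and two arbitrary observed values.
c, phi, sigma2 = 1.0, 0.6, 2.0
mu = c / (1 - phi)                          # unconditional mean of Y_t
var1 = sigma2 / (1 - phi**2)                # unconditional variance of Y_t
y1, y2 = 3.1, 2.4

# Marginal density of Y1 times the conditional density of Y2 given Y1.
f_y1 = norm.pdf(y1, loc=mu, scale=np.sqrt(var1))
f_y2_given_y1 = norm.pdf(y2, loc=c + phi * y1, scale=np.sqrt(sigma2))
product = f_y2_given_y1 * f_y1

# Joint density of (Y1, Y2): bivariate normal with Cov(Y1, Y2) = phi * var1.
cov = var1 * np.array([[1.0, phi],
                       [phi, 1.0]])
joint = multivariate_normal(mean=[mu, mu], cov=cov).pdf([y1, y2])

print(np.isclose(product, joint))           # True: the two factorizations agree
```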
Similarly, the distribution of the third observation conditional on the first two is

\[ f_{Y_3|Y_2,Y_1}(y_3|y_2, y_1; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2} \cdot \frac{(y_3 - c - \phi y_2)^2}{\sigma^2} \right], \]

from which

\[ f_{Y_3,Y_2,Y_1}(y_3, y_2, y_1; \theta) = f_{Y_3|Y_2,Y_1}(y_3|y_2, y_1; \theta)\, f_{Y_2,Y_1}(y_2, y_1; \theta) = f_{Y_3|Y_2,Y_1}(y_3|y_2, y_1; \theta)\, f_{Y_2|Y_1}(y_2|y_1; \theta)\, f_{Y_1}(y_1; \theta). \]

In general, the values of $Y_1, Y_2, \dots, Y_{t-1}$ matter for $Y_t$ only through the value of $Y_{t-1}$, and the density of observation $t$ conditional on the preceding $t-1$ observations is given by

\[ f_{Y_t|Y_{t-1}, Y_{t-2}, \dots, Y_1}(y_t|y_{t-1}, y_{t-2}, \dots, y_1; \theta) = f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2} \cdot \frac{(y_t - c - \phi y_{t-1})^2}{\sigma^2} \right]. \]

The likelihood of the complete sample can thus be calculated as

\[ f_{Y_T, Y_{T-1}, Y_{T-2}, \dots, Y_1}(y_T, y_{T-1}, y_{T-2}, \dots, y_1; \theta) = f_{Y_1}(y_1; \theta) \cdot \prod_{t=2}^{T} f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta). \tag{4} \]

The log likelihood function (denoted $\mathcal{L}(\theta)$) is therefore

\[ \mathcal{L}(\theta) = \log f_{Y_1}(y_1; \theta) + \sum_{t=2}^{T} \log f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta). \tag{5} \]

The log likelihood for a sample of size $T$ from a Gaussian AR(1) process is seen to be

\[ \mathcal{L}(\theta) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log[\sigma^2/(1-\phi^2)] - \frac{\{y_1 - [c/(1-\phi)]\}^2}{2\sigma^2/(1-\phi^2)} - [(T-1)/2]\log(2\pi) - [(T-1)/2]\log(\sigma^2) - \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{2\sigma^2}. \tag{6} \]
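Expression (6) translates directly into code. Below is a minimal sketch assuming NumPy; the function name `ar1_exact_loglik` is our own illustrative choice, not notation from the text.

```python
import numpy as np

def ar1_exact_loglik(theta, y):
    """Exact log likelihood of a Gaussian AR(1), equation (6).
    theta = (c, phi, sigma2); y is the observed sample (y_1, ..., y_T)."""
    c, phi, sigma2 = theta
    T = len(y)
    mu = c / (1 - phi)
    var1 = sigma2 / (1 - phi**2)
    # contribution of the first observation (its unconditional density)
    ll = -0.5 * np.log(2 * np.pi) - 0.5 * np.log(var1) - (y[0] - mu) ** 2 / (2 * var1)
    # contributions of observations 2, ..., T (conditional densities)
    resid = y[1:] - c - phi * y[:-1]
    ll += (-(T - 1) / 2 * np.log(2 * np.pi)
           - (T - 1) / 2 * np.log(sigma2)
           - np.sum(resid**2) / (2 * sigma2))
    return ll
```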
1.2 Evaluating the Likelihood Function Using (Vector) Joint Density

A different description of the likelihood function for a sample of size $T$ from a Gaussian AR(1) process is sometimes useful. Collect the full set of observations in a $(T \times 1)$ vector,

\[ \mathbf{y} \equiv (Y_1, Y_2, \dots, Y_T)'. \]

The mean of this $(T \times 1)$ vector is

\[ E(\mathbf{y}) = \begin{bmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_T) \end{bmatrix} = \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix} = \boldsymbol{\mu}, \]

where $\mu = c/(1-\phi)$. The variance-covariance matrix of $\mathbf{y}$ is

\[ \boldsymbol{\Omega} = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})'] = \sigma^2 \mathbf{V}, \]

where

\[ \mathbf{V} = \frac{1}{1-\phi^2} \begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1 \end{bmatrix}. \]

The sample likelihood function is therefore the multivariate Gaussian density

\[ f_{\mathbf{Y}}(\mathbf{y}; \theta) = (2\pi)^{-T/2} |\boldsymbol{\Omega}^{-1}|^{1/2} \exp\left[ -\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Omega}^{-1}(\mathbf{y} - \boldsymbol{\mu}) \right], \]

with log likelihood

\[ \mathcal{L}(\theta) = (-T/2)\log(2\pi) + \frac{1}{2}\log|\boldsymbol{\Omega}^{-1}| - \frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Omega}^{-1}(\mathbf{y} - \boldsymbol{\mu}). \tag{7} \]

(6) and (7) must represent the identical likelihood function.
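As a check on this equivalence, here is a minimal sketch that evaluates (7) directly from $\boldsymbol{\Omega}$ (the name `ar1_exact_loglik_vector` is our own). For any admissible parameter values it should agree, up to rounding, with the implementation of (6) given earlier.

```python
import numpy as np

def ar1_exact_loglik_vector(theta, y):
    """Exact AR(1) log likelihood via the joint multivariate normal, equation (7)."""
    c, phi, sigma2 = theta
    T = len(y)
    mu = c / (1 - phi)
    # Omega = sigma^2 * V with V_{ij} = phi^{|i-j|} / (1 - phi^2)
    idx = np.arange(T)
    V = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)
    Omega = sigma2 * V
    dev = y - mu
    sign, logdet = np.linalg.slogdet(Omega)      # log|Omega|, so log|Omega^{-1}| = -logdet
    quad = dev @ np.linalg.solve(Omega, dev)     # (y - mu)' Omega^{-1} (y - mu)
    return -T / 2 * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

# For a given sample y and admissible theta = (c, phi, sigma2), this should match
# ar1_exact_loglik(theta, y) from the scalar-conditional sketch above.
```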
It is easy to verify by direct multiplication that $\mathbf{L}'\mathbf{L} = \mathbf{V}^{-1}$, with

\[ \mathbf{L} = \begin{bmatrix} \sqrt{1-\phi^2} & 0 & 0 & \cdots & 0 \\ -\phi & 1 & 0 & \cdots & 0 \\ 0 & -\phi & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\phi & 1 \end{bmatrix}. \]

Then (7) becomes

\[ \mathcal{L}(\theta) = (-T/2)\log(2\pi) + \frac{1}{2}\log|\sigma^{-2}\mathbf{L}'\mathbf{L}| - \frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\sigma^{-2}\mathbf{L}'\mathbf{L}(\mathbf{y} - \boldsymbol{\mu}). \tag{8} \]

Define the $(T \times 1)$ vector $\tilde{\mathbf{y}}$ to be

\[ \tilde{\mathbf{y}} \equiv \mathbf{L}(\mathbf{y} - \boldsymbol{\mu}) = \begin{bmatrix} \sqrt{1-\phi^2} & 0 & 0 & \cdots & 0 \\ -\phi & 1 & 0 & \cdots & 0 \\ 0 & -\phi & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\phi & 1 \end{bmatrix} \begin{bmatrix} Y_1 - \mu \\ Y_2 - \mu \\ Y_3 - \mu \\ \vdots \\ Y_T - \mu \end{bmatrix} = \begin{bmatrix} \sqrt{1-\phi^2}\,(Y_1 - \mu) \\ (Y_2 - \mu) - \phi(Y_1 - \mu) \\ (Y_3 - \mu) - \phi(Y_2 - \mu) \\ \vdots \\ (Y_T - \mu) - \phi(Y_{T-1} - \mu) \end{bmatrix} = \begin{bmatrix} \sqrt{1-\phi^2}\,[Y_1 - c/(1-\phi)] \\ Y_2 - c - \phi Y_1 \\ Y_3 - c - \phi Y_2 \\ \vdots \\ Y_T - c - \phi Y_{T-1} \end{bmatrix}. \]

The last term in (8) can thus be written

\[ \frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\sigma^{-2}\mathbf{L}'\mathbf{L}(\mathbf{y} - \boldsymbol{\mu}) = \frac{1}{2\sigma^2}\tilde{\mathbf{y}}'\tilde{\mathbf{y}} = \frac{1}{2\sigma^2}(1-\phi^2)[Y_1 - c/(1-\phi)]^2 + \frac{1}{2\sigma^2}\sum_{t=2}^{T}(Y_t - c - \phi Y_{t-1})^2. \]
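A quick numerical sanity check of the claim $\mathbf{L}'\mathbf{L} = \mathbf{V}^{-1}$, for arbitrary illustrative values of $\phi$ and $T$, can be done as follows; it is not part of the derivation.

```python
import numpy as np

phi, T = 0.6, 6                                  # arbitrary illustrative values

# V_{ij} = phi^{|i-j|} / (1 - phi^2)
idx = np.arange(T)
V = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)

# L: sqrt(1 - phi^2) in the (1,1) position, ones on the rest of the diagonal,
# and -phi on the first subdiagonal.
L = np.eye(T)
L[0, 0] = np.sqrt(1 - phi**2)
for t in range(1, T):
    L[t, t - 1] = -phi

print(np.allclose(L.T @ L, np.linalg.inv(V)))    # True
```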
The middle term in (8) is similarly

\[ \frac{1}{2}\log|\sigma^{-2}\mathbf{L}'\mathbf{L}| = \frac{1}{2}\log\{\sigma^{-2T}\cdot|\mathbf{L}'\mathbf{L}|\} = -\frac{T}{2}\log\sigma^2 + \frac{1}{2}\log\{|\mathbf{L}'|\,|\mathbf{L}|\} = -\frac{T}{2}\log\sigma^2 + \log|\mathbf{L}| = -\frac{T}{2}\log\sigma^2 + \frac{1}{2}\log(1-\phi^2), \]

where the last equality uses the fact that $\mathbf{L}$ is triangular, so that $|\mathbf{L}|$ is the product of its diagonal elements, $\sqrt{1-\phi^2}$. Thus equations (6) and (7) are just two different expressions for the same magnitude. Either expression accurately describes the log likelihood function.

1.3 Exact Maximum Likelihood Estimators for the Gaussian AR(1) Process

The MLE $\hat{\theta}$ is the value for which (6) is maximized. In principle, this requires differentiating (6) and setting the result equal to zero. In practice, when an attempt is made to carry this out, the result is a system of nonlinear equations in $\theta$ and $(Y_1, Y_2, \dots, Y_T)$ for which there is no simple solution for $\theta$ in terms of $(Y_1, Y_2, \dots, Y_T)$. Maximization of (6) thus requires the iterative or numerical procedures described on p. 21 of Chapter 3.
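One way to carry out this numerical maximization is sketched below, minimizing the negative of (6) with `scipy.optimize.minimize` and a Nelder-Mead search over $(c, \phi, \sigma^2)$. The simulated sample, starting values, and choice of optimizer are illustrative assumptions, not the specific procedure of Chapter 3.

```python
import numpy as np
from scipy.optimize import minimize

def neg_ar1_exact_loglik(theta, y):
    """Negative of the exact Gaussian AR(1) log likelihood, equation (6)."""
    c, phi, sigma2 = theta
    if abs(phi) >= 1 or sigma2 <= 0:         # keep the search inside the admissible region
        return np.inf
    T = len(y)
    mu = c / (1 - phi)
    var1 = sigma2 / (1 - phi**2)
    resid = y[1:] - c - phi * y[:-1]
    ll = (-0.5 * np.log(2 * np.pi * var1) - (y[0] - mu) ** 2 / (2 * var1)
          - (T - 1) / 2 * np.log(2 * np.pi * sigma2)
          - np.sum(resid**2) / (2 * sigma2))
    return -ll

# Quick illustration on a simulated AR(1) with c = 1, phi = 0.6, sigma2 = 1.
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 1.0 + 0.6 * y[t - 1] + rng.normal()
y = y[100:]                                   # drop a burn-in so the sample is roughly stationary

start = np.array([0.0, 0.0, 1.0])             # rough starting values for (c, phi, sigma2)
result = minimize(neg_ar1_exact_loglik, start, args=(y,), method="Nelder-Mead")
c_hat, phi_hat, sigma2_hat = result.x
```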
1.4 Conditional Maximum Likelihood Estimation

An alternative to numerical maximization of the exact likelihood function is to regard the value of $y_1$ as deterministic and maximize the likelihood conditioned on the first observation,

\[ f_{Y_T, Y_{T-1}, Y_{T-2}, \dots, Y_2 | Y_1}(y_T, y_{T-1}, y_{T-2}, \dots, y_2 | y_1; \theta) = \prod_{t=2}^{T} f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta), \]

the objective then being to maximize

\[ \mathcal{L}^*(\theta) = -\frac{T-1}{2}\log(2\pi) - \frac{T-1}{2}\log(\sigma^2) - \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{2\sigma^2} = -\frac{T-1}{2}\log(2\pi) - \frac{T-1}{2}\log(\sigma^2) - \sum_{t=2}^{T} \frac{\varepsilon_t^2}{2\sigma^2}. \tag{9} \]

Maximization of (9) with respect to $c$ and $\phi$ is equivalent to minimization of

\[ \sum_{t=2}^{T} (y_t - c - \phi y_{t-1})^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}), \tag{10} \]

which is achieved by an ordinary least squares (OLS) regression of $y_t$ on a constant and its own lagged value, where

\[ \mathbf{y} = \begin{bmatrix} y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} 1 & y_1 \\ 1 & y_2 \\ \vdots & \vdots \\ 1 & y_{T-1} \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} c \\ \phi \end{bmatrix}. \]

The conditional maximum likelihood estimates of $c$ and $\phi$ are therefore given by

\[ \begin{bmatrix} \hat{c} \\ \hat{\phi} \end{bmatrix} = \begin{bmatrix} T-1 & \sum_{t=2}^{T} y_{t-1} \\ \sum_{t=2}^{T} y_{t-1} & \sum_{t=2}^{T} y_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{t=2}^{T} y_t \\ \sum_{t=2}^{T} y_{t-1} y_t \end{bmatrix}. \]

The conditional maximum likelihood estimator of $\sigma^2$ is found by setting

\[ \frac{\partial \mathcal{L}^*}{\partial \sigma^2} = -\frac{T-1}{2\sigma^2} + \sum_{t=2}^{T} \frac{(y_t - c - \phi y_{t-1})^2}{2\sigma^4} = 0, \]

or

\[ \hat{\sigma}^2 = \sum_{t=2}^{T} \frac{(y_t - \hat{c} - \hat{\phi} y_{t-1})^2}{T-1}. \]

It is important to note that if you estimate an AR(1) process by conditional MLE from a sample of size $T$, you use only $T-1$ observations of that sample.
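The OLS recipe for the conditional MLE of an AR(1) fits in a few lines. The sketch below (with the hypothetical helper name `ar1_conditional_mle`) uses a least-squares solver rather than the explicit 2-by-2 inverse; the two are numerically equivalent.

```python
import numpy as np

def ar1_conditional_mle(y):
    """Conditional MLE of a Gaussian AR(1): OLS of y_t on a constant and y_{t-1},
    using the T - 1 usable observations, plus the average squared residual for sigma^2."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # regressors: constant, lagged y
    target = y[1:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)    # beta = (c_hat, phi_hat)
    resid = target - X @ beta
    sigma2_hat = np.mean(resid**2)                        # sum of squared residuals / (T - 1)
    return beta[0], beta[1], sigma2_hat
```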
2 MLE of a Gaussian AR(p) Process

This section discusses a Gaussian AR(p) process,

\[ Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t, \]

with $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$. In this case, the vector of population parameters to be estimated is $\theta = (c, \phi_1, \phi_2, \dots, \phi_p, \sigma^2)'$.

2.1 Evaluating the Likelihood Function

We first collect the first $p$ observations in the sample, $(Y_1, Y_2, \dots, Y_p)$, in a $(p \times 1)$ vector $\mathbf{y}_p$, which has mean vector $\boldsymbol{\mu}_p$ with each element

\[ \mu = \frac{c}{1 - \phi_1 - \phi_2 - \dots - \phi_p} \]

and variance-covariance matrix given by

\[ \sigma^2 \mathbf{V}_p = \begin{bmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{p-1} \\ \gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{p-2} \\ \gamma_2 & \gamma_1 & \gamma_0 & \cdots & \gamma_{p-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0 \end{bmatrix}. \]

The density of the first $p$ observations is then

\[ f_{Y_p, Y_{p-1}, \dots, Y_1}(y_p, y_{p-1}, \dots, y_1; \theta) = (2\pi)^{-p/2} |\sigma^{-2}\mathbf{V}_p^{-1}|^{1/2} \exp\left[ -\frac{1}{2\sigma^2}(\mathbf{y}_p - \boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p - \boldsymbol{\mu}_p) \right] = (2\pi)^{-p/2} (\sigma^{-2})^{p/2} |\mathbf{V}_p^{-1}|^{1/2} \exp\left[ -\frac{1}{2\sigma^2}(\mathbf{y}_p - \boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p - \boldsymbol{\mu}_p) \right]. \]

For the remaining observations in the sample, $(Y_{p+1}, Y_{p+2}, \dots, Y_T)$, conditional on the first $t-1$ observations the $t$th observation is Gaussian with mean

\[ c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} \]
and variance $\sigma^2$. Only the $p$ most recent observations matter for this distribution. Hence, for $t > p$,

\[ f_{Y_t|Y_{t-1},\dots,Y_1}(y_t|y_{t-1}, \dots, y_1; \theta) = f_{Y_t|Y_{t-1},\dots,Y_{t-p}}(y_t|y_{t-1}, \dots, y_{t-p}; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \dots - \phi_p y_{t-p})^2}{2\sigma^2} \right]. \]

The likelihood function for the complete sample is then

\[ f_{Y_T, Y_{T-1}, \dots, Y_1}(y_T, y_{T-1}, \dots, y_1; \theta) = f_{Y_p, Y_{p-1}, \dots, Y_1}(y_p, y_{p-1}, \dots, y_1; \theta) \times \prod_{t=p+1}^{T} f_{Y_t|Y_{t-1},\dots,Y_{t-p}}(y_t|y_{t-1}, \dots, y_{t-p}; \theta), \]

and the log likelihood is therefore

\[ \mathcal{L}(\theta) = \log f_{Y_T, Y_{T-1}, \dots, Y_1}(y_T, y_{T-1}, \dots, y_1; \theta) = -\frac{p}{2}\log(2\pi) - \frac{p}{2}\log(\sigma^2) + \frac{1}{2}\log|\mathbf{V}_p^{-1}| - \frac{1}{2\sigma^2}(\mathbf{y}_p - \boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p - \boldsymbol{\mu}_p) - \frac{T-p}{2}\log(2\pi) - \frac{T-p}{2}\log(\sigma^2) - \sum_{t=p+1}^{T} \frac{(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \dots - \phi_p y_{t-p})^2}{2\sigma^2}. \]

Maximization of this exact log likelihood of an AR(p) process must be accomplished numerically.
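Evaluating this exact log likelihood requires the autocovariance matrix $\sigma^2\mathbf{V}_p$ of the first $p$ observations. One way to obtain it, sketched below for a stationary AR(p), is to solve the discrete Lyapunov equation implied by the companion form of the process; this route (and the helper name `ar_p_exact_loglik`) is our own implementation choice, not something prescribed by the text.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def ar_p_exact_loglik(c, phi, sigma2, y):
    """Exact log likelihood of a Gaussian AR(p).
    phi holds (phi_1, ..., phi_p); y is the sample (y_1, ..., y_T)."""
    phi = np.asarray(phi, dtype=float)
    y = np.asarray(y, dtype=float)
    p, T = len(phi), len(y)
    mu = c / (1 - phi.sum())

    # sigma^2 * V_p = Var[(Y_1, ..., Y_p)'] solves Sigma = F Sigma F' + Q,
    # where F is the companion matrix of the AR coefficients.
    F = np.zeros((p, p))
    F[0, :] = phi
    F[1:, :-1] = np.eye(p - 1)
    Q = np.zeros((p, p))
    Q[0, 0] = sigma2
    Sigma_p = solve_discrete_lyapunov(F, Q)          # equals sigma^2 * V_p

    # contribution of the first p observations (joint Gaussian density)
    dev = y[:p] - mu
    sign, logdet = np.linalg.slogdet(Sigma_p)
    ll = (-p / 2 * np.log(2 * np.pi) - 0.5 * logdet
          - 0.5 * dev @ np.linalg.solve(Sigma_p, dev))

    # contributions of observations p+1, ..., T (conditional densities)
    X = np.column_stack([y[p - j:T - j] for j in range(1, p + 1)])   # lags 1, ..., p
    resid = y[p:] - c - X @ phi
    ll += (-(T - p) / 2 * np.log(2 * np.pi * sigma2)
           - np.sum(resid**2) / (2 * sigma2))
    return ll
```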
2.2 Conditional Maximum Likelihood Estimates

The log of the likelihood conditional on the first $p$ observations assumes the simple form

\[ \mathcal{L}^*(\theta) = \log f_{Y_T, Y_{T-1}, \dots, Y_{p+1}|Y_p,\dots,Y_1}(y_T, y_{T-1}, \dots, y_{p+1}|y_p, \dots, y_1; \theta) = -\frac{T-p}{2}\log(2\pi) - \frac{T-p}{2}\log(\sigma^2) - \sum_{t=p+1}^{T} \frac{(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \dots - \phi_p y_{t-p})^2}{2\sigma^2} = -\frac{T-p}{2}\log(2\pi) - \frac{T-p}{2}\log(\sigma^2) - \sum_{t=p+1}^{T} \frac{\varepsilon_t^2}{2\sigma^2}. \tag{11} \]

The values of $c, \phi_1, \dots, \phi_p$ that maximize (11) are the same as those that minimize

\[ \sum_{t=p+1}^{T} (y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \dots - \phi_p y_{t-p})^2. \]

Thus, the conditional MLE of these parameters can be obtained from an OLS regression of $y_t$ on a constant and $p$ of its own lagged values. The conditional MLE of $\sigma^2$ turns out to be the average squared residual from this regression:

\[ \hat{\sigma}^2 = \frac{1}{T-p} \sum_{t=p+1}^{T} (y_t - \hat{c} - \hat{\phi}_1 y_{t-1} - \hat{\phi}_2 y_{t-2} - \dots - \hat{\phi}_p y_{t-p})^2. \]

It is important to note that if you estimate an AR(p) process by conditional MLE from a sample of size $T$, you use only $T-p$ observations of that sample.
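The conditional MLE of an AR(p) is again just an OLS regression. A minimal sketch follows (the helper name `ar_p_conditional_mle` is our own illustrative choice).

```python
import numpy as np

def ar_p_conditional_mle(y, p):
    """Conditional MLE of a Gaussian AR(p): OLS of y_t on a constant and
    its first p lags, using the T - p usable observations."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    # design matrix: constant plus columns (y_{t-1}, ..., y_{t-p}) for t = p+1, ..., T
    X = np.column_stack([np.ones(T - p)] + [y[p - j:T - j] for j in range(1, p + 1)])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)    # (c_hat, phi_1_hat, ..., phi_p_hat)
    resid = target - X @ beta
    sigma2_hat = np.mean(resid**2)                        # average squared residual (divide by T - p)
    return beta[0], beta[1:], sigma2_hat
```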