Section 8.2. Asymptotic normality

We assume that $X^n = (X_1, \ldots, X_n)$, where the $X_i$'s are i.i.d. with common density $p(x; \theta_0) \in \mathcal{P} = \{p(x; \theta) : \theta \in \Theta\}$. We assume that $\theta_0$ is identified in the sense that if $\theta \neq \theta_0$ and $\theta \in \Theta$, then $p(x; \theta) \neq p(x; \theta_0)$ with respect to the dominating measure $\mu$. In order to prove asymptotic normality, we will need certain regularity conditions. Some of these were encountered in the proof of consistency, but we will need some additional assumptions.
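Throughout, it may help to keep a concrete example in mind. As a purely illustrative (hypothetical) choice, not one made in these notes, take $\mathcal{P}$ to be the Exponential family $p(x; \theta) = \theta e^{-\theta x}$ for $x > 0$, which is identified since distinct rates give distinct densities. The sketch below uses sympy to derive its score and Fisher information; the later numerical checks reuse these formulas.

```python
# Hypothetical running example (not from the notes): the Exponential(theta)
# family p(x; theta) = theta * exp(-theta * x), x > 0, theta > 0.
import sympy as sp

x, theta = sp.symbols("x theta", positive=True)
p = theta * sp.exp(-theta * x)           # density p(x; theta)
score = sp.diff(sp.log(p), theta)        # psi(x; theta) = d/dtheta log p(x; theta)
print(sp.simplify(score))                # -> 1/theta - x

# Fisher information I(theta) = E[psi^2]: integrate psi^2 * p over (0, inf)
info = sp.integrate(score**2 * p, (x, 0, sp.oo))
print(sp.simplify(info))                 # -> theta**(-2)
```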
Regularity Conditions

i. $\theta_0$ lies in the interior of $\Theta$, which is assumed to be a compact subset of $\mathbb{R}^k$.

ii. $\log p(x; \theta)$ is continuous at each $\theta \in \Theta$ for all $x \in \mathcal{X}$ (a.e. will suffice).

iii. $|\log p(x; \theta)| \leq d(x)$ for all $\theta \in \Theta$ and $E_{\theta_0}[d(X)] < \infty$.

iv. $p(x; \theta) > 0$ in a neighborhood, $N$, of $\theta_0$.

v. $\left\| \frac{\partial p(x; \theta)}{\partial \theta} \right\| \leq e(x)$ for all $\theta \in N$ and $\int e(x)\, d\mu(x) < \infty$.
vi. Defining the score vector
$$\psi(x; \theta) = \left( \partial \log p(x; \theta)/\partial \theta_1, \ldots, \partial \log p(x; \theta)/\partial \theta_k \right)^{\top},$$
we assume that $I(\theta_0) = E_{\theta_0}[\psi(X; \theta_0)\psi(X; \theta_0)^{\top}]$ exists and is non-singular.

vii. $\left\| \frac{\partial^2 \log p(x; \theta)}{\partial \theta \partial \theta^{\top}} \right\| \leq f(x)$ for all $\theta \in N$ and $E_{\theta_0}[f(X)] < \infty$.

viii. $\left\| \frac{\partial^2 p(x; \theta)}{\partial \theta \partial \theta^{\top}} \right\| \leq g(x)$ for all $\theta \in N$ and $\int g(x)\, d\mu(x) < \infty$.

Theorem 8.6: If these 8 regularity conditions hold, then
$$\sqrt{n}\left(\hat{\theta}(X^n) - \theta_0\right) \xrightarrow{D(\theta_0)} N\left(0, I^{-1}(\theta_0)\right).$$
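Before the proof, a quick sanity check. This is a minimal simulation sketch, assuming the hypothetical Exponential($\theta_0$) example above, where the MLE has the closed form $\hat{\theta} = 1/\bar{X}$ and $I(\theta_0) = 1/\theta_0^2$, so Theorem 8.6 predicts that $\sqrt{n}(\hat{\theta} - \theta_0)$ is approximately $N(0, \theta_0^2)$.

```python
# Simulation sketch of Theorem 8.6 for the hypothetical Exponential example.
import numpy as np

rng = np.random.default_rng(0)
theta0, n, reps = 2.0, 2_000, 2_000

# sqrt(n) * (theta_hat - theta0) across many replications
draws = rng.exponential(scale=1 / theta0, size=(reps, n))  # rate theta0
theta_hat = 1 / draws.mean(axis=1)                          # MLE per replication
z = np.sqrt(n) * (theta_hat - theta0)

print(z.mean())  # close to 0
print(z.var())   # close to I(theta0)^{-1} = theta0**2 = 4
```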
Proof: Note that conditions i.-iii. guarantee that the MLE is consistent. Since $\theta_0$ is assumed to lie in the interior of $\Theta$, we know that, with sufficiently large probability, the MLE will lie in $N$ and cannot be on the boundary. This implies that the maximum is also a local maximum, which implies that $\partial Q(\hat{\theta}(X^n); X^n)/\partial \theta = 0$, or $\frac{1}{n} \sum_{i=1}^n \psi(X_i; \hat{\theta}(X^n)) = 0$. That is, the MLE is a solution of the score equations.

By the mean value theorem, applied to each element of the score vector, we have that
$$0 = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(X_i; \hat{\theta}(X^n)) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(X_i; \theta_0) + \left\{ -J_n^*(X^n) \right\} \sqrt{n}\left(\hat{\theta}(X^n) - \theta_0\right).$$

Note that $J_n^*(X^n)$ is a $k \times k$ random matrix where the $j$th row of the matrix is the $j$th row of $J_n$ (defined below, in the discussion of F2) evaluated at $\theta_{jn}^*(X^n)$, where $\theta_{jn}^*(X^n)$ is an intermediate value between $\hat{\theta}(X^n)$ and $\theta_0$. $\theta_{jn}^*(X^n)$ may be different from row to row, but it will be consistent for $\theta_0$.
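The first step, that the MLE solves the score equations, is easy to check numerically. The sketch below, again assuming the hypothetical Exponential example (where $\psi(x; \theta) = 1/\theta - x$), finds the root of the averaged score with scipy and confirms it matches the closed-form MLE $1/\bar{X}$.

```python
# Sketch: the MLE as a root of the score equation (1/n) sum_i psi(X_i; theta) = 0.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
x = rng.exponential(scale=0.5, size=1_000)   # data from rate theta0 = 2

def mean_score(theta):
    return 1 / theta - x.mean()              # (1/n) sum_i psi(X_i; theta)

theta_hat = brentq(mean_score, 1e-6, 100.0)  # root-find over a wide bracket
print(theta_hat, 1 / x.mean())               # the two estimates coincide
print(mean_score(theta_hat))                 # essentially 0
```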
We will establish two facts:

F1: $\frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(X_i; \theta_0) \xrightarrow{D(\theta_0)} N(0, I(\theta_0))$

F2: $J_n^*(X^n) \xrightarrow{P(\theta_0)} I(\theta_0)$

By assumption vi., we know that $I(\theta_0)$ is non-singular. Matrix inversion is continuous at any non-singular matrix. Since $J_n^*(X^n) \xrightarrow{P} I(\theta_0)$, we know that $\{J_n^*(X^n)\}^{-1} \xrightarrow{P} I(\theta_0)^{-1}$. This also means that, with sufficiently large probability as $n$ gets large, $J_n^*(X^n)$ is invertible. Therefore, we know that
$$\sqrt{n}\left(\hat{\theta}(X^n) - \theta_0\right) = \{J_n^*(X^n)\}^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(X_i; \theta_0).$$
We then use Slutsky's theorem to conclude that
$$\sqrt{n}\left(\hat{\theta}(X^n) - \theta_0\right) \xrightarrow{D} N\left(0, I(\theta_0)^{-1}\right).$$
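This limit is what justifies the standard errors used in practice: replacing $I(\theta_0)$ by $I(\hat{\theta})$ (valid by consistency of $\hat{\theta}$ and continuity of $I$) gives the usual Wald interval $\hat{\theta} \pm z_{\alpha/2} \sqrt{I(\hat{\theta})^{-1}/n}$. A sketch, again assuming the hypothetical Exponential example with $I(\theta) = 1/\theta^2$:

```python
# Sketch: an approximate 95% Wald interval from the asymptotic normality result.
import numpy as np

rng = np.random.default_rng(2)
theta0, n = 2.0, 1_000
x = rng.exponential(scale=1 / theta0, size=n)

theta_hat = 1 / x.mean()                     # MLE
se = np.sqrt(theta_hat**2 / n)               # sqrt(I(theta_hat)^{-1} / n)
print(theta_hat - 1.96 * se, theta_hat + 1.96 * se)  # should cover theta0 = 2
```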
Establishing F1

The random vectors $\psi(X_1; \theta_0), \ldots, \psi(X_n; \theta_0)$ are i.i.d. We need to show that they have mean zero. Then $I(\theta_0)$ will be the covariance matrix of $\psi(X; \theta_0)$, and an application of the multivariate central limit theorem for i.i.d. random vectors gives the desired result.

We will show something stronger, namely $E_\theta[\psi(X; \theta)] = 0$ for all $\theta \in N$. Condition v. guarantees that we can interchange integration and differentiation. Consider the case where $k = 1$. We know that $1 = \int p(x; \theta)\, d\mu(x)$ for all $\theta \in N$. This implies that $0 = \frac{d}{d\theta} \int p(x; \theta)\, d\mu(x)$. Let's show that $\frac{d}{d\theta} \int p(x; \theta)\, d\mu(x) = \int \frac{d p(x; \theta)}{d\theta}\, d\mu(x)$. Choose a sequence $\theta_n \in N$ such that $\theta_n \to \theta$. Then, by the definition of the derivative, we know that
$$\frac{d p(x; \theta)}{d\theta} = \lim_{n \to \infty} \frac{p(x; \theta_n) - p(x; \theta)}{\theta_n - \theta} \quad \text{for all } x \in \mathcal{X}.$$
By the mean value theorem, we know that
$$p(x; \theta_n) = p(x; \theta) + \frac{d p(x; \theta_n^*)}{d\theta} (\theta_n - \theta),$$
where $\theta_n^*$ lies between $\theta$ and $\theta_n$, so that $\theta_n^* \in N$. This implies that
$$\left| \frac{p(x; \theta_n) - p(x; \theta)}{\theta_n - \theta} \right| = \left| \frac{d p(x; \theta_n^*)}{d\theta} \right| \leq e(x).$$
Since $e(x)$ is integrable, we can employ the dominated convergence theorem. This says that
$$0 = \frac{d}{d\theta} \int p(x; \theta)\, d\mu(x) = \lim_{n \to \infty} \int \frac{p(x; \theta_n) - p(x; \theta)}{\theta_n - \theta}\, d\mu(x) = \int \lim_{n \to \infty} \frac{p(x; \theta_n) - p(x; \theta)}{\theta_n - \theta}\, d\mu(x) = \int \frac{d p(x; \theta)}{d\theta}\, d\mu(x).$$
This can be generalized to partial derivatives, which can then be used to formally show that $E_\theta[\psi(X; \theta)] = 0$ for $\theta \in N$. We know that $\int p(x; \theta)\, d\mu(x) = 1$. This implies that $\frac{\partial}{\partial \theta_j} \int p(x; \theta)\, d\mu(x) = 0$. By dominated convergence, we can interchange differentiation and integration, so that $\int \frac{\partial p(x; \theta)}{\partial \theta_j}\, d\mu(x) = 0$. Then, we know that
$$\int \frac{\partial p(x; \theta)/\partial \theta_j}{p(x; \theta)}\, p(x; \theta)\, d\mu(x) = 0.$$
We can divide by $p(x; \theta)$ since it is greater than zero for all $\theta \in N$ (condition iv.). The integrand is exactly $\psi_j(x; \theta)\, p(x; \theta)$, so this says that $E_\theta[\psi_j(X; \theta)] = 0$.
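The zero-mean property of the score is easy to verify numerically for a given model. The sketch below, assuming the hypothetical Exponential example, integrates $\psi(x; \theta)\, p(x; \theta)$ with scipy's quadrature for several values of $\theta$ in a neighborhood.

```python
# Numerical check that E_theta[psi(X; theta)] = 0 for the Exponential example.
import numpy as np
from scipy.integrate import quad

def p(x, theta):
    return theta * np.exp(-theta * x)        # density

def psi(x, theta):
    return 1 / theta - x                     # score

for theta in (0.5, 1.0, 2.0):
    val, _ = quad(lambda x: psi(x, theta) * p(x, theta), 0, np.inf)
    print(theta, val)                        # ~ 0 for every theta
```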
Establishing F2

First, we shall study the large-sample behavior of the matrix of second partial derivatives of the log-likelihood. Define
$$J_n(\theta) = \left[ -\frac{1}{n} \sum_{i=1}^n \frac{\partial^2 \log p(X_i; \theta)}{\partial \theta \partial \theta^{\top}} \right].$$
This is a $k \times k$ random matrix.
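For intuition about why $J_n$ should converge to $I(\theta_0)$, the sketch below evaluates both the observed information and the outer-product form $\frac{1}{n}\sum_i \psi(X_i; \theta_0)^2$ for the hypothetical Exponential example. In that model $\partial^2 \log p(x; \theta)/\partial \theta^2 = -1/\theta^2$ does not depend on $x$, so $J_n(\theta) = 1/\theta^2$ with no averaging error (a degenerate but valid instance once the evaluation point is consistent for $\theta_0$); the outer-product estimate, by contrast, fluctuates and converges by the law of large numbers.

```python
# Sketch: observed information vs. Fisher information, Exponential example.
import numpy as np

rng = np.random.default_rng(3)
theta0, n = 2.0, 10_000
x = rng.exponential(scale=1 / theta0, size=n)

theta_hat = 1 / x.mean()
print(1 / theta_hat**2, 1 / theta0**2)   # J_n(theta_hat) -> I(theta0) = 0.25

psi = 1 / theta0 - x                      # score at theta0
print((psi**2).mean())                    # LLN: (1/n) sum psi^2 -> I(theta0)
```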