Section 8.5. Breakdown of assumptions

• Non-Existence of the MLE
• Multiple Solutions to the Maximization Problem
• Multiple Solutions to the Score Equations
• Number of Parameters Increases with the Sample Size
• Support of p(x; θ) Depends on θ
• Non-I.I.D. Data
Non-Existence of the MLE

The non-existence of the MLE may occur for all values of xn or for only some of them. In general, this is due either to the fact that the parameter space is not compact or to the fact that the log-likelihood is discontinuous in θ.

Example 8.1: Suppose that X ∼ Bernoulli(1/(1 + exp(θ))), where Θ = R. If we observe x = 1, then L(θ; 1) = 1/(1 + exp(θ)). The likelihood is a decreasing function of θ, so the maximum is not attained on Θ. If Θ were closed, i.e., Θ = R̄ = [−∞, +∞], the MLE would be −∞.

Example 8.2: Suppose that X ∼ Normal(µ, σ²), so θ = (µ, σ²) and Θ = R × R₊. Now

l(\theta; x) \propto -\log \sigma - \frac{1}{2\sigma^2}(x - \mu)^2.

Take µ = x. Then as σ → 0, l(θ; x) → +∞, so the MLE does not exist.
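To see Example 8.2 numerically, here is a minimal sketch (my addition; the observation x = 1.7 and the grid of σ values are arbitrary choices) that fixes µ = x and drives σ toward zero, showing the log-likelihood growing without bound:

```python
import numpy as np

x = 1.7  # a single observed value (arbitrary choice)

def log_lik(mu, sigma, x):
    # Log density of Normal(mu, sigma^2) evaluated at x, constants included.
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

# With mu = x, the squared-error term vanishes and -log(sigma) diverges,
# so the likelihood is unbounded and no maximizer exists on Theta = R x R+.
for sigma in [1.0, 0.1, 0.01, 0.001]:
    print(f"sigma = {sigma:6}: l(theta; x) = {log_lik(x, sigma, x):8.2f}")
```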
Multiple Solutions

One reason for multiple solutions to the maximization problem is non-identification of the parameter θ.

Example 8.3: Suppose that Y ∼ Normal(Xθ, I), where X is an n × k matrix with rank smaller than k and θ ∈ Θ ⊂ R^k. The density function is

p(y; \theta) = (2\pi)^{-n/2} \exp\left\{ -\tfrac{1}{2}(y - X\theta)'(y - X\theta) \right\}.

Since X is not full rank, there exists an infinite number of solutions to Xθ = 0. That means that there exists an infinite number of θ's that generate the same density function. So, θ is not identified. Furthermore, note that the likelihood is maximized at all values of θ satisfying the normal equations X′Xθ = X′y, as the sketch below illustrates.
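A quick numerical illustration (my own sketch, not from the notes): with a rank-deficient design, any null-space direction of X can be added to a maximizer without changing the fitted mean Xθ, so every solution of X′Xθ = X′y attains the same likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rank-deficient design: the third column duplicates the first,
# so rank(X) = 2 < k = 3.
Z = rng.normal(size=(50, 2))
X = np.column_stack([Z, Z[:, 0]])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=50)

# One solution of the normal equations X'X theta = X'y
# (lstsq returns the minimum-norm solution when X is rank-deficient).
theta1, *_ = np.linalg.lstsq(X, y, rcond=None)

# v = (1, 0, -1) lies in the null space of X here, so theta1 + c*v is
# also a maximizer of the likelihood for any scalar c.
v = np.array([1.0, 0.0, -1.0])
theta2 = theta1 + 5.0 * v

print(np.allclose(X @ theta1, X @ theta2))                           # True
print(np.sum((y - X @ theta1) ** 2), np.sum((y - X @ theta2) ** 2))  # equal
```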
Multiple Roots to the Score Equations

Even though the score equations may have multiple roots for fixed n, we can still use our theorems to show consistency and asymptotic normality. This works provided that, as n gets large, there is a unique maximum with large probability.

Example 8.4: Suppose that Xn = (X1, ..., Xn), where the Xi's are i.i.d. Cauchy(θ, 1). We assume that θ0 lies in the interior of a compact set Θ ⊂ R. So,

p(x; \theta) = \frac{1}{\pi(1 + (x - \theta)^2)},

and the log-likelihood for the full sample is

l(\theta; x) = -n \log \pi - \sum_{i=1}^{n} \log(1 + (x_i - \theta)^2).

Note that as θ → ±∞, l(θ; x) → −∞.
The score for θ is given by

\frac{dl(\theta; x)}{d\theta} = \sum_{i=1}^{n} \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2}.

As the sketch below demonstrates, there can be multiple roots to the score equations.
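Here is a small numerical sketch (my addition; the sample is hand-picked so that the observations form well-separated clusters, which is what produces multiple local maxima) that scans the Cauchy score for sign changes and refines each bracketed root:

```python
import numpy as np
from scipy.optimize import brentq

# A widely separated sample: each cluster pulls the likelihood
# toward itself, creating several stationary points.
x = np.array([-5.2, -4.3, 0.7, 4.8, 5.6])

def score(theta):
    # Derivative of the Cauchy(theta, 1) log-likelihood in theta.
    return np.sum(2 * (x - theta) / (1 + (x - theta) ** 2))

# Bracket sign changes of the score on a grid, then solve each one.
grid = np.linspace(-10, 10, 2001)
vals = np.array([score(t) for t in grid])
roots = [brentq(score, grid[i], grid[i + 1])
         for i in range(len(grid) - 1)
         if vals[i] * vals[i + 1] < 0]
# Several roots appear: local maxima and minima alternate.
print("roots of the score equation:", np.round(roots, 3))
```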
We can verify the conditions of Theorem 8.2 to show that the MLE is consistent. First, we know that Q0(θ) is uniquely maximized at θ0 since we can show that θ0 is identified. Does there exist θ ≠ θ0 such that p(x; θ) = p(x; θ0)? If so, then it must be the case that (x − θ)² = (x − θ0)² for all x. Expanding both sides, 2x(θ − θ0) = θ² − θ0² would have to hold for all x, which is possible only if θ = θ0. Thus, θ0 is identified.

By assumption, we know that Θ is compact. To show continuity of Q0(θ) and uniform convergence in probability of Q(θ; Xn) to Q0(θ), we appeal to the conditions of Lemma 8.3. We have to show that log p(x; θ) is continuous in θ for θ ∈ Θ and all x ∈ X; the Cauchy log-density clearly satisfies this continuity condition. Finally, we have to show that there exists a function d(x) such that |log p(x; θ)| ≤ d(x) for all θ ∈ Θ and x ∈ X, and Eθ0[d(X)] < ∞.
Note that there exist positive constants C1, C2 > 1, and C3 such that, for all θ ∈ Θ,

|\log p(x; \theta)| = |-\log \pi - \log(1 + (x - \theta)^2)|
                    = \log \pi + \log(1 + (x - \theta)^2)
                    \le C_1 + \log(C_2 + C_3 x^2) = d(x).

It remains to show that Eθ0[d(X)] < ∞. Note that

E_{\theta_0}[d(X)]
  = \int_{-\infty}^{\infty} \{C_1 + \log(C_2 + C_3 x^2)\} \, \frac{1}{\pi(1 + (x - \theta_0)^2)} \, dx
  = C_1 + \int_{-\infty}^{\infty} \log(C_2 + C_3 x^2) \, \frac{1}{\pi(1 + (x - \theta_0)^2)} \, dx
  = C_1 + \int_{-\infty}^{\infty} \log(C_2 + C_3 (x + \theta_0)^2) \, \frac{1}{\pi(1 + x^2)} \, dx
  = C_1 + \int_{-\infty}^{-\theta_0} \log(C_2 + C_3 (x + \theta_0)^2) \, \frac{1}{\pi(1 + x^2)} \, dx
        + \int_{-\theta_0}^{\infty} \log(C_2 + C_3 (x + \theta_0)^2) \, \frac{1}{\pi(1 + x^2)} \, dx.
Now,

\int_{-\infty}^{-\theta_0} \log(C_2 + C_3 (x + \theta_0)^2) \, \frac{1}{\pi(1 + x^2)} \, dx

is equal to

\int_{x(\theta_0)}^{-\theta_0} \log(C_2 + C_3 (x + \theta_0)^2) \, \frac{1}{\pi(1 + x^2)} \, dx
  + \int_{-\infty}^{x(\theta_0)} \log(C_2 + C_3 (x + \theta_0)^2) \, \frac{1}{\pi(1 + x^2)} \, dx,

which is less than

\int_{x(\theta_0)}^{-\theta_0} \log(C_2 + C_3 (x + \theta_0)^2) \, \frac{1}{\pi(1 + x^2)} \, dx
  + \int_{-\infty}^{x(\theta_0)} \frac{|x|^{1/2}}{\pi(1 + x^2)} \, dx

for x(θ0) sufficiently negative, since the logarithm eventually grows more slowly than |x|^{1/2}. Both of the integrals in the sum are bounded: the first integrates a continuous function over a compact interval, and the second converges because |x|^{1/2}/(1 + x²) behaves like |x|^{−3/2} in the tail. A similar argument handles \int_{-\theta_0}^{\infty} \log(C_2 + C_3 (x + \theta_0)^2) \, \frac{1}{\pi(1 + x^2)} \, dx. Thus, we know that Eθ0[d(X)] < ∞.
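As a numerical sanity check (my own sketch; on Θ = [−M, M] one concrete valid choice of constants is C1 = log π, C2 = 2 + 2M², C3 = 2, since (x − θ)² ≤ 2x² + 2M² for θ ∈ [−M, M]), the dominating integral can be evaluated directly:

```python
import numpy as np
from scipy.integrate import quad

theta0, M = 1.0, 5.0  # true parameter and Theta = [-M, M] (arbitrary choices)

# One valid choice of constants: 1 + (x - theta)^2 < (2 + 2M^2) + 2x^2.
C1 = np.log(np.pi)
C2 = 2 + 2 * M ** 2
C3 = 2.0

def integrand(x):
    # d(x) times the Cauchy(theta0, 1) density.
    d = C1 + np.log(C2 + C3 * x ** 2)
    return d / (np.pi * (1 + (x - theta0) ** 2))

val, err = quad(integrand, -np.inf, np.inf)
print(f"E[d(X)] ~ {val:.4f} (quad error estimate {err:.1e})")

# Spot-check the domination |log p(x; theta)| <= d(x) on a grid,
# at the endpoints of Theta where (x - theta)^2 is largest over theta.
xs = np.linspace(-100, 100, 10001)
for t in (-M, M):
    assert np.all(np.log(np.pi) + np.log1p((xs - t) ** 2)
                  <= C1 + np.log(C2 + C3 * xs ** 2))
print("d(x) dominates |log p(x; theta)| at the endpoints of Theta")
```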
Number of Parameters Increases with the Sample Size

Up to now, we have implicitly assumed that the number of parameters is equal to a fixed constant k. In some cases the number of parameters increases naturally with the number of observations. In such cases, the MLE may

i. no longer converge,
ii. converge to a parameter value different from θ0, or
iii. still converge to θ0.

In general, the outcome depends on how fast the number of parameters grows relative to the number of observations.
Example 8.5: (Neyman-Scott, Econometrica, 1948) Suppose that Xn = (X1, ..., Xn), where the Xi's are independent with Xi = (Xi1, Xi2), Xi1 independent of Xi2, and Xip ∼ N(µi, σ²) for p = 1, 2. We are interested in estimating the µi's and σ². In this problem, we have n + 1 parameters. The likelihood function is

L(\mu_1, \ldots, \mu_n, \sigma^2; x_n) = \prod_{i=1}^{n} \frac{1}{2\pi\sigma^2} \exp\left( -\frac{1}{2\sigma^2} \sum_{p=1}^{2} (x_{ip} - \mu_i)^2 \right).

It is easy to show that the MLEs are

\hat{\mu}_i = \tfrac{1}{2}(X_{i1} + X_{i2}) \quad \text{for } i = 1, \ldots, n,
\qquad
\hat{\sigma}^2 = \frac{1}{2n} \sum_{i=1}^{n} \sum_{p=1}^{2} (X_{ip} - \hat{\mu}_i)^2.
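The well-known punchline, which the following simulation sketch illustrates (my addition; the values of σ² and the µi's are arbitrary), is that σ̂² is inconsistent here: each pair contributes (Xi1 − Xi2)²/2 to the sum, with expectation σ², so σ̂² converges in probability to σ²/2 rather than σ².

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2 = 4.0  # true common variance (arbitrary)

for n in [100, 10_000, 1_000_000]:
    mu = rng.normal(0.0, 10.0, size=n)  # incidental parameters mu_1, ..., mu_n
    x = mu[:, None] + np.sqrt(sigma2) * rng.normal(size=(n, 2))
    mu_hat = x.mean(axis=1)                                    # MLE of each mu_i
    sigma2_hat = np.sum((x - mu_hat[:, None]) ** 2) / (2 * n)  # MLE of sigma^2
    print(f"n = {n:>9}: sigma2_hat = {sigma2_hat:.4f} "
          f"(true sigma2 = {sigma2}, limit sigma2/2 = {sigma2 / 2})")
```

The simulated σ̂² settles near σ²/2 no matter how large n gets: adding observations also adds parameters, so the per-pair estimation error in µ̂i never averages out.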