because there is only one parameter besides n, which is assumed to be known. If the model has two parameters, the likelihood function will be a surface sitting above the parameter space. In general, for a model with k parameters, the likelihood function L(w|y) takes the shape of a k-dimensional geometrical "surface" sitting above a k-dimensional hyperplane spanned by the parameter vector w = (w1, ..., wk).

3. Maximum likelihood estimation

Once data have been collected and the likelihood function of a model given the data is determined, one is in a position to make statistical inferences about the population, that is, the probability distribution that underlies the data. Given that different parameter values index different probability distributions (Fig. 1), we are interested in finding the parameter value that corresponds to the desired probability distribution.

The principle of maximum likelihood estimation (MLE), originally developed by R. A. Fisher in the 1920s, states that the desired probability distribution is the one that makes the observed data "most likely," which means that one must seek the value of the parameter vector that maximizes the likelihood function L(w|y). The resulting parameter vector, which is sought by searching the multi-dimensional parameter space, is called the MLE estimate and is denoted by wMLE = (w1,MLE, ..., wk,MLE). For example, in Fig. 2, the MLE estimate is wMLE = 0.7, for which the maximized likelihood value is L(wMLE = 0.7 | n = 10, y = 7) = 0.267. The probability distribution corresponding to this MLE estimate is shown in the bottom panel of Fig. 1. According to the MLE principle, this is the population that is most likely to have generated the observed data of y = 7. To summarize, maximum likelihood estimation is a method to seek the probability distribution that makes the observed data most likely.
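To make the idea of searching the parameter space concrete, the following short Python sketch (an added illustration, not part of the original tutorial; it assumes the binomial likelihood of Eq. (6) with n = 10 and y = 7) evaluates the likelihood on a grid of candidate w values and recovers both the MLE estimate wMLE = 0.7 and the maximized likelihood value of about 0.267 quoted above.

```python
# Minimal sketch: locate the binomial MLE by scanning a grid of parameter values.
from math import comb

n, y = 10, 7

def likelihood(w):
    # Binomial likelihood of Eq. (6): C(n, y) * w^y * (1 - w)^(n - y)
    return comb(n, y) * w**y * (1 - w)**(n - y)

# Evaluate the likelihood over a fine grid of candidate parameter values.
grid = [i / 1000 for i in range(1, 1000)]
w_mle = max(grid, key=likelihood)

print(w_mle)                        # 0.7
print(round(likelihood(w_mle), 3))  # 0.267
```

A grid search is feasible here only because the model has a single bounded parameter; the point of the sketch is simply that the MLE estimate is the location of the peak of the likelihood function.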
3.1. Likelihood equation

MLE estimates need not exist, nor need they be unique. In this section, we show how to compute MLE estimates when they exist and are unique. For computational convenience, the MLE estimate is obtained by maximizing the log-likelihood function, ln L(w|y). This is because the two functions, ln L(w|y) and L(w|y), are monotonically related to each other, so the same MLE estimate is obtained by maximizing either one. Assuming that the log-likelihood function, ln L(w|y), is differentiable, if wMLE exists, it must satisfy the following partial differential equation, known as the likelihood equation:

    \frac{\partial \ln L(w \mid y)}{\partial w_i} = 0    (7)

at wi = wi,MLE for all i = 1, ..., k. This is because the definition of a maximum or minimum of a continuously differentiable function implies that its first derivatives vanish at such points.

The likelihood equation represents a necessary condition for the existence of an MLE estimate. An additional condition must also be satisfied to ensure that ln L(w|y) is a maximum and not a minimum, since the first derivative alone cannot reveal this. To be a maximum, the shape of the log-likelihood function should be concave (it must represent a peak, not a valley) in the neighborhood of wMLE. This can be checked by calculating the second derivatives of the log-likelihood and showing that they are all negative at wi = wi,MLE for i = 1, ..., k (see Footnote 1):

    \frac{\partial^2 \ln L(w \mid y)}{\partial w_i^2} < 0.    (8)

Footnote 1. Consider the Hessian matrix H(w), defined by H_{ij}(w) = \frac{\partial^2 \ln L(w)}{\partial w_i \partial w_j} (i, j = 1, ..., k). A more exact test of this curvature condition requires that H(w) be negative definite at the solution, that is, z'H(w = wMLE)z < 0 for any k x 1 real-valued vector z, where z' denotes the transpose of z.

To illustrate the MLE procedure, let us again consider the previous one-parameter binomial example given a fixed value of n. First, by taking the logarithm of the likelihood function L(w | n = 10, y = 7) in Eq. (6), we obtain the log-likelihood as

    \ln L(w \mid n = 10, y = 7) = \ln\frac{10!}{7!\,3!} + 7 \ln w + 3 \ln(1 - w).    (9)

Next, the first derivative of the log-likelihood is calculated as

    \frac{d \ln L(w \mid n = 10, y = 7)}{dw} = \frac{7}{w} - \frac{3}{1 - w} = \frac{7 - 10w}{w(1 - w)}.    (10)

By requiring this derivative to be zero, the desired MLE estimate is obtained as wMLE = 0.7. To make sure that the solution represents a maximum, not a minimum, the second derivative of the log-likelihood is calculated and evaluated at w = wMLE:

    \frac{d^2 \ln L(w \mid n = 10, y = 7)}{dw^2} = -\frac{7}{w^2} - \frac{3}{(1 - w)^2} = -47.62 < 0,    (11)

which is negative, as desired.
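These derivative calculations are straightforward to verify numerically. The sketch below (an illustration added here, not part of the original text) plugs w = 0.7 into the closed-form expressions of Eqs. (10) and (11) and confirms that the first derivative vanishes while the second derivative equals about -47.62.

```python
# Check Eqs. (10) and (11) at the candidate solution of the likelihood equation.
w = 0.7

# First derivative, Eq. (10): should vanish at the MLE estimate.
d1 = 7 / w - 3 / (1 - w)

# Second derivative, Eq. (11): should be negative, confirming a peak rather than a valley.
d2 = -7 / w**2 - 3 / (1 - w)**2

print(d1)            # 0.0, up to floating-point rounding error
print(round(d2, 2))  # -47.62
```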
In practice, however, it is usually not possible to obtain an analytic-form solution for the MLE estimate, especially when the model involves many parameters and its PDF is highly non-linear. In such situations, the MLE estimate must be sought numerically using non-linear optimization algorithms. The basic idea of non-linear optimization is to quickly find optimal parameters that maximize the log-likelihood.
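To give a sense of what such a numerical search looks like, the sketch below (an added illustration, assuming the SciPy library is available; it is not part of the original text) minimizes the negative log-likelihood of the binomial example with a bounded one-dimensional optimizer and recovers the analytic answer wMLE = 0.7. In realistic multi-parameter models the same idea applies, with a multivariate routine such as scipy.optimize.minimize in place of the scalar one.

```python
# Illustrative numerical MLE: minimize the negative log-likelihood with a
# general-purpose optimizer instead of solving the likelihood equation analytically.
import numpy as np
from scipy.optimize import minimize_scalar

n, y = 10, 7

def neg_log_likelihood(w):
    # Negative of Eq. (9); the constant ln(10!/(7!3!)) is dropped because it
    # does not affect the location of the maximum.
    return -(y * np.log(w) + (n - y) * np.log(1.0 - w))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)  # approximately 0.7
```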