I.J. Myung / Journal of Mathematical Psychology 47 (2003) 90–100, p. 95

3.3. Relation to least-squares estimation

Recall that in MLE we seek the parameter values that are most likely to have produced the data. In LSE, on the other hand, we seek the parameter values that provide the most accurate description of the data, measured in terms of how closely the model fits the data under the square-loss function. Formally, in LSE, the sum of squares error (SSE) between observations and predictions is minimized:

$$\mathrm{SSE}(w) = \sum_{i=1}^{m} \left( y_i - \mathrm{prd}_i(w) \right)^2, \tag{12}$$

where $\mathrm{prd}_i(w)$ denotes the model's prediction for the $i$th observation. Note that $\mathrm{SSE}(w)$ is a function of the parameter vector $w = (w_1, \ldots, w_k)$.

As in MLE, finding the parameter values that minimize SSE generally requires use of a non-linear optimization algorithm. Minimization of SSE is also subject to the local minima problem, especially when the model is non-linear with respect to its parameters. The choice between the two methods of estimation can have non-trivial consequences. In general, LSE estimates tend to differ from MLE estimates, especially for data that are not normally distributed, such as proportion correct and response time. An implication is that one might arrive at different conclusions about the same data set depending upon which method of estimation is employed in analyzing the data. When this occurs, MLE should be preferred to LSE, unless the probability density function is unknown or difficult to obtain in an easily computable form, for instance, for the diffusion model of recognition memory (Ratcliff, 1978).³ There is a situation, however, in which the two methods intersect: when observations are independent of one another and are normally distributed with a constant variance. In this case, maximization of the log-likelihood is equivalent to minimization of SSE, and therefore the same parameter values are obtained under either MLE or LSE.

4. Illustrative example

In this section, I present an application example of maximum likelihood estimation. To illustrate the method, I chose forgetting data, given the recent surge of interest in this topic (e.g. Rubin & Wenzel, 1996; Wickens, 1998; Wixted & Ebbesen, 1991).

Among the half-dozen retention functions that have been proposed and tested in the past, I provide an example of MLE for two of them, the power and exponential functions. Let $w = (w_1, w_2)$ be the parameter vector, $t$ time, and $p(w, t)$ the model's prediction of the probability of correct recall at time $t$. The two models are defined as

$$\text{power model: } p(w, t) = w_1 t^{-w_2} \quad (w_1, w_2 > 0),$$
$$\text{exponential model: } p(w, t) = w_1 \exp(-w_2 t) \quad (w_1, w_2 > 0). \tag{13}$$

Suppose that the data $y = (y_1, \ldots, y_m)$ consist of $m$ observations, in which $y_i$ $(0 \le y_i \le 1)$ represents an observed proportion of correct recall at time $t_i$ $(i = 1, \ldots, m)$. We are interested in testing the viability of these models. We do this by fitting each to the observed data and examining its goodness of fit.

Application of MLE requires specification of the PDF $f(y \mid w)$ of the data under each model. To do this, we first note that each observed proportion $y_i$ is obtained by dividing the number of correct responses $(x_i)$ by the total number of independent trials $(n)$, $y_i = x_i/n$ $(0 \le y_i \le 1)$. We then note that each $x_i$ is binomially distributed with probability $p(w, t_i)$, so that the PDFs for the power model and the exponential model are obtained as

$$\text{power: } f(x_i \mid n, w) = \frac{n!}{(n - x_i)! \, x_i!} \left( w_1 t_i^{-w_2} \right)^{x_i} \left( 1 - w_1 t_i^{-w_2} \right)^{n - x_i},$$
$$\text{exponential: } f(x_i \mid n, w) = \frac{n!}{(n - x_i)! \, x_i!} \left( w_1 \exp(-w_2 t_i) \right)^{x_i} \left( 1 - w_1 \exp(-w_2 t_i) \right)^{n - x_i}, \tag{14}$$

where $x_i = 0, 1, \ldots, n$ and $i = 1, \ldots, m$.

There are two points to be made regarding the PDFs in the above equation. First, the probability parameter of a binomial probability distribution (i.e. $w$ in Eq. (4)) is being modeled. Therefore, the PDF for each model in Eq. (14) is obtained by simply replacing the probability parameter $w$ in Eq. (4) with the model equation, $p(w, t)$, in Eq. (13). Second, note that $y_i$ is related to $x_i$ by a fixed scaling constant, $1/n$. As such, any statistical conclusion regarding $x_i$ applies directly to $y_i$, except for the scale transformation. In particular, the PDF for $y_i$, $f(y_i \mid n, w)$, is obtained by simply replacing $x_i$ in $f(x_i \mid n, w)$ with $n y_i$.

Now, assuming that the $x_i$'s are statistically independent of one another, the desired log-likelihood function for the power model is given by

$$\begin{aligned}
\ln L(w = (w_1, w_2) \mid n, x) &= \ln \left( f(x_1 \mid n, w) \cdot f(x_2 \mid n, w) \cdots f(x_m \mid n, w) \right) \\
&= \sum_{i=1}^{m} \ln f(x_i \mid n, w) \\
&= \sum_{i=1}^{m} \left( x_i \ln(w_1 t_i^{-w_2}) + (n - x_i) \ln(1 - w_1 t_i^{-w_2}) + \ln n! - \ln(n - x_i)! - \ln x_i! \right).
\end{aligned} \tag{15}$$

³ For this model, the PDF is expressed as an infinite sum of transcendental functions.
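As a concrete illustration of Eq. (15), the following sketch evaluates the binomial log-likelihood of the power model and maximizes it by a coarse grid search. This is only a minimal stand-in for the non-linear optimization algorithms discussed above, and the recall counts are hypothetical data, not from the paper; the constant terms $\ln n! - \ln(n - x_i)! - \ln x_i!$ are dropped because they do not depend on $w$.

```python
import math

# Hypothetical retention data: x_i correct responses out of n trials at time t_i.
t = [1, 2, 4, 7, 12, 21]
n = 100
x = [95, 67, 48, 36, 27, 21]

def log_likelihood_power(w1, w2):
    """Binomial log-likelihood of the power model p(w, t) = w1 * t**(-w2),
    as in Eq. (15), omitting the additive terms that do not depend on w."""
    ll = 0.0
    for ti, xi in zip(t, x):
        p = w1 * ti ** (-w2)
        if not 0.0 < p < 1.0:
            return -math.inf  # outside the admissible probability range
        ll += xi * math.log(p) + (n - xi) * math.log(1.0 - p)
    return ll

# Coarse grid search over (w1, w2); in practice a gradient-based optimizer
# would be used, subject to the local-minima caveat noted in the text.
best = max(
    ((w1 / 100.0, w2 / 100.0) for w1 in range(50, 100) for w2 in range(10, 120)),
    key=lambda w: log_likelihood_power(*w),
)
print(best)
```

With these data (generated near $w_1 = 0.95$, $w_2 = 0.5$), the grid maximizer lands close to the generating values, as expected for a binomial likelihood with $n = 100$ trials per time point.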
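The least-squares criterion of Eq. (12) can be sketched the same way for the power model. The proportion-correct data below are hypothetical, and the grid search again stands in for a proper non-linear optimizer; for binomial proportion data such as these, the text notes that the LSE minimizer need not coincide with the MLE.

```python
# Hypothetical proportion-correct data y_i = x_i / n at times t_i.
t = [1, 2, 4, 7, 12, 21]
y = [0.95, 0.67, 0.48, 0.36, 0.27, 0.21]

def sse_power(w1, w2):
    """Sum of squares error, Eq. (12), with predictions
    prd_i(w) = w1 * t_i**(-w2) from the power model."""
    return sum((yi - w1 * ti ** (-w2)) ** 2 for ti, yi in zip(t, y))

# Grid search for the LSE estimate of (w1, w2).
lse = min(
    ((w1 / 100.0, w2 / 100.0) for w1 in range(50, 100) for w2 in range(10, 120)),
    key=lambda w: sse_power(*w),
)
print(lse)
```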