20.2 LEARNING WITH COMPLETE DATA

Our development of statistical learning methods begins with the simplest task: parameter learning with complete data. A parameter learning task involves finding the numerical parameters for a probability model whose structure is fixed. For example, we might be interested in learning the conditional probabilities in a Bayesian network with a given structure. Data are complete when each data point contains values for every variable in the probability model being learned. Complete data greatly simplify the problem of learning the parameters of a complex model. We will also look briefly at the problem of learning structure.

Maximum-likelihood parameter learning: Discrete models

Suppose we buy a bag of lime and cherry candy from a new manufacturer whose lime–cherry proportions are completely unknown; that is, the fraction could be anywhere between 0 and 1. In that case, we have a continuum of hypotheses. The parameter in this case, which we call θ, is the proportion of cherry candies, and the hypothesis is h_θ. (The proportion of limes is just 1 − θ.) If we assume that all proportions are equally likely a priori, then a maximum-likelihood approach is reasonable. If we model the situation with a Bayesian network, we need just one random variable, Flavor (the flavor of a randomly chosen candy from the bag). It has values cherry and lime, where the probability of cherry is θ (see Figure 20.2(a)). Now suppose we unwrap N candies, of which c are cherries and ℓ = N − c are limes. According to Equation (20.3), the likelihood of this particular data set is

\[ P(\mathbf{d} \mid h_\theta) = \prod_{j=1}^{N} P(d_j \mid h_\theta) = \theta^{c}(1-\theta)^{\ell} . \]

The maximum-likelihood hypothesis is given by the value of θ that maximizes this expression. The same value is obtained by maximizing the log likelihood,

\[ L(\mathbf{d} \mid h_\theta) = \log P(\mathbf{d} \mid h_\theta) = \sum_{j=1}^{N} \log P(d_j \mid h_\theta) = c \log \theta + \ell \log(1-\theta) . \]

(By taking logarithms, we reduce the product to a sum over the data, which is usually easier to maximize.) To find the maximum-likelihood value of θ, we differentiate L with respect to θ and set the resulting expression to zero:

\[ \frac{dL(\mathbf{d} \mid h_\theta)}{d\theta} = \frac{c}{\theta} - \frac{\ell}{1-\theta} = 0 \quad\Longrightarrow\quad \theta = \frac{c}{c+\ell} = \frac{c}{N} . \]

In English, then, the maximum-likelihood hypothesis h_ML asserts that the actual proportion of cherries in the bag is equal to the observed proportion in the candies unwrapped so far!
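To see this numerically, here is a minimal sketch (not from the text; it assumes Python with numpy, and the candy counts are made up) that evaluates the log likelihood on a grid of θ values and confirms that the maximizer agrees with the analytic answer c/N:

    import numpy as np

    # Hypothetical observations: 7 cherries and 3 limes unwrapped.
    c, ell = 7, 3
    N = c + ell

    # L(theta) = c*log(theta) + ell*log(1 - theta), evaluated on a grid;
    # the endpoints 0 and 1 are excluded to avoid log(0).
    thetas = np.linspace(0.001, 0.999, 999)
    log_lik = c * np.log(thetas) + ell * np.log(1 - thetas)

    print(thetas[np.argmax(log_lik)])   # ~0.7, i.e. c/N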
It appears that we have done a lot of work to discover the obvious. In fact, though, we have laid out one standard method for maximum-likelihood parameter learning (sketched in code after the list):

1. Write down an expression for the likelihood of the data as a function of the parameter(s).
2. Write down the derivative of the log likelihood with respect to each parameter.
3. Find the parameter values such that the derivatives are zero.
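As a sketch of the recipe itself (again assuming Python, this time with the sympy library; not part of the original text), the three steps can be carried out symbolically for the candy example:

    import sympy as sp

    theta, c, ell = sp.symbols('theta c ell', positive=True)

    # Step 1: the likelihood of the data as a function of the parameter.
    likelihood = theta**c * (1 - theta)**ell

    # Step 2: the derivative of the log likelihood with respect to theta.
    log_lik = sp.expand_log(sp.log(likelihood), force=True)  # c*log(theta) + ell*log(1 - theta)
    dL = sp.diff(log_lik, theta)                             # c/theta - ell/(1 - theta)

    # Step 3: find the parameter value at which the derivative is zero.
    print(sp.solve(sp.Eq(dL, 0), theta))                     # [c/(c + ell)], i.e. theta = c/N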