Thus, after seeing a cherry candy, we simply increment the a parameter to get the posterior; similarly, after seeing a lime candy, we increment the b parameter. Thus, we can view the a and b hyperparameters as virtual counts, in the sense that a prior beta[a, b] behaves exactly as if we had started out with a uniform prior beta[1, 1] and seen a − 1 actual cherry candies and b − 1 actual lime candies.

By examining a sequence of beta distributions for increasing values of a and b, keeping the proportions fixed, we can see vividly how the posterior distribution over the parameter Θ changes as data arrive. For example, suppose the actual bag of candy is 75% cherry. Figure 20.5(b) shows the sequence beta[3, 1], beta[6, 2], beta[30, 10]. Clearly, the distribution is converging to a narrow peak around the true value of Θ. For large data sets, then, Bayesian learning (at least in this case) converges to give the same results as maximum-likelihood learning.

The network in Figure 20.2(b) has three parameters, θ, θ1, and θ2, where θ1 is the probability of a red wrapper on a cherry candy and θ2 is the probability of a red wrapper on a lime candy. The Bayesian hypothesis prior must cover all three parameters; that is, we need to specify P(Θ, Θ1, Θ2). Usually, we assume parameter independence:

    P(Θ, Θ1, Θ2) = P(Θ) P(Θ1) P(Θ2) .

With this assumption, each parameter can have its own beta distribution that is updated separately as data arrive.
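To make the update rule concrete, here is a minimal Python sketch (not from the book) that keeps independent beta posteriors for Θ, Θ1, and Θ2 as virtual counts and updates them from a small, made-up sample of candies; the variable names and the data are illustrative assumptions.

    # Beta hyperparameters as virtual counts; beta[1, 1] is the uniform prior.
    # Under parameter independence, Theta, Theta1, Theta2 each get their own posterior.
    theta  = [1, 1]   # for P(Flavor = cherry)
    theta1 = [1, 1]   # for P(Wrapper = red | Flavor = cherry)
    theta2 = [1, 1]   # for P(Wrapper = red | Flavor = lime)

    def update(counts, success):
        # A "success" observation increments a; otherwise increment b.
        counts[0 if success else 1] += 1

    observations = [("cherry", "red"), ("cherry", "red"),
                    ("lime", "green"), ("cherry", "green")]

    for flavor, wrapper in observations:
        update(theta, flavor == "cherry")
        if flavor == "cherry":
            update(theta1, wrapper == "red")
        else:
            update(theta2, wrapper == "red")

    def mean(counts):
        # The mean of beta[a, b] is a / (a + b).
        a, b = counts
        return a / (a + b)

    print(theta, theta1, theta2)                     # -> [4, 2] [3, 2] [1, 2]
    print(mean(theta), mean(theta1), mean(theta2))   # posterior means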
Once we have the idea that unknown parameters can be represented by random variables such as Θ, it is natural to incorporate them into the Bayesian network itself. To do this, we also need to make copies of the variables describing each instance. For example, if we have observed three candies, then we need Flavor_1, Flavor_2, Flavor_3 and Wrapper_1, Wrapper_2, Wrapper_3. The parameter variable Θ determines the probability of each Flavor_i variable:

    P(Flavor_i = cherry | Θ = θ) = θ .

Similarly, the wrapper probabilities depend on Θ1 and Θ2. For example,

    P(Wrapper_i = red | Flavor_i = cherry, Θ1 = θ1) = θ1 .

Now, the entire Bayesian learning process can be formulated as an inference problem in a suitably constructed Bayes net, as shown in Figure 20.6. Prediction for a new instance is done simply by adding new instance variables to the network, some of which are queried. This formulation of learning and prediction makes it clear that Bayesian learning requires no extra "principles of learning." Furthermore, there is, in essence, just one learning algorithm: the inference algorithm for Bayesian networks.
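As a rough illustration of this "learning as inference" view, the following Python sketch (not the network of Figure 20.6) discretizes Θ on a grid, conditions on the observed Flavor_i variables, and answers the predictive query for a new candy by summing out Θ; the wrapper variables and Θ1, Θ2 are omitted, and the grid and data are illustrative assumptions.

    import numpy as np

    # Discretize the parameter variable Theta and give it a uniform prior.
    thetas = np.linspace(0.005, 0.995, 199)
    prior = np.full(len(thetas), 1.0 / len(thetas))

    # Observed instance variables Flavor_1 ... Flavor_N.
    observations = ["cherry", "cherry", "lime", "cherry"]

    # P(Flavor_i = cherry | Theta = theta) = theta, so the likelihood at each
    # grid point is a product over the observed instances.
    likelihood = np.ones(len(thetas))
    for flavor in observations:
        likelihood *= thetas if flavor == "cherry" else 1.0 - thetas

    # Inference over the parameter node: P(Theta | data).
    posterior = prior * likelihood
    posterior /= posterior.sum()

    # Prediction for a new instance Flavor_{N+1}: add it to the network and
    # query it, which amounts to summing out Theta.
    p_next_cherry = float(np.sum(posterior * thetas))
    print("P(Flavor_{N+1} = cherry | data) =", p_next_cherry)

With a uniform prior, this grid computation approximates the exact beta posterior described above, and the predicted probability is close to the virtual-count estimate a/(a + b) = 4/6 for these data.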
Learning Bayes net structures

So far, we have assumed that the structure of the Bayes net is given and we are just trying to learn the parameters. The structure of the network represents basic causal knowledge about the domain that is often easy for an expert, or even a naive user, to supply. In some cases, however, the causal model may be unavailable or subject to dispute (for example, certain corporations have long claimed that smoking does not cause cancer), so it is important to understand how the structure of a Bayes net can be learned from data.