
Shanghai Jiao Tong University: Mining Massive Datasets (PPT lecture slides)

Format: PPT, 66 pages, 954.5 KB


Mining Massive Datasets
Lecture 9: Supervised Learning -- Classification
Wu-Jun Li
Department of Computer Science and Engineering
Shanghai Jiao Tong University


Classification Problem

Spam filtering: classification task

From: ""
Subject: real estate is the only way... gem oalvgkay

Anyone can buy real estate with no money down

Stop paying rent TODAY !

There is no need to spend hundreds or even thousands for similar courses

I am 22 years old and I have already purchased 6 properties using the methods outlined in this truly INCREDIBLE ebook.

Change your life NOW !

=================================================
Click Below to order:
http://www.wholesaledaily.com/sales/nmd.htm
=================================================


Supervised Learning -- Classification
▪ Given:
  ▪ A description of a point, d ∈ X
  ▪ A fixed set of classes: C = {c1, c2, …, cJ}
  ▪ A training set D of labeled points, each labeled point ⟨d, c⟩ ∈ X × C
▪ Determine:
  ▪ A learning method or algorithm that will enable us to learn a classifier f: X → C
  ▪ For a test point d, we assign it the class f(d) ∈ C
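Here is a minimal Python sketch of this setup. It is an illustration, not code from the lecture. A training set D is a list of ⟨d, c⟩ pairs, and a learning method is any function that takes D and returns a classifier f: X → C. The learner `learn_majority` and the spam/ham strings are hypothetical placeholders. This deliberately trivial learner ignores d and always predicts the most frequent class in D, so the point is the interface, not the accuracy.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple

Point = Hashable                               # an element d of X (any hashable description)
Label = str                                    # an element c of C
TrainingSet = Sequence[Tuple[Point, Label]]    # D, a collection of pairs from X x C
Classifier = Callable[[Point], Label]          # f : X -> C


def learn_majority(D: TrainingSet) -> Classifier:
    """Hypothetical placeholder learning method: ignore d and always predict
    the most frequent class in D. Real methods (Naive Bayes, kNN, SVM, ...)
    would keep this signature but return a classifier that looks at d."""
    if not D:
        raise ValueError("training set D must not be empty")
    majority, _ = Counter(c for _, c in D).most_common(1)[0]
    return lambda d: majority


D = [("cheap real estate, no money down", "spam"),
     ("lecture 9 slides attached", "ham"),
     ("stop paying rent TODAY", "spam")]
f = learn_majority(D)
print(f("meeting moved to 3pm"))   # spam
```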


Document Classification

Classes:
▪ (AI): ML, Planning
▪ (Programming): Semantics, Garb.Coll.
▪ (HCI): Multimedia, GUI
▪ ...

Training data:
▪ ML: learning intelligence algorithm reinforcement network...
▪ Planning: planning temporal reasoning plan language...
▪ Semantics: programming semantics language proof...
▪ Garb.Coll.: garbage collection memory optimization region...
▪ ...

Test data: "planning language proof intelligence"

(Note: in real life there is often a hierarchy, not present in the above problem statement; and also, you get papers on ML approaches to Garb. Coll.)
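The slide's toy data can be written down in Python as below. The word-overlap count is a hypothetical illustration, not a method from the lecture. It only shows that the test document shares words with several classes: Planning and Semantics tie. That tie is why a principled scoring rule is needed. Multimedia and GUI are omitted because the slide lists no example words for them.

```python
# Training data from the slide: a few characteristic words per class.
training = {
    "ML": {"learning", "intelligence", "algorithm", "reinforcement", "network"},
    "Planning": {"planning", "temporal", "reasoning", "plan", "language"},
    "Semantics": {"programming", "semantics", "language", "proof"},
    "Garb.Coll.": {"garbage", "collection", "memory", "optimization", "region"},
}
test_doc = "planning language proof intelligence".split()

# Hypothetical illustration only: how many test words does each class share?
overlap = {c: sum(w in words for w in test_doc) for c, words in training.items()}
print(overlap)  # {'ML': 1, 'Planning': 2, 'Semantics': 2, 'Garb.Coll.': 0}
```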


More Classification Examples
Many search engine functionalities use classification:
▪ Assigning labels to documents or web pages:
  ▪ Labels are most often topics, such as Yahoo categories:
    ▪ "finance", "sports", "news>world>asia>business"
  ▪ Labels may be genres:
    ▪ "editorials", "movie-reviews", "news"
  ▪ Labels may be opinions on a person/product:
    ▪ "like", "hate", "neutral"
  ▪ Labels may be domain-specific:
    ▪ "interesting-to-me" : "not-interesting-to-me"
    ▪ "contains adult language" : "doesn't"
    ▪ language identification: English, French, Chinese, …
    ▪ search vertical: about Linux versus not
    ▪ "link spam" : "not link spam"


Classification Methods
▪ Perceptrons (refer to lecture 9.2)
▪ Naïve Bayes
▪ kNN
▪ Support vector machine (SVM)


Naïve Bayes

Bayesian Methods
▪ Learning and classification methods based on probability theory.
▪ Bayes' theorem plays a critical role in probabilistic learning and classification.
▪ Builds a generative model that approximates how data is produced.
▪ Uses the prior probability of each category, given no information about an item.
▪ Categorization produces a posterior probability distribution over the possible categories, given a description of an item.
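The Python sketch below makes "generative model" concrete. It shows how such a model could produce data: draw a class from the prior P(c), then draw words from a class-conditional distribution P(w | c). All probabilities and the name `generate` are hypothetical. Drawing each word independently is a simplifying assumption (the one naïve Bayes makes), not something this slide requires.

```python
import random

# Hypothetical parameters, chosen only for illustration.
prior = {"spam": 0.4, "ham": 0.6}                      # P(c)
word_dist = {                                          # P(w | c)
    "spam": {"money": 0.5, "rent": 0.3, "meeting": 0.2},
    "ham": {"money": 0.1, "rent": 0.1, "meeting": 0.8},
}


def generate(n_words: int, rng: random.Random) -> tuple[str, list[str]]:
    """Sample one (class, words) pair from the generative model."""
    # Step 1: pick a class according to the prior P(c).
    c = rng.choices(list(prior), weights=list(prior.values()))[0]
    # Step 2: pick each word from P(w | c), independently (simplifying assumption).
    probs = word_dist[c]
    words = rng.choices(list(probs), weights=list(probs.values()), k=n_words)
    return c, words


print(generate(5, random.Random(0)))
```

Classification runs this story in reverse. Given the observed words, Bayes' theorem turns the prior and the class-conditional probabilities into a posterior over classes.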


Bayes' Rule for Classification
▪ For a point d and a class c:
$$P(c, d) = P(c \mid d)\,P(d) = P(d \mid c)\,P(c)$$
$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$
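As a worked numeric instance of Bayes' rule, the following Python sketch computes P(c | d) for every class. It assumes the classes are exhaustive and mutually exclusive, so the evidence can be expanded as P(d) = Σ_c P(d | c) P(c). The function name `posterior` and all numbers are hypothetical.

```python
def posterior(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Return P(c | d) for every class c via Bayes' rule."""
    # Joint probability P(c, d) = P(d | c) P(c).
    joint = {c: likelihood[c] * prior[c] for c in prior}
    # Evidence P(d) = sum over c of P(d | c) P(c)  (classes assumed exhaustive, disjoint).
    evidence = sum(joint.values())
    # P(c | d) = P(d | c) P(c) / P(d).
    return {c: p / evidence for c, p in joint.items()}


# Hypothetical numbers: P(spam) = 0.4, P(d | spam) = 0.05, P(d | ham) = 0.01.
post = posterior({"spam": 0.4, "ham": 0.6}, {"spam": 0.05, "ham": 0.01})
print(post)
```

Here P(d) = 0.05 · 0.4 + 0.01 · 0.6 = 0.026, so P(spam | d) = 0.02 / 0.026 = 10/13 ≈ 0.77 and P(ham | d) ≈ 0.23. The document is five times more likely under spam, but once the prior is included it comes out about 77% spam.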


Naïve Bayes Classifiers
Task: classify a new point d, described by a tuple of attribute values d = ⟨x1, x2, …, xn⟩, into one of the classes cj ∈ C.
$$\begin{aligned}
c_{MAP} &= \arg\max_{c_j \in C} P(c_j \mid x_1, x_2, \ldots, x_n) \\
&= \arg\max_{c_j \in C} \frac{P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)}{P(x_1, x_2, \ldots, x_n)} \\
&= \arg\max_{c_j \in C} P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)
\end{aligned}$$
The last step drops the denominator P(x1, x2, …, xn). It is the same for every class cj, so it does not change the argmax.
MAP is "maximum a posteriori" = most likely class.
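The MAP decision rule can be sketched in Python as follows. The function `map_class` and the prior and likelihood values are hypothetical. The sketch picks the class maximizing P(x1, …, xn | cj) P(cj). It then checks that normalizing by P(x1, …, xn) first selects the same class.

```python
def map_class(prior: dict[str, float], likelihood: dict[str, float]) -> str:
    """c_MAP = argmax over c_j of P(x1..xn | c_j) P(c_j).

    P(x1..xn) is omitted: dividing every score by the same positive
    constant cannot change which class has the largest score."""
    return max(prior, key=lambda c: likelihood[c] * prior[c])


# Hypothetical prior P(c_j) and likelihood P(x1..xn | c_j) for one test point.
prior = {"ML": 0.5, "Planning": 0.3, "Semantics": 0.2}
likelihood = {"ML": 0.001, "Planning": 0.004, "Semantics": 0.005}

print(map_class(prior, likelihood))  # Planning

# Normalizing by P(x1..xn) first gives the same winner.
evidence = sum(likelihood[c] * prior[c] for c in prior)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
assert max(posterior, key=posterior.get) == map_class(prior, likelihood)
```

The likelihood alone would favor Semantics (0.005). Weighting by the prior gives scores 0.0005, 0.0012 and 0.001, so the MAP class is Planning.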


Naïve Bayes Classifier: Naïve Bayes Assumption
▪ P(cj)
  ▪ Can be estimated from the frequency of classes in the training examples.
▪ P(x1, x2, …, xn | cj)
  ▪ O(|X|^n · |C|) parameters
  ▪ Could only be estimated if a very, very large number of training examples was available.
Naïve Bayes Conditional Independence Assumption:
▪ Assume that the probability of observing the conjunction of attributes is equal to the product of the individual probabilities P(xi | cj):
$$P(x_1, x_2, \ldots, x_n \mid c_j) = \prod_{i=1}^{n} P(x_i \mid c_j)$$
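Below is a minimal Python sketch of a naïve Bayes text classifier. It assumes each attribute xi is the i-th word of a text and that the same P(w | c) applies at every position, a common bag-of-words choice that this slide does not state. P(cj) is estimated from class frequencies, as the slide says. The training documents and the names `train_nb` and `classify_nb` are hypothetical. Add-one smoothing and log-space scoring are also assumptions. Smoothing keeps an unseen word/class pair from zeroing the whole product, and logs avoid underflow without changing the argmax, because log is monotonic.

```python
import math
from collections import Counter, defaultdict


def train_nb(D):
    """Estimate P(c) and P(w | c) from a list of (text, class) pairs."""
    class_docs = Counter(c for _, c in D)
    word_counts = defaultdict(Counter)          # word_counts[c][w]: occurrences of w in class c
    for text, c in D:
        word_counts[c].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    totals = {c: sum(word_counts[c].values()) for c in class_docs}

    # P(c): relative frequency of class c among the training examples.
    prior = {c: n / len(D) for c, n in class_docs.items()}
    # P(w | c) with add-one (Laplace) smoothing, an assumption not on this slide.
    cond = {c: {w: (word_counts[c][w] + 1) / (totals[c] + len(vocab)) for w in vocab}
            for c in class_docs}
    return prior, cond, vocab


def classify_nb(prior, cond, vocab, text):
    """argmax_c P(c) * prod_i P(x_i | c), computed as a sum of logarithms."""
    words = [w for w in text.split() if w in vocab]   # words never seen in training are skipped
    return max(prior, key=lambda c: math.log(prior[c])
               + sum(math.log(cond[c][w]) for w in words))


D = [("buy real estate no money down", "spam"),
     ("stop paying rent buy now", "spam"),
     ("lecture slides on naive bayes", "ham")]
prior, cond, vocab = train_nb(D)
print(classify_nb(prior, cond, vocab, "buy real estate now"))   # spam
print(classify_nb(prior, cond, vocab, "naive bayes lecture"))   # ham
```

The independence assumption is what keeps the model small. Take n = 30 binary attributes (|X| = 2) and |C| = 2. A full table for P(x1, …, xn | cj) then has 2^30 · 2 = 2,147,483,648 entries. The factorized model needs only n · |X| · |C| = 30 · 2 · 2 = 120 values of P(xi | cj).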
