当前位置:高等教育资讯网  >  中国高校课件下载中心  >  大学文库  >  浏览文档

香港理工大学:Discovering Classification Rules

资源类别:文库,文档格式:PPT,文档页数:64,文件大小:422KB,团购合买
点击下载完整版文档(PPT)

COMP 578 Discovering Classification Rules Keith c.c. chan Department of Computing The Hong Kong Polytechnic University

COMP 578 Discovering Classification Rules Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University

An Example Classification Problem Patient records Recovered Symptoms Treatment Not recovered A B?

2 An Example Classification Problem Patient Records Symptoms & Treatment Recovered Not Recovered A? B?

Classification in Relational DB Patient Symptom Treatment/Recovered Mike Headache TypeAYes Mary Fever typeANo Bill Cough Type B2 No Fever Type C1Yes Dave Doug h Type C1Yes Anne Headache Type B2 Yes Will John, having a headache Class Label and treated with Type CI recover?

3 Classification in Relational DB Patient Symptom TreatmentRecovered Mike Headache Type A Yes Mary Fever Type A No Bill Cough Type B2 No Jim Fever Type C1 Yes Dave Cough Type C1 Yes Anne Headache Type B2 Yes Class Label Will John, having a headache and treated with Type C1, recover?

Discovering of Classification Rules Minins Classification Training Rules Data NAME Symptom Treat. Recover? Mike Headache Type AYes Mary Fever Type ANo Classification Cough Type B2No Rules Jim Fever ype C1 Yes Dave Cough Typ ype C1 YesIF Symptom=Headache Anne Headache Type B2 Yes AND Treatment=Cl Then Recover Yes Based on the classification rule discovered. John will recover

4 Discovering of Classification Rules Training Data NAME Symptom Treat. Recover? Mike Headache Type A Yes Mary Fever Type A No Bill Cough Type B2 No Jim Fever Type C1 Yes Dave Cough Type C1 Yes Anne Headache Type B2 Yes Mining Classification Rules IF Symptom = Headache AND Treatment = C1 THEN Recover = Yes Classification Rules Based on the classification rule discovered, John will recover!!!

The classification problem a Given a database consisting of n records Each record characterized by m attributes Each record pre-classified into p different classes Find A set of classification rules(that constitutes a classification model) that characterizes the different classes so that records not originally in the database can be accurately classified I. e predicting"class labels

5 The Classification Problem Given: – A database consisting of n records. – Each record characterized by m attributes. – Each record pre-classified into p different classes. Find: – A set of classification rules (that constitutes a classification model) that characterizes the different classes – so that records not originally in the database can be accurately classified. – I.e “predicting” class labels

Typical Applications Credit approval Classes can be high risk Low risk? 罐 Target marketing What are the classes? Medical diagnosis Classes can be customers with different diseases w Treatment effectiveness analysIs Classes can be patience with different degrees of recovery

6 Typical Applications Credit approval. – Classes can be High Risk, Low Risk? Target marketing. – What are the classes? Medical diagnosis – Classes can be customers with different diseases. Treatment effectiveness analysis. – Classes can be patience with different degrees of recovery

Techniques for Discoveirng of Classification Rules s The k-Nearest Neighbor Algorithm s The linear discriminant function s The Bayesian Approach s The decision tree approach The Neural Network approach s The genetic algorithm approach

7 Techniques for Discoveirng of Classification Rules The k-Nearest Neighbor Algorithm. The Linear Discriminant Function. The Bayesian Approach. The Decision Tree approach. The Neural Network approach. The Genetic Algorithm approach

Example Using The K-NN Algorithm Salary Age Insurance 15K 28 Bt 31K 39 Buy 41K 53 Buy 10K 45 Buy 14K 55 Bu 25K 27 Not buy 42K 32 Not Buy 18K 38 Not bur 33K 44 Not Buy John earns 24K per month and is 42 years old Will he buy insurance?

8 Example Using The k-NN Algorithm Salary Age Insurance 15K 28 Buy 31K 39 Buy 41K 53 Buy 10K 45 Buy 14K 55 Buy 25K 27 Not Buy 42K 32 Not Buy 18K 38 Not Buy 33K 44 Not Buy John earns 24K per month and is 42 years old. Will he buy insurance?

The k-Nearest Neighbor Algorithm All data records correspond to points in the n Dimensional space Nearest neighbor defined in terms of Euclidean distance s k-nn returns the most common class label among k training examples nearest to xq

9 The k-Nearest Neighbor Algorithm All data records correspond to points in the n￾Dimensional space. Nearest neighbor defined in terms of Euclidean distance. k-NN returns the most common class label among k training examples nearest to xq. . _ + _ xq + _ _ + _ _ +

The K-NN Algorithm(2) N k-nn can be for continuous-valued labels Calculate the mean values of the k nearest neighbors w Distance-weighted nearest neighbor algorithm Weight the contribution of each of the k neighbors according to their distance to the query point x a Advantage X Robust to noisy data by averaging k-nearest neighbors 罐 Disadvantage Distance between neighbors could be dominated by irrelevant attributes 10

10 The k-NN Algorithm (2) k-NN can be for continuous-valued labels. – Calculate the mean values of the k nearest neighbors Distance-weighted nearest neighbor algorithm – Weight the contribution of each of the k neighbors according to their distance to the query point xq Advantage: – Robust to noisy data by averaging k-nearest neighbors Disadvantage: – Distance between neighbors could be dominated by irrelevant attributes. w d x q x i  1 2 ( , )

点击下载完整版文档(PPT)VIP每日下载上限内不扣除下载券和下载次数;
按次数下载不扣除下载券;
24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
共64页,可试读20页,点击继续阅读 ↓↓
相关文档

关于我们|帮助中心|下载说明|相关软件|意见反馈|联系我们

Copyright © 2008-现在 cucdc.com 高等教育资讯网 版权所有