
Data Warehousing & Data Mining: course teaching resources (PPT lecture notes), Ch 2 Discovering Association Rules



COMP 578 Data Warehousing & Data Mining
Ch 2 Discovering Association Rules
Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University


The Association Rule (AR) Mining Problem
◼ Given a database of transactions.
◼ Each transaction is a list of items.
  ◼ E.g., the items purchased by a customer in one visit.
◼ Find all rules that correlate the presence of one set of items with that of another set of items.
  ◼ E.g., 30% of people who buy diapers also buy beer.


Motivation & Applications (1)
◼ If we can find such associations, we will be able to answer:
  ◼ ??? → beer (What should the company do to boost beer sales?)
  ◼ Diapers → ??? (What other products should the store stock up on?)
  ◼ Attached mailing in direct marketing


Motivation & Applications (2)
◼ Originally for marketing, to understand purchasing trends.
  ◼ What products or services do customers tend to purchase at the same time, or later on?
◼ Use market basket analysis to plan:
  ◼ Coupons and discounting:
    ◼ Do not offer simultaneous discounts on beer and diapers if they tend to be bought together.
    ◼ Discount one to pull in sales of the other.
  ◼ Product placement:
    ◼ Place products that have a strong purchasing relationship close together.
    ◼ Alternatively, place such products far apart to increase traffic past other items.


Measure of Interestingness
◼ For a data mining algorithm to mine for interesting association rules, users have to define a measure of “interestingness”.
◼ Two popular interestingness measures have been proposed:
  ◼ Support and Confidence
  ◼ Lift Ratio (Interest)
◼ MineSet from SGI uses the terms prevalence and predictability instead of support and confidence, respectively.


The Support and Confidence
Given the rule X & Y => Z:
◼ Support, S = P(X ∪ Y ∪ Z), where A ∪ B indicates that a transaction contains both A and B (the union of item sets A and B).
  [# of tuples containing X & Y & Z / total # of tuples]
◼ Confidence, C = P(Z | X ∪ Y), the conditional probability that a transaction containing {X ∪ Y} also contains Z.
  [# of tuples containing X & Y & Z / # of tuples containing X & Y]
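The two definitions above can be sketched as counting functions in Python. This is a minimal illustration, not part of the original slides: the function names `support` and `confidence`, the toy `baskets` list, and the choice to return 0 when the antecedent never occurs are all assumptions made for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from transaction counts."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    if n_antecedent == 0:
        return 0.0  # assumption: report 0 when the antecedent never occurs
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    return n_both / n_antecedent


# Hypothetical toy baskets for the rule X & Y => Z
baskets = [{"X", "Y", "Z"}, {"X", "Y"}, {"X", "Z"}, {"Y", "Z"}]
s = support(baskets, {"X", "Y", "Z"})        # 1 of 4 baskets holds X, Y, and Z
c = confidence(baskets, {"X", "Y"}, {"Z"})   # 1 of the 2 baskets with X and Y also holds Z
print(s, c)
```

Both functions scan the whole transaction list on every call, which is fine for a handful of baskets but not how a production miner would count.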


The Support and Confidence

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

[Figure: Venn diagram with regions "Customer buys diaper", "Customer buys both", and "Customer buys beer"]

Let the minimum support be 50% and the minimum confidence be 50%. Find the S and C of:
1. A → C
2. C → A

Answer:
A → C (50%, 66.7%)
C → A (50%, 100%)
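The answer above can be checked with a brute-force enumeration over the four transactions from the table. This is a sketch for verification only: it tries every item combination, which is exponential in the number of items, and it is not the mining algorithm taught in the course. The names `count` and `mine_rules` are hypothetical.

```python
from itertools import combinations

# The four transactions from the slide, keyed by transaction ID
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}


def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def mine_rules(min_support=0.5, min_confidence=0.5):
    """Enumerate all rules lhs -> rhs that meet both thresholds (brute force)."""
    n = len(transactions)
    all_items = sorted(set().union(*transactions.values()))
    rules = []
    for size in range(2, len(all_items) + 1):
        for combo in combinations(all_items, size):
            itemset = frozenset(combo)
            n_itemset = count(itemset)
            s = n_itemset / n
            if n_itemset == 0 or s < min_support:
                continue
            # Split the frequent itemset into every non-empty lhs / rhs pair
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    lhs = frozenset(lhs)
                    c = n_itemset / count(lhs)
                    if c >= min_confidence:
                        rules.append((sorted(lhs), sorted(itemset - lhs), s, c))
    return rules


rules = mine_rules()
for lhs, rhs, s, c in rules:
    print(f"{','.join(lhs)} -> {','.join(rhs)}  (support {s:.1%}, confidence {c:.1%})")
# A -> C  (support 50.0%, confidence 66.7%)
# C -> A  (support 50.0%, confidence 100.0%)
```

Only {A, C} reaches 50% support among the multi-item sets, so the two rules from the slide are the only ones that survive both thresholds.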

How Good is a Predictive Model?
◼ Response curves: how does the response rate of a targeted selection compare to a random selection?

[Figure: response curves for Optimal Selection, Targeted Selection, and Random Selection; vertical axis: Response Rate (up to 100%); horizontal axis: from most likely to respond to least likely]
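One way to compute such a curve is sketched below, assuming it plots the cumulative share of all responders reached as customers are contacted in order of decreasing model score. That reading of the figure, the function name `response_curve`, and the scores and outcomes are all assumptions made for illustration.

```python
def response_curve(scores, responded):
    """Return (fraction contacted, fraction of responders reached) points.

    Customers are contacted from the highest model score to the lowest.
    """
    total = sum(responded)
    if total == 0:
        return []  # assumption: no curve when nobody responded
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, captured = [], 0
    for rank, i in enumerate(order, start=1):
        captured += responded[i]
        curve.append((rank / len(scores), captured / total))
    return curve


# Hypothetical model scores and observed responses (1 = responded)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0]

curve = response_curve(scores, responded)
for contacted, reached in curve:
    # A random selection is expected to reach a share equal to `contacted`
    print(f"contacted {contacted:.0%}: targeted {reached:.0%}, random {contacted:.0%}")
```

In this toy data the targeted selection reaches every responder after contacting half of the customers, whereas a random selection would be expected to reach only half of them at that point.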


What Is a Lift Ratio? (1)
◼ Consider the rule:
  ◼ When people buy diapers, they also buy beer 50 percent of the time.
  ◼ It states an explicit percentage (50% of the time).
◼ Consider this other rule:
  ◼ People who purchase a VCR are three times more likely to also purchase a camcorder.
  ◼ This rule uses the comparative phrase “three times more likely”.


What Is a Lift Ratio? (2)
◼ The probability is compared to the baseline likelihood.
◼ The baseline likelihood is the probability of the event occurring independently.
◼ E.g., if people normally buy beer 5% of the time, then the first rule could have said “10 times more likely.”
◼ The ratio in this kind of comparison is called lift.
◼ A key goal of an association rule mining exercise is to find rules that have the desired lift.
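The comparison described above amounts to dividing a rule's confidence by the baseline probability of its right-hand side. The sketch below reproduces the "10 times more likely" figure on made-up data: the 100 baskets are constructed so that beer appears in 5% of them and in 50% of the baskets containing diapers. The function name `lift` and the basket contents are assumptions for illustration.

```python
def lift(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent divided by the consequent's baseline rate."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if antecedent <= t)
    n_rhs = sum(1 for t in transactions if consequent <= t)
    if n == 0 or n_lhs == 0 or n_rhs == 0:
        raise ValueError("lift is undefined when either side never occurs")
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    confidence = n_both / n_lhs   # P(consequent | antecedent)
    baseline = n_rhs / n          # P(consequent)
    return confidence / baseline


# Hypothetical data: 100 baskets, beer in 5 of them, diapers in 4,
# and 2 of the 4 diaper baskets also contain beer.
baskets = (
    [{"diapers", "beer"}] * 2
    + [{"diapers"}] * 2
    + [{"beer"}] * 3
    + [{"milk"}] * 93
)

ratio = lift(baskets, {"diapers"}, {"beer"})
print(f"lift = {ratio:.1f}")  # confidence 50% over baseline 5%
```

A lift near 1 means the rule says nothing beyond the baseline, while a value well above 1 indicates that the left-hand side makes the right-hand side more likely than usual.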
