Big Data Analysis and Mining: Association Rule
Qinpei Zhao (赵钦佩), qinpeizhao@tongji.edu.cn
2015 Fall, 2021/1/27

Frequent Pattern Analysis
◼ Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
◼ First proposed by Agrawal, Imielinski, and Swami in the context of frequent itemsets and association rule mining
◼ Motivation: finding inherent regularities in data
  ◆ Which products are often purchased together? Beer and diapers?
  ◆ What are the subsequent purchases after buying a PC?

Association Rule Discovery
◼ Supermarket shelf management: the market-basket model
◼ Goal:
  ◆ Identify items that are bought together by sufficiently many customers
◼ Approach:
  ◆ Process the sales data collected with barcode scanners to find dependencies among items
◼ A classic rule:
  ◆ If someone buys diapers and milk, then he/she is likely to buy beer
  ◆ Don't be surprised if you find six-packs next to diapers!

Applications (1)
◼ Items = products; Baskets = sets of products someone bought in one trip to the store
◼ Real market baskets: chain stores keep terabytes of data about what customers buy together
  ◆ Tells stores how typical customers navigate them, so tempting items can be positioned accordingly
  ◆ Suggests tie-in "tricks", e.g., run a sale on diapers and raise the price of beer
  ◆ The rule needs to occur frequently to be useful
◼ Amazon's "people who bought X also bought Y"

Applications (2)
◼ Baskets = sentences; Items = documents containing those sentences
  ◆ Items that appear together too often could represent plagiarism
  ◆ Notice that items do not have to be "in" baskets
◼ Baskets = patients; Items = drugs and side-effects
  ◆ Has been used to detect combinations of drugs that result in particular side-effects
  ◆ But requires an extension: the absence of an item needs to be observed as well as its presence

[Figure: screenshot of an online plagiarism checker comparing an original passage with an alternate passage and reporting duplicate matches]

Transaction data: a set of documents
◼ A text document data set. Each document is treated as a "bag" of keywords
  doc1: Student, Teach, School
  doc2: Student, School
  doc3: Teach, School, City, Game
  doc4: Baseball, Basketball
  doc5: Basketball, Player, Spectator
  doc6: Baseball, Coach, Game, Team
  doc7: Basketball, Team, City, Game
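
As a minimal sketch of the "bag of keywords" view, the seven documents can be turned into transactions (sets of items) in Python. The names `raw_docs`, `to_transaction`, `transactions`, and `keyword_counts` are illustrative assumptions, not part of the slides.

```python
from collections import Counter

# The seven documents from the slide, written as raw keyword strings.
raw_docs = {
    "doc1": "Student, Teach, School",
    "doc2": "Student, School",
    "doc3": "Teach, School, City, Game",
    "doc4": "Baseball, Basketball",
    "doc5": "Basketball, Player, Spectator",
    "doc6": "Baseball, Coach, Game, Team",
    "doc7": "Basketball, Team, City, Game",
}


def to_transaction(text):
    """Turn a comma-separated keyword string into a set of items.

    Hypothetical helper: order and duplicates are discarded, which is
    what treating a document as a "bag" of keywords amounts to here.
    """
    return frozenset(word.strip() for word in text.split(",") if word.strip())


# Each document becomes one transaction.
transactions = {name: to_transaction(text) for name, text in raw_docs.items()}

# Number of documents in which each keyword appears.
keyword_counts = Counter(item for t in transactions.values() for item in t)

if __name__ == "__main__":
    for keyword, count in keyword_counts.most_common():
        print(keyword, count)
```

Using sets makes the later "does this transaction contain this itemset?" question a simple subset test.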

The model: rules
◼ A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.
◼ An association rule is an implication of the form X → Y, where X, Y ⊂ I, and X ∩ Y = ∅.
◼ An itemset is a set of items.
  ◆ E.g., X = {milk, bread, cereal} is an itemset.
◼ A k-itemset is an itemset with k items.
  ◆ E.g., {milk, bread, cereal} is a 3-itemset.
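
The definitions above map directly onto set operations. The following is a sketch under stated assumptions: the function names are hypothetical, the item universe `I` adds "beer" purely for illustration, and requiring X and Y to be non-empty is an added assumption that the slide does not state.

```python
from itertools import combinations


def contains(transaction, itemset):
    """A transaction t contains itemset X if X is a subset of t."""
    return set(itemset) <= set(transaction)


def is_valid_rule(X, Y, I):
    """Check the form X -> Y: both sides drawn from I and disjoint.

    Assumption (not on the slide): both sides must be non-empty.
    """
    X, Y, I = set(X), set(Y), set(I)
    return bool(X) and bool(Y) and X <= I and Y <= I and X.isdisjoint(Y)


def k_itemsets(items, k):
    """Enumerate all k-itemsets that can be formed from the given items."""
    return [frozenset(c) for c in combinations(sorted(items), k)]


# Illustrative item universe and one transaction.
I = {"milk", "bread", "cereal", "beer"}
t = {"milk", "bread", "cereal"}

if __name__ == "__main__":
    print(contains(t, {"milk", "bread"}))        # t contains this 2-itemset
    print(contains(t, {"beer"}))                 # t does not contain beer
    print(is_valid_rule({"milk"}, {"bread"}, I))
    print(len(k_itemsets(I, 3)))                 # number of 3-itemsets over I
```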

Rule strength measures
◼ Support: the rule holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y.
  ◆ sup = Pr(X ∪ Y)
◼ Confidence: the rule holds in T with confidence conf if conf% of transactions that contain X also contain Y.
  ◆ conf = Pr(Y | X)
◼ An association rule is a pattern stating that when X occurs, Y occurs with a certain probability.

Support and Confidence
◼ Support count: the support count of an itemset X, denoted by X.count, in a data set T is the number of transactions in T that contain X. Assume T has n transactions.
◼ Then,

$$\mathit{support} = \frac{(X \cup Y).\mathit{count}}{n}$$

$$\mathit{confidence} = \frac{(X \cup Y).\mathit{count}}{X.\mathit{count}}$$
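
The two formulas can be checked on the seven-document data set from the earlier slide. This is a minimal sketch; the function names `support_count`, `support`, and `confidence` are illustrative, and raising an error when X never occurs is an assumed design choice, since the slides do not say how to treat that case.

```python
# The seven keyword documents from the transaction-data slide.
transactions = [
    {"Student", "Teach", "School"},
    {"Student", "School"},
    {"Teach", "School", "City", "Game"},
    {"Baseball", "Basketball"},
    {"Basketball", "Player", "Spectator"},
    {"Baseball", "Coach", "Game", "Team"},
    {"Basketball", "Team", "City", "Game"},
]


def support_count(itemset, transactions):
    """X.count: number of transactions that contain every item of X."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(X, Y, transactions):
    """support = (X u Y).count / n, with n the number of transactions."""
    return support_count(set(X) | set(Y), transactions) / len(transactions)


def confidence(X, Y, transactions):
    """confidence = (X u Y).count / X.count."""
    x_count = support_count(X, transactions)
    if x_count == 0:
        # Assumed behaviour: confidence is undefined if X never occurs.
        raise ValueError("X does not occur in any transaction")
    return support_count(set(X) | set(Y), transactions) / x_count


if __name__ == "__main__":
    print(support({"Game"}, {"City"}, transactions))
    print(confidence({"Game"}, {"City"}, transactions))
    print(confidence({"City"}, {"Game"}, transactions))
```

For the rule Game → City, (X ∪ Y).count = 2 (doc3 and doc7), n = 7, and X.count = 3, so support is 2/7 and confidence is 2/3. The reverse rule City → Game has the same support but confidence 2/2 = 1, which shows that confidence depends on the direction of the rule while support does not.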