Mining Massive Datasets
Lecture 3: Frequent Itemsets and Association Rules
Wu-Jun Li
Department of Computer Science and Engineering
Shanghai Jiao Tong University

Outline
▪ Association rules
▪ A-Priori algorithm
▪ Large-scale algorithms

Association Rules

The Market-Basket Model
▪ A large set of items, e.g., things sold in a supermarket.
▪ A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Market-Baskets (2)
▪ Really a general many-many mapping (association) between two kinds of things.
▪ But we ask about connections among “items,” not “baskets.”
▪ The technology focuses on common events, not rare events (“long tail”).

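The many-many mapping can be held in either direction. The sketch below is only an illustration: it stores the five example baskets from the market-basket slide by basket id and inverts them into an item-to-baskets index. The basket ids and the helper name `invert` are assumptions made for this example.

```python
from collections import defaultdict

# The five example baskets, keyed by an assumed basket id (1..5).
baskets = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}


def invert(baskets):
    """Turn a basket -> items mapping into an item -> basket-ids mapping."""
    item_to_baskets = defaultdict(set)
    for basket_id, items in baskets.items():
        for item in items:
            item_to_baskets[item].add(basket_id)
    return dict(item_to_baskets)


item_to_baskets = invert(baskets)

# Connections among items are read off by intersecting their basket sets.
beer_and_diaper = item_to_baskets["Beer"] & item_to_baskets["Diaper"]
```

Both directions carry the same information. The questions on these slides are about items that share many baskets, which is what the intersection on the last line computes.
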
Association Rule Discovery
▪ Goal: To identify items that are bought together by sufficiently many customers, and to find dependencies among items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}

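As a rough check on the two rules, one can count how many baskets contain the left-hand side and how many of those also contain the right-hand side. This is a minimal sketch over the five example baskets, not a rule-mining algorithm; the function name `rule_counts` is hypothetical.

```python
# The five example baskets from the slide.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def rule_counts(baskets, lhs, rhs):
    """Return (baskets containing lhs, baskets containing lhs and rhs)."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [b for b in baskets if lhs <= b]
    with_both = [b for b in with_lhs if rhs <= b]
    return len(with_lhs), len(with_both)


milk_to_coke = rule_counts(baskets, {"Milk"}, {"Coke"})
diaper_milk_to_beer = rule_counts(baskets, {"Diaper", "Milk"}, {"Beer"})
```

On this data, 3 of the 4 baskets with Milk also contain Coke, and 2 of the 3 baskets with Diaper and Milk also contain Beer.
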
Support
▪ Simplest question: find sets of items that appear “frequently” in the baskets.
▪ Support for itemset I = the number of baskets containing all items in I.
▪ Sometimes given as a percentage.
▪ Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.

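The definition translates almost directly into code. The sketch below assumes baskets are stored as Python sets and reuses the five example baskets from the market-basket slide; the names `support` and `is_frequent` are chosen for this example.

```python
# The five example baskets from the market-basket slide.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, baskets):
    """Number of baskets that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)


def is_frequent(itemset, baskets, s):
    """True if the itemset appears in at least s baskets."""
    return support(itemset, baskets) >= s


milk_support = support({"Milk"}, baskets)
milk_fraction = milk_support / len(baskets)  # support as a fraction of all baskets
```

With s = 3, {Milk, Diaper} is frequent (3 baskets) while {Beer, Diaper} is not (2 baskets).
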
Example: Frequent Itemsets
▪ Items = {milk, coke, pepsi, beer, juice}.
▪ Support threshold = 3 baskets.

B1 = {m, c, b}    B2 = {m, p, j}
B3 = {m, b}       B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}
B7 = {c, b, j}    B8 = {b, c}

▪ Frequent itemsets: {m}, {c}, {b}, {j}, {m,b}, {b,c}, {c,j}.

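The example can be checked by brute force: enumerate every non-empty subset of the five items and keep those contained in at least 3 baskets. This is a naive enumeration for illustration only, not the A-Priori algorithm, and it is feasible only because there are five items.

```python
from itertools import combinations

# Baskets B1..B8 from the example, using the one-letter item codes.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]


def frequent_itemsets(baskets, s):
    """Map each itemset with support >= s to its support (brute force)."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for basket in baskets if set(candidate) <= basket)
            if count >= s:
                result[frozenset(candidate)] = count
    return result


frequent = frequent_itemsets(baskets, s=3)
```

The result contains exactly the seven itemsets listed above. The supports are 5, 5, 6 and 4 for {m}, {c}, {b} and {j}, then 4, 4 and 3 for {m,b}, {b,c} and {c,j}.
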
Applications (1)
▪ Items = products; baskets = sets of products someone bought in one trip to the store.
▪ Example application: given that many people buy beer and diapers together:
  ▪ Run a sale on diapers; raise price of beer.
▪ Only useful if many buy diapers & beer.

Applications (2)
▪ Baskets = sentences; items = documents containing those sentences.
▪ Items that appear together too often could represent plagiarism.
▪ Notice items do not have to be “in” baskets.

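A minimal sketch of this reading, on made-up data: each sentence is a basket whose items are the documents containing it, and pairs of documents that share many sentences are flagged. The sentence ids, document names and the threshold of 3 are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each sentence (basket) maps to the documents (items)
# in which it occurs.
sentence_to_docs = {
    "s1": {"docA", "docB"},
    "s2": {"docA", "docB", "docC"},
    "s3": {"docA", "docB"},
    "s4": {"docC"},
}


def shared_sentence_counts(sentence_to_docs):
    """Count, for every pair of documents, the sentences they share."""
    counts = Counter()
    for docs in sentence_to_docs.values():
        for pair in combinations(sorted(docs), 2):
            counts[pair] += 1
    return counts


counts = shared_sentence_counts(sentence_to_docs)

# Document pairs sharing at least 3 sentences (assumed threshold).
suspicious = [pair for pair, n in counts.items() if n >= 3]
```

Here only the pair (docA, docB) reaches the threshold, with three shared sentences.
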
Applications (3)
▪ Baskets = Web pages; items = words.
▪ Unusual words appearing together in a large number of documents, e.g., “Brad” and “Angelina,” may indicate an interesting relationship.