Chapter 6: Advanced Frequent Pattern Mining Pattern Mining: A road Map Pattern Mining in Multi-Level, multi-Dimensional space Constraint-Based Frequent Pattern Mining Mining High-Dimensional Data and Colossal Patterns Mining Compressed or Approximate Patterns Pattern Exploration and application ■ Summary
1 Chapter 6 : Advanced Frequent Pattern Mining ◼ Pattern Mining: A Road Map ◼ Pattern Mining in Multi-Level, Multi-Dimensional Space ◼ Constraint-Based Frequent Pattern Mining ◼ Mining High-Dimensional Data and Colossal Patterns ◼ Mining Compressed or Approximate Patterns ◼ Pattern Exploration and Application ◼ Summary
frequent patterm Basic Pattens association rule closed/max patten ■ generator Kinds of Multilevel a multilevel(uniform, varied, or itemset -based support) patterns Multidimensional a multidimensional pattern( incl high-dimensional patten) and rules Pattems a continuous data( discretization -based, or statistical) ■ approximate pattem ■ uncertain pattem Extended Patterns ■ compressed patten a rare pattern/negative pattem iEEE a high-dimensional and colossal pattens a candidate generation( Apriori, partitioning, sampling, - Basic Mining Pattem growth( FPgrowth, HMine, FPMax, Closet+, -. Methods 555a55 a vertical format( EClat, CHARM,. a interestingness(subjective vs objective) Mining Methods Mining Interesting ■ constraint-based mining Patterns correlation rules ■ exception rules Distributed parallel a distributed/parallel mining incremental incremental mining stream pattern sequential ad time-series patterns structural(e. g, tree, lattice, graph)pattens Extended Data spatial(e. g, Co-location) pattern Types temporal(evolutionary, perodic) o image, video and multimedia pattems Extensions ■ network pattems Application g pattem-based classification pattem-based clustering Applications pattem-based semantic annotation collaborative flitering ■ pnvacy-preserving
Research on Pattern Mining: A Road Map 2
Pattern Mining in Multi-Level, multi- Dimensional Space Mining multi-Level Association Mining multi-Dimensional association Mining Quantitative Association Rules Mining rare patterns and Negative patterns
3 Pattern Mining in Multi-Level, MultiDimensional Space ◼ Mining Multi-Level Association ◼ Mining Multi-Dimensional Association ◼ Mining Quantitative Association Rules ◼ Mining Rare Patterns and Negative Patterns
Computer Software Printer and Camera Computer Accessory Laptop Digital Desktop Office AntivirusPrinter Wrist Pad Mouse Camera IBM Dell Mirosot Canon…| Fellowes……| LogiTech ////八 Figure 7.2 Concept hierarchy for AllElectronics computer items
Figure 7.2 Concept hierarchy for AllElectronics computer items
Level 1 min_ sup=5% computer support=10%] Level 2 min_ sup=5% laptop computetsuppot=6%] desktop computer [support=4%] Figure 7.3 Multilevel mining with uniform support
Figure 7.3 Multilevel mining with uniform support
Level 1 min sup=5% computer(support= 10%] Level 2 min sup=3% laptop computetsupport =6%] desktop computer [suppot=4%] Figure 7. 4 Multilevel mining with reduced support
Figure 7.4 Multilevel mining with reduced support
Mining Multiple-Level Association Rules Items often form hierarchies Flexible support settings Items at the lower level are expected to have lower support Exploration of shared multi-level mining(agrawal Srikant@VLB95, Han Fu@VLDB95) uniform support reduced support Level l Milk min sup =5% Level 1 Support=10%1 min sup=5% Level 2 Milk Skim milk Level2 min_sup=5% [support=6%1: [support=4%1 min sup =3% 7
7 Mining Multiple-Level Association Rules ◼ Items often form hierarchies ◼ Flexible support settings ◼ Items at the lower level are expected to have lower support ◼ Exploration of shared multi-level mining (Agrawal & Srikant@VLB’95, Han & Fu@VLDB’95) uniform support Milk [support = 10%] 2% Milk [support = 6%] Skim Milk [support = 4%] Level 1 min_sup = 5% Level 2 min_sup = 5% Level 1 min_sup = 5% Level 2 min_sup = 3% reduced support
Multi-level Association: Flexible Support and Redundancy filtering Flexible min-support thresholds: Some items are more valuable but less frequent Use non-uniform, group-based min-support E.g. diamond watch, camera]: 0. 05% bread milk 5%/ Redundancy filtering Some rules may be redundant due to ancestor"relationships between items milk= wheat bread [support=8%, confidence= 70%] 2 milk wheat bread [support= 2%, confidence = 72%] The first rule is an ancestor of the second rule a rule is redundant if its support is close to the expected"value based on the rule's ancestor
8 Multi-level Association: Flexible Support and Redundancy filtering ◼ Flexible min-support thresholds: Some items are more valuable but less frequent ◼ Use non-uniform, group-based min-support ◼ E.g., {diamond, watch, camera}: 0.05%; {bread, milk}: 5%; … ◼ Redundancy Filtering: Some rules may be redundant due to “ancestor” relationships between items ◼ milk wheat bread [support = 8%, confidence = 70%] ◼ 2% milk wheat bread [support = 2%, confidence = 72%] The first rule is an ancestor of the second rule ◼ A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor
Mining Multi-Dimensional Association Single-dimensional rules buys(X,"milk)= buys(X,"bread) Multi-dimensional rules:22 dimensions or predicates Inter-dimension assoc rules(no repeated predicates) age(X, 19-25)A occupation(X, student)= buys(X,"coke hybrid-dimension assoc rules(repeated predicates) age(X, 19-25)A buys(X, popcorn)= buys(X,coke") Categorical Attributes: finite number of possible values,no ordering among values--data cube approach Quantitative Attributes: Numeric, implicit ordering among valuesdiscretization, clustering and gradient approaches
9 Mining Multi-Dimensional Association ◼ Single-dimensional rules: buys(X, “milk”) buys(X, “bread”) ◼ Multi-dimensional rules: 2 dimensions or predicates ◼ Inter-dimension assoc. rules (no repeated predicates) age(X,”19-25”) occupation(X,“student”) buys(X, “coke”) ◼ hybrid-dimension assoc. rules (repeated predicates) age(X,”19-25”) buys(X, “popcorn”) buys(X, “coke”) ◼ Categorical Attributes: finite number of possible values, no ordering among values—data cube approach ◼ Quantitative Attributes: Numeric, implicit ordering among values—discretization, clustering, and gradient approaches
Mining Quantitative Associations Techniques can be categorized by how numerical attributes such as age or salary are treated 1. Static discretization based on predefined concept hierarchies(data cube methods) 2. Dynamic discretization based on data distribution (quantitative rules eg Agrawal srikant@SIGMOD96 3. Clustering: Distance-based association(e.g. Yang Miller@SIGMOD97 One dimensional clustering then association 4. Deviation:(such as Aumann and Lindell@KDD99) Sex= female = Wage: mean=$7/hr(overall mean= $9)
10 Mining Quantitative Associations Techniques can be categorized by how numerical attributes, such as age or salary are treated 1. Static discretization based on predefined concept hierarchies (data cube methods) 2. Dynamic discretization based on data distribution (quantitative rules, e.g., Agrawal & Srikant@SIGMOD96) 3. Clustering: Distance-based association (e.g., Yang & Miller@SIGMOD97) ◼ One dimensional clustering then association 4. Deviation: (such as Aumann and Lindell@KDD99) Sex = female => Wage: mean=$7/hr (overall mean = $9)