重庆大学：《数据仓库与数据挖掘 Data Warehouse and Data mining》课程PPT教学课件（英文版）Chapter 6 Advanced Frequent Pattern Mining

◼ Pattern Mining: A Road Map ◼ Pattern Mining in Multi-Level, Multi-Dimensional Space ◼ Constraint-Based Frequent Pattern Mining ◼ Mining High-Dimensional Data and Colossal Patterns ◼ Mining Compressed or Approximate Patterns ◼ Pattern Exploration and Application ◼ Summary

团购合买资源类别：文库，文档格式：PPT，文档页数：64，文件大小：2.25MB

Chapter 6: Advanced Frequent Pattern Mining Pattern Mining: A road Map Pattern Mining in Multi-Level, multi-Dimensional space Constraint-Based Frequent Pattern Mining Mining High-Dimensional Data and Colossal Patterns Mining Compressed or Approximate Patterns Pattern Exploration and application ■ Summary

1 Chapter 6 : Advanced Frequent Pattern Mining ◼ Pattern Mining: A Road Map ◼ Pattern Mining in Multi-Level, Multi-Dimensional Space ◼ Constraint-Based Frequent Pattern Mining ◼ Mining High-Dimensional Data and Colossal Patterns ◼ Mining Compressed or Approximate Patterns ◼ Pattern Exploration and Application ◼ Summary

frequent patterm Basic Pattens association rule closed/max patten ■ generator Kinds of Multilevel a multilevel(uniform, varied, or itemset -based support) patterns Multidimensional a multidimensional pattern( incl high-dimensional patten) and rules Pattems a continuous data( discretization -based, or statistical) ■ approximate pattem ■ uncertain pattem Extended Patterns ■ compressed patten a rare pattern/negative pattem iEEE a high-dimensional and colossal pattens a candidate generation( Apriori, partitioning, sampling, - Basic Mining Pattem growth( FPgrowth, HMine, FPMax, Closet+, -. Methods 555a55 a vertical format( EClat, CHARM,. a interestingness(subjective vs objective) Mining Methods Mining Interesting ■ constraint-based mining Patterns correlation rules ■ exception rules Distributed parallel a distributed/parallel mining incremental incremental mining stream pattern sequential ad time-series patterns structural(e. g, tree, lattice, graph)pattens Extended Data spatial(e. g, Co-location) pattern Types temporal(evolutionary, perodic) o image, video and multimedia pattems Extensions ■ network pattems Application g pattem-based classification pattem-based clustering Applications pattem-based semantic annotation collaborative flitering ■ pnvacy-preserving

Research on Pattern Mining: A Road Map 2

Pattern Mining in Multi-Level, multi- Dimensional Space Mining multi-Level Association Mining multi-Dimensional association Mining Quantitative Association Rules Mining rare patterns and Negative patterns

3 Pattern Mining in Multi-Level, MultiDimensional Space ◼ Mining Multi-Level Association ◼ Mining Multi-Dimensional Association ◼ Mining Quantitative Association Rules ◼ Mining Rare Patterns and Negative Patterns

Computer Software Printer and Camera Computer Accessory Laptop Digital Desktop Office AntivirusPrinter Wrist Pad Mouse Camera IBM Dell Mirosot Canon…| Fellowes……| LogiTech ////八 Figure 7.2 Concept hierarchy for AllElectronics computer items

Figure 7.2 Concept hierarchy for AllElectronics computer items

Level 1 min_ sup=5% computer support=10%] Level 2 min_ sup=5% laptop computetsuppot=6%] desktop computer [support=4%] Figure 7.3 Multilevel mining with uniform support

Figure 7.3 Multilevel mining with uniform support

Level 1 min sup=5% computer(support= 10%] Level 2 min sup=3% laptop computetsupport =6%] desktop computer [suppot=4%] Figure 7. 4 Multilevel mining with reduced support

Figure 7.4 Multilevel mining with reduced support

Mining Multiple-Level Association Rules Items often form hierarchies Flexible support settings Items at the lower level are expected to have lower support Exploration of shared multi-level mining(agrawal Srikant@VLB95, Han Fu@VLDB95) uniform support reduced support Level l Milk min sup =5% Level 1 Support=10%1 min sup=5% Level 2 Milk Skim milk Level2 min_sup=5% [support=6%1: [support=4%1 min sup =3% 7

7 Mining Multiple-Level Association Rules ◼ Items often form hierarchies ◼ Flexible support settings ◼ Items at the lower level are expected to have lower support ◼ Exploration of shared multi-level mining (Agrawal & Srikant@VLB’95, Han & Fu@VLDB’95) uniform support Milk [support = 10%] 2% Milk [support = 6%] Skim Milk [support = 4%] Level 1 min_sup = 5% Level 2 min_sup = 5% Level 1 min_sup = 5% Level 2 min_sup = 3% reduced support

Multi-level Association: Flexible Support and Redundancy filtering Flexible min-support thresholds: Some items are more valuable but less frequent Use non-uniform, group-based min-support E.g. diamond watch, camera]: 0. 05% bread milk 5%/ Redundancy filtering Some rules may be redundant due to ancestor"relationships between items milk= wheat bread [support=8%, confidence= 70%] 2 milk wheat bread [support= 2%, confidence = 72%] The first rule is an ancestor of the second rule a rule is redundant if its support is close to the expected"value based on the rule's ancestor

8 Multi-level Association: Flexible Support and Redundancy filtering ◼ Flexible min-support thresholds: Some items are more valuable but less frequent ◼ Use non-uniform, group-based min-support ◼ E.g., {diamond, watch, camera}: 0.05%; {bread, milk}: 5%; … ◼ Redundancy Filtering: Some rules may be redundant due to “ancestor” relationships between items ◼ milk  wheat bread [support = 8%, confidence = 70%] ◼ 2% milk  wheat bread [support = 2%, confidence = 72%] The first rule is an ancestor of the second rule ◼ A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor

Mining Multi-Dimensional Association Single-dimensional rules buys(X,"milk)= buys(X,"bread) Multi-dimensional rules:22 dimensions or predicates Inter-dimension assoc rules(no repeated predicates) age(X, 19-25)A occupation(X, student)= buys(X,"coke hybrid-dimension assoc rules(repeated predicates) age(X, 19-25)A buys(X, popcorn)= buys(X,coke") Categorical Attributes: finite number of possible values,no ordering among values--data cube approach Quantitative Attributes: Numeric, implicit ordering among valuesdiscretization, clustering and gradient approaches

9 Mining Multi-Dimensional Association ◼ Single-dimensional rules: buys(X, “milk”)  buys(X, “bread”) ◼ Multi-dimensional rules:  2 dimensions or predicates ◼ Inter-dimension assoc. rules (no repeated predicates) age(X,”19-25”)  occupation(X,“student”)  buys(X, “coke”) ◼ hybrid-dimension assoc. rules (repeated predicates) age(X,”19-25”)  buys(X, “popcorn”)  buys(X, “coke”) ◼ Categorical Attributes: finite number of possible values, no ordering among values—data cube approach ◼ Quantitative Attributes: Numeric, implicit ordering among values—discretization, clustering, and gradient approaches

Mining Quantitative Associations Techniques can be categorized by how numerical attributes such as age or salary are treated 1. Static discretization based on predefined concept hierarchies(data cube methods) 2. Dynamic discretization based on data distribution (quantitative rules eg Agrawal srikant@SIGMOD96 3. Clustering: Distance-based association(e.g. Yang Miller@SIGMOD97 One dimensional clustering then association 4. Deviation:(such as Aumann and Lindell@KDD99) Sex= female = Wage: mean=$7/hr(overall mean= $9)

10 Mining Quantitative Associations Techniques can be categorized by how numerical attributes, such as age or salary are treated 1. Static discretization based on predefined concept hierarchies (data cube methods) 2. Dynamic discretization based on data distribution (quantitative rules, e.g., Agrawal & Srikant@SIGMOD96) 3. Clustering: Distance-based association (e.g., Yang & Miller@SIGMOD97) ◼ One dimensional clustering then association 4. Deviation: (such as Aumann and Lindell@KDD99) Sex = female => Wage: mean=$7/hr (overall mean = $9)

点击进入文档下载页（PPT格式）

共64页，可试读20页，点击继续阅读 ↓↓

点击下载（PPT格式）

浏览记录