Data Warehouse and Data Mining
Yufang Zhang
Department of Computer Science, Chongqing University
zhangyf@cqu.edu.cn
Course Structures
Coverage:
◼ Database
◼ Data mining
  Intro. to data warehousing and mining
  Data mining: Principles and algorithms
  Independent study: only if you seriously plan to do your Ph.D./M.S. on data mining and want to demonstrate your ability
◼ Text information systems
◼ Web and bioinformatics
OBJECTIVE/DESCRIPTION
The course will introduce concepts and techniques of data mining and data warehousing, covering the following aspects of both:
◼ concept
◼ principle
◼ architecture
◼ design
◼ implementation
◼ application
Some systems for data warehousing and/or data mining will also be introduced.
Reference Books
Data Mining: Concepts and Techniques (3rd edition)
◼ Jiawei Han, Micheline Kamber and Jian Pei
◼ China Machine Press
Data Mining: Concepts and Techniques (2nd edition)
◼ Jiawei Han and Micheline Kamber
◼ China Machine Press
Data Mining: Concepts and Techniques
◼ Jiawei Han and Micheline Kamber
◼ Higher Education Press
Introduction to Data Mining
◼ Tan, Steinbach, Kumar
◼ Turing
Data Mining: Introductory and Advanced Topics
◼ Margaret H. Dunham
◼ Tsinghua University Press (清华大学出版社)
Acknowledgements
These teaching slides are adapted from the textbook and from material found on the Internet.
Some cases are taken from the blog of Professor Tang Changjie.
Coverage (Chapters 1-10, 3rd Ed.)
Coverage (BK2: 2nd Ed.)
◼ Introduction
◼ Data Preprocessing
◼ Data Warehouse and OLAP Technology: An Introduction
◼ Advanced Data Cube Technology and Data Generalization
◼ Mining Frequent Patterns, Association and Correlations
◼ Classification and Prediction
◼ Cluster Analysis
Coverage (BK3: 3rd Ed.)
1. Introduction
2. Getting to Know Your Data
3. Data Preprocessing
4. Data Warehouse and OLAP Technology: An Introduction
5. Advanced Data Cube Technology
6. Mining Frequent Patterns & Association: Basic Concepts
7. Mining Frequent Patterns & Association: Advanced Methods
8. Classification: Basic Concepts
9. Classification: Advanced Methods
10. Cluster Analysis: Basic Concepts
11. Cluster Analysis: Advanced Methods
12. Outlier Analysis
TOPICS
Introduction
Getting to Know Your Data
Data Preprocessing
Data Warehousing and OLAP Technology: An Introduction
Mining Frequent Patterns & Association: Basic Concepts
Classification: Basic Concepts
Cluster Analysis: Basic Concepts
Outlier Analysis
Chapter 1. Introduction
Motivation: Why Data Mining?
What Is Data Mining?
A Multi-Dimensional View of Data Mining
What Kinds of Data Can Be Mined?
What Kinds of Patterns Can Be Mined?
What Kinds of Technologies Are Used?
What Kinds of Applications Are Targeted?
Major Issues in Data Mining
A Brief History of Data Mining and Data Mining Society
Summary
Motivation: "Necessity Is the Mother of Invention"
Data explosion problem: from terabytes to petabytes
◼ Data collection and data availability
  Automated data collection tools, database systems, Web, computerized society
◼ Major sources of abundant data
  Business: Web, e-commerce, transactions, stocks, …
  Science: remote sensing, bioinformatics, scientific simulation, …
  Society and everyone: news, digital cameras, …
We are drowning in data, but starving for knowledge!
Evolution of Sciences: New Data Science Era
Before 1600: Empirical science
1600-1950s: Theoretical science
◼ Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
1950s-1990s: Computational science
◼ Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).
◼ Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.