
The Hong Kong Polytechnic University: Data Warehousing & Data Mining (PPT lecture slides)

Document format: PPT; pages: 47; file size: 7.14 MB

COMP 578
Data Warehousing & Data Mining

Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University

Text and References
• Chan, K.C.C., Course Notes on Data Mining & Data Warehousing, Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, 2003.
• Inmon, W.H., Building the Data Warehouse, 2nd Edition, J. Wiley & Sons, New York, NY, 1996.
• Whitehorn, M., Business Intelligence: the IBM Solution: Datawarehousing and OLAP, Springer, London, 1999.
• Han, J., and Kamber, M., Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, 2001.
• Rud, O.P., Data Mining Cookbook: Modeling Data for Marketing, Risk, and Customer Relationship Management, J. Wiley, New York, NY, 2001.
• Groth, R., Data Mining: Building Competitive Advantage, Prentice Hall, Upper Saddle River, NJ, 1998.
• Kovalerchuk, B., Data Mining in Finance: Advances in Relational and Hybrid Methods, Kluwer Academic, Boston, 2000.
• Berry, M.J.A., Mastering Data Mining: the Art and Science of Customer Relationship Management, Wiley, New York, NY, 2000.
• Berry, M.J.A., Data Mining Techniques for Marketing, Sales and Customer Support, Wiley, New York, NY, 1997.
• Mattison, R., Data Warehousing and Data Mining for Telecommunications, Artech House, Boston, 1997.

Course Outline (1)
• Data Mining
  – From data warehousing to data mining.
  – Data pre-processing and data mining life-cycle.
  – Association and sequence analysis; classification and clustering.
  – Fuzzy Logic, Neural Networks, and Genetic Algorithms.
  – Mining Complex Data.
    • OLAP mining; spatial data mining; text mining; time-series data mining; web mining; visual data mining.

Course Outline (2)
• Data warehousing
  – Introduction; basic concepts of data warehousing; data warehouse vs. operational DB; data warehouse and the industry.
  – Architecture and design; two-tier and three-tier architecture; star schema and snowflake schema; data capturing, replication, transformation and cleansing.
  – Data characteristics; metadata; static and dynamic data; derived data.
  – Data marts; OLAP; data mining; data warehouse administration.

Aims and Objectives
• The hype about data warehousing and data mining.
• Better understand tools by IBM, Microsoft, Oracle, SAS, SPSS.
• Job mobility and prospects.
• Projects and research thesis.

Data Warehousing and Industry
• One of the hottest topics in IS.
• Over 90% of larger companies either have a DW or are starting one.
• Warehousing is big business:
  – $2 billion in 1995
  – $3.5 billion in early 1997
  – $8 billion in 1998 [Metagroup]
  – Over $200 billion over the next 5 years

Data Warehousing and Industry (2)
• A 1996 study of 62 data warehousing projects showed:
  – An average return on investment of 321%, with an average payback period of 2.73 years.
• WalMart has the largest warehouse:
  – 900-CPU, 2,700-disk, 23 TB Teradata system
  – ~7 TB in warehouse
  – 40-50 GB per day

What is a Data Warehouse?
• Defined in many different ways, non-rigorously.
  – A DB for decision support.
  – Maintained separately from an organization’s operational database.
• “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
• Data warehousing:
  – The process of constructing and using data warehouses.
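Two of the properties in Inmon's definition, time-variant and nonvolatile, can be illustrated with a small sketch. This is only an illustration under assumptions, not how a real warehouse is built: the `BalanceSnapshot` and `Warehouse` names, the customer ID, and the sample figures are all hypothetical. Each record carries the date it describes, and records are only ever appended and read, never updated in place.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded record cannot be modified (nonvolatile)
class BalanceSnapshot:
    customer_id: str  # subject-oriented: organized around the customer
    as_of: date       # time-variant: every record states the date it is valid for
    balance: float


class Warehouse:
    """Hypothetical append-only store: rows are loaded and queried, never changed."""

    def __init__(self) -> None:
        self._rows: list[BalanceSnapshot] = []

    def load(self, row: BalanceSnapshot) -> None:
        # New facts are added next to old ones instead of overwriting them.
        self._rows.append(row)

    def history(self, customer_id: str) -> list[BalanceSnapshot]:
        # Return all snapshots for one customer in chronological order.
        rows = [r for r in self._rows if r.customer_id == customer_id]
        return sorted(rows, key=lambda r: r.as_of)


wh = Warehouse()
wh.load(BalanceSnapshot("C001", date(2003, 2, 28), 950.0))
wh.load(BalanceSnapshot("C001", date(2003, 1, 31), 1200.0))
history = wh.history("C001")
for snap in history:
    print(snap.as_of, snap.balance)
```

An operational database would typically keep only the current balance; keeping one row per date is what makes historical analysis possible.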

Why Data Warehousing?
• Advance of information technology.
• Data collected in huge amounts.
• Need to make good use of data.
• Architecture and tools to:
  – Bring together scattered information from multiple sources to provide a consistent data source for decision support.
  – Support information processing by providing a solid platform of consolidated, historical data for analysis.
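As a rough sketch of what "bringing together scattered information from multiple sources" can involve, the example below consolidates records from two operational systems that use different field names, units, and date formats. The system names, field names, and sample values are all hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Hypothetical extracts from two operational systems with different conventions.
orders_system = [
    {"cust_id": "C001", "amount_usd": 120.50, "date": "2003-01-15"},
]
billing_system = [
    {"CustomerNo": "c001", "amount_cents": 9900, "date": "20/02/2003"},
]


def from_orders(rec):
    # Map an orders-system record onto the common warehouse layout.
    return {
        "customer_id": rec["cust_id"].upper(),
        "amount": rec["amount_usd"],
        "date": datetime.strptime(rec["date"], "%Y-%m-%d").date(),
        "source": "orders",
    }


def from_billing(rec):
    # Billing stores cents and day/month/year dates; convert both.
    return {
        "customer_id": rec["CustomerNo"].upper(),
        "amount": rec["amount_cents"] / 100,
        "date": datetime.strptime(rec["date"], "%d/%m/%Y").date(),
        "source": "billing",
    }


# One consistent, chronologically ordered data set for analysis.
consolidated = sorted(
    [from_orders(r) for r in orders_system]
    + [from_billing(r) for r in billing_system],
    key=lambda r: r["date"],
)

# A simple decision-support query over the consolidated data.
total_by_customer = {}
for row in consolidated:
    key = row["customer_id"]
    total_by_customer[key] = total_by_customer.get(key, 0.0) + row["amount"]

print(total_by_customer)
```

Once both sources share one layout, a question such as "total amount per customer" no longer needs to know which system a record came from.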

Why Data Mining?
• Data explosion problem:
  – Automated data collection tools and mature database technology.
  – Leading to tremendous amounts of data stored in databases, data warehouses and other information repositories.
• We are drowning in data, but starving for knowledge!
