Chapter 11:Data Analytics ■Overview Data Warehousing Online Analytical Processing ■Data Mining Database System Concepts-7th Edition 11.2 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 11.2 ©Silberschatz, Korth and Sudarshan th Edition Chapter 11: Data Analytics ▪ Overview ▪ Data Warehousing ▪ Online Analytical Processing ▪ Data Mining
Overview Data analytics:the processing of data to infer patterns,correlations,or models for prediction Primarily used to make business decisions Per individual customer E.g.,what product to suggest for purchase ·Across all customers E.g.,what products to manufacture/stock,in what quantity Critical for businesses today Database System Concepts-7th Edition 11.3 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 11.3 ©Silberschatz, Korth and Sudarshan th Edition Overview ▪ Data analytics: the processing of data to infer patterns, correlations, or models for prediction ▪ Primarily used to make business decisions • Per individual customer ▪ E.g., what product to suggest for purchase • Across all customers ▪ E.g., what products to manufacture/stock, in what quantity ▪ Critical for businesses today
Overview(Cont.) Common steps in data analytics Gather data from multiple sources into one location Data warehouses also integrated data into common schema Data often needs to be extracted from source formats. transformed to common schema,and loaded into the data warehouse Can be done as ETL (extract-transform-load),or ELT (extract- load-transform) Generate aggregates and reports summarizing data Dashboards showing graphical charts/reports Online analytical processing (OLAP)systems allow interactive querying Statistical analysis using tools such as R/SAS/SPSS Including extensions for parallel processing of big data Build predictive models and use the models for decision making Database System Concepts-7th Edition 11.4 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 11.4 ©Silberschatz, Korth and Sudarshan th Edition Overview (Cont.) ▪ Common steps in data analytics • Gather data from multiple sources into one location ▪ Data warehouses also integrated data into common schema ▪ Data often needs to be extracted from source formats, transformed to common schema, and loaded into the data warehouse • Can be done as ETL (extract-transform-load), or ELT (extractload-transform) • Generate aggregates and reports summarizing data ▪ Dashboards showing graphical charts/reports ▪ Online analytical processing (OLAP) systems allow interactive querying ▪ Statistical analysis using tools such as R/SAS/SPSS • Including extensions for parallel processing of big data • Build predictive models and use the models for decision making
Overview(Cont.) Predictive models are widely used today E.g.,use customer profile features (e.g.income,age,gender, education,employment)and past history of a customer to predict likelihood of default on loan and use prediction to make loan decision E.g.,use past history of sales(by season)to predict future sales And use it to decide what/how much to produce/stock And to target customers Other examples of business decisions: ·Vhat items to stock? What insurance premium to change? To whom to send advertisements? Database System Concepts-7th Edition 11.5 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 11.5 ©Silberschatz, Korth and Sudarshan th Edition Overview (Cont.) ▪ Predictive models are widely used today • E.g., use customer profile features (e.g. income, age, gender, education, employment) and past history of a customer to predict likelihood of default on loan ▪ and use prediction to make loan decision • E.g., use past history of sales (by season) to predict future sales ▪ And use it to decide what/how much to produce/stock ▪ And to target customers ▪ Other examples of business decisions: • What items to stock? • What insurance premium to change? • To whom to send advertisements?
Overview (Cont.) Machine learning techniques are key to finding patterns in data and making predictions Data mining extends techniques developed by machine-learning communities to run them on very large datasets The term business intelligence(BI)is synonym for data analytics The term decision support focuses on reporting and aggregation Database System Concepts-7th Edition 11.6 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 11.6 ©Silberschatz, Korth and Sudarshan th Edition Overview (Cont.) ▪ Machine learning techniques are key to finding patterns in data and making predictions ▪ Data mining extends techniques developed by machine-learning communities to run them on very large datasets ▪ The term business intelligence (BI) is synonym for data analytics ▪ The term decision support focuses on reporting and aggregation
DATA WAREHOUSING Database System Concepts-7th Edition 11.7 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 11.7 ©Silberschatz, Korth and Sudarshan th Edition DATA WAREHOUSING
Data Warehousing ■ Data sources often store only current data,not historical data Corporate decision making requires a unified view of all organizational data, including historical data A data warehouse is a repository(archive)of information gathered from multiple sources,stored under a unified schema,at a single site Greatly simplifies querying,permits study of historical trends 。 Shifts decision support query load away from transaction processing systems Database System Concepts-7th Edition 11.8 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 11.8 ©Silberschatz, Korth and Sudarshan th Edition Data Warehousing ▪ Data sources often store only current data, not historical data ▪ Corporate decision making requires a unified view of all organizational data, including historical data ▪ A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site • Greatly simplifies querying, permits study of historical trends • Shifts decision support query load away from transaction processing systems
Data Warehousing data source 1 data loaders data source 2 DBMS query and analysis tools data warehouse data source n Database System Concepts-7th Edition 11.9 @Silberschatz,Korth and Sudarshan
Database System Concepts - 7 11.9 ©Silberschatz, Korth and Sudarshan th Edition Data Warehousing
Design Issues When and how to gather data Source driven architecture:data sources transmit new information to warehouse either continuously or periodically (e.g.,at night) Destination driven architecture:warehouse periodically requests new information from data sources Synchronous vs asynchronous replication Keeping warehouse exactly synchronized with data sources(e.g., using two-phase commit)is often too expensive Usually OK to have slightly out-of-date data at warehouse -Data/updates are periodically downloaded form online transaction processing (OLTP)systems. What schema to use ·Schema integration Database System Concepts-7th Edition 11.10 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 11.10 ©Silberschatz, Korth and Sudarshan th Edition Design Issues ▪ When and how to gather data • Source driven architecture: data sources transmit new information to warehouse ▪ either continuously or periodically (e.g., at night) • Destination driven architecture: warehouse periodically requests new information from data sources • Synchronous vs asynchronous replication ▪ Keeping warehouse exactly synchronized with data sources (e.g., using two-phase commit) is often too expensive ▪ Usually OK to have slightly out-of-date data at warehouse ▪ Data/updates are periodically downloaded form online transaction processing (OLTP) systems. ▪ What schema to use • Schema integration
More Warehouse Design Issues Data transformation and data cleansing E.g.,correctmistakes in addresses(misspellings,zip code errors) Merge address lists from different sources and purge duplicates How to propagate updates Warehouse schema may be a(materialized)view of schema from data sources View maintenance What data to summarize Raw data may be too large to store on-line Aggregate values(totals/subtotals)often suffice Queries on raw data can often be transformed by query optimizer to use aggregate values Database System Concepts-7th Edition 11.11 ©Silberscha乜,Korth and Sudarshan
Database System Concepts - 7 11.11 ©Silberschatz, Korth and Sudarshan th Edition More Warehouse Design Issues ▪ Data transformation and data cleansing • E.g., correct mistakes in addresses (misspellings, zip code errors) • Merge address lists from different sources and purge duplicates ▪ How to propagate updates • Warehouse schema may be a (materialized) view of schema from data sources ▪ View maintenance ▪ What data to summarize • Raw data may be too large to store on-line • Aggregate values (totals/subtotals) often suffice • Queries on raw data can often be transformed by query optimizer to use aggregate values