Chapter 18: Data Analysis and Mining

Decision Support Systems
Data Analysis and OLAP
Data Warehousing
Data Mining

Database System Concepts, 5th Edition, Aug 26, 2005. ©Silberschatz, Korth and Sudarshan
Decision Support Systems

Decision-support systems are used to make business decisions, often based on data collected by online transaction-processing systems.
Examples of business decisions:
- What items to stock?
- What insurance premium to charge?
- To whom to send advertisements?
Examples of data used for making decisions:
- Retail sales transaction details
- Customer profiles (income, age, gender, etc.)
Decision-Support Systems: Overview

Data analysis tasks are simplified by specialized tools and SQL extensions. Example tasks:
- For each product category and each region, what were the total sales in the last quarter, and how do they compare with the same quarter last year?
- As above, for each product category and each customer category.
Statistical analysis packages (e.g., S-Plus) can be interfaced with databases. Statistical analysis is a large field and is not covered here.
Data mining seeks to discover knowledge automatically, in the form of statistical rules and patterns, from large databases.
A data warehouse archives information gathered from multiple sources and stores it under a unified schema, at a single site.
- This is important for large businesses that generate data from multiple divisions, possibly at multiple sites.
- Data may also be purchased externally.
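The first example task can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a decision-support tool: the SALES rows, the category and region names, and the helpers quarter and quarterly_comparison are all hypothetical, and quarters are assumed to be calendar quarters.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows: (product_category, region, sale_date, amount).
# The values are invented for illustration only.
SALES = [
    ("clothing", "east", date(2004, 8, 3), 120.0),
    ("clothing", "east", date(2005, 8, 9), 150.0),
    ("clothing", "west", date(2005, 7, 21), 80.0),
    ("toys", "east", date(2004, 9, 30), 40.0),
    ("toys", "east", date(2005, 9, 1), 55.0),
]

def quarter(d: date) -> tuple[int, int]:
    """Map a date to (year, calendar quarter 1-4)."""
    return d.year, (d.month - 1) // 3 + 1

def quarterly_comparison(rows, year, q):
    """Total sales per (category, region) for quarter q of `year` and of `year - 1`."""
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [this year, last year]
    for category, region, when, amount in rows:
        yq = quarter(when)
        if yq == (year, q):
            totals[(category, region)][0] += amount
        elif yq == (year - 1, q):
            totals[(category, region)][1] += amount
    return {key: tuple(vals) for key, vals in totals.items()}

for (category, region), (now, before) in sorted(quarterly_comparison(SALES, 2005, 3).items()):
    print(f"{category:9}{region:6}{now:9.2f}{before:9.2f}")
```

Grouping by (customer category) instead of (region) answers the second task with the same loop.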
Data Analysis and OLAP

Online Analytical Processing (OLAP): interactive analysis of data, allowing data to be summarized and viewed in different ways online, with negligible delay.
Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
- Measure attributes measure some value that can be aggregated upon, e.g., the attribute number of the sales relation.
- Dimension attributes define the dimensions on which measure attributes (or aggregates thereof) are viewed, e.g., the attributes item_name, color, and size of the sales relation.
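The split between dimension and measure attributes can be made concrete with a small sketch. The Sale record type, the aggregate helper, and the tuples below are hypothetical; only the attribute names come from the sales relation described above.

```python
from collections import Counter
from typing import NamedTuple

class Sale(NamedTuple):
    item_name: str  # dimension attribute
    color: str      # dimension attribute
    size: str       # dimension attribute
    number: int     # measure attribute: the value that gets aggregated

# Hypothetical tuples of the sales relation; the counts are invented.
SALES = [
    Sale("skirt", "dark", "small", 2),
    Sale("skirt", "dark", "medium", 5),
    Sale("skirt", "pastel", "small", 11),
    Sale("dress", "dark", "large", 4),
]

def aggregate(rows, dims):
    """Sum the measure `number`, grouped by the named dimension attributes."""
    totals = Counter()
    for row in rows:
        key = tuple(getattr(row, d) for d in dims)
        totals[key] += row.number
    return dict(totals)

print(aggregate(SALES, ["item_name"]))           # {('skirt',): 18, ('dress',): 4}
print(aggregate(SALES, ["item_name", "color"]))
```

Any subset of the dimension attributes can serve as the grouping; the measure is always what gets summed.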
Cross Tabulation of sales by item_name and color

size: all
                     color
item_name    dark  pastel  white  Total
skirt           8      35     10     53
dress          20      10      5     35
shirt          14       7     28     49
pant           20       2      5     27
Total          62      54     48    164

The table above is an example of a cross-tabulation (cross-tab), also referred to as a pivot table.
- Values for one of the dimension attributes form the row headers.
- Values for another dimension attribute form the column headers.
- Other dimension attributes are listed on top.
- Values in individual cells are (aggregates of) the values of the measure attribute for the tuples whose dimension-attribute values specify the cell.
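Building such a cross-tab from (item_name, color, number) tuples takes one pass over the data. The sketch below uses the cell values of the table above as its base data (size already fixed at all); the helper names cross_tab and render are hypothetical.

```python
from collections import defaultdict

# (item_name, color, number) with size = all; values from the cross-tab above.
SALES = [
    ("skirt", "dark", 8), ("skirt", "pastel", 35), ("skirt", "white", 10),
    ("dress", "dark", 20), ("dress", "pastel", 10), ("dress", "white", 5),
    ("shirt", "dark", 14), ("shirt", "pastel", 7), ("shirt", "white", 28),
    ("pant", "dark", 20), ("pant", "pastel", 2), ("pant", "white", 5),
]

def cross_tab(rows):
    """Return (cells, row_totals, col_totals, grand_total) for a 2-D cross-tab."""
    cells = defaultdict(int)
    row_totals = defaultdict(int)
    col_totals = defaultdict(int)
    for item, color, number in rows:
        cells[(item, color)] += number
        row_totals[item] += number
        col_totals[color] += number
    return dict(cells), dict(row_totals), dict(col_totals), sum(row_totals.values())

def render(rows):
    """Print the cross-tab with item_name as rows and color as columns."""
    cells, row_totals, col_totals, grand = cross_tab(rows)
    items = list(dict.fromkeys(r[0] for r in rows))   # keep first-seen order
    colors = list(dict.fromkeys(r[1] for r in rows))
    print(f"{'':10}" + "".join(f"{c:>8}" for c in colors) + f"{'Total':>8}")
    for item in items:
        print(f"{item:10}" + "".join(f"{cells.get((item, c), 0):>8}" for c in colors)
              + f"{row_totals[item]:>8}")
    print(f"{'Total':10}" + "".join(f"{col_totals[c]:>8}" for c in colors) + f"{grand:>8}")

render(SALES)
```

Swapping the roles of item and color in render pivots the table.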
Relational Representation of Cross-tabs

Cross-tabs can be represented as relations.
- We use the value all to represent aggregates.
- The SQL:1999 standard actually uses null values in place of all, despite the possible confusion with regular null values.

item_name  color   number
skirt      dark         8
skirt      pastel      35
skirt      white       10
skirt      all         53
dress      dark        20
dress      pastel      10
dress      white        5
dress      all         35
shirt      dark        14
shirt      pastel       7
shirt      white       28
shirt      all         49
pant       dark        20
pant       pastel       2
pant       white        5
pant       all         27
all        dark        62
all        pastel      54
all        white       48
all        all        164
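The all rows can be generated mechanically: each base tuple contributes to its own cell, to its item total, to its color total, and to the grand total. This is a minimal sketch; the function name crosstab_relation is hypothetical, and the string "all" stands in for the value that SQL:1999 would represent as null.

```python
from collections import defaultdict

ALL = "all"  # marker for aggregated values; SQL:1999 would use null instead

# Base (item_name, color, number) tuples with size = all, from the sales cross-tab.
SALES = [
    ("skirt", "dark", 8), ("skirt", "pastel", 35), ("skirt", "white", 10),
    ("dress", "dark", 20), ("dress", "pastel", 10), ("dress", "white", 5),
    ("shirt", "dark", 14), ("shirt", "pastel", 7), ("shirt", "white", 28),
    ("pant", "dark", 20), ("pant", "pastel", 2), ("pant", "white", 5),
]

def crosstab_relation(rows):
    """Return (item_name, color, number) tuples, including the 'all' aggregate rows."""
    totals = defaultdict(int)
    for item, color, number in rows:
        # cell, row total, column total, grand total
        for key in ((item, color), (item, ALL), (ALL, color), (ALL, ALL)):
            totals[key] += number
    return [(item, color, n) for (item, color), n in totals.items()]

for item, color, n in crosstab_relation(SALES):
    print(f"{item:8}{color:8}{n:>5}")
```

The 12 base tuples yield 20 result tuples: 12 cells, 4 item totals, 3 color totals, and 1 grand total.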
Data Cube

A data cube is a multidimensional generalization of a cross-tab.
- It can have n dimensions; we show 3 below.
- Cross-tabs can be used as views on a data cube.

[Figure: a three-dimensional data cube on item_name (skirt, dress, shirt, pant, all), color (dark, pastel, white, all) and size (small, medium, large, all); its size = all face is the cross-tab of sales by item_name and color.]
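A data cube on n dimensions contains one group-by for every subset of the dimensions, 2^n groupings in all, with the dimensions left out of a grouping set to all. The sketch below computes that directly; data_cube and the three sample tuples are hypothetical. SQL:1999's group by cube computes essentially the same set of groupings, using null in place of all.

```python
from collections import defaultdict
from itertools import combinations

ALL = "all"

def data_cube(rows, dims, measure):
    """Compute every group-by over `dims` (2**len(dims) groupings).

    rows: list of dicts. Dimensions not in a grouping are replaced by 'all'.
    """
    cube = defaultdict(int)
    for k in range(len(dims) + 1):
        for grouped in combinations(dims, k):
            for row in rows:
                key = tuple(row[d] if d in grouped else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)

# Hypothetical sales tuples with three dimensions; the counts are invented.
SALES = [
    {"item_name": "skirt", "color": "dark", "size": "small", "number": 2},
    {"item_name": "skirt", "color": "dark", "size": "large", "number": 6},
    {"item_name": "dress", "color": "white", "size": "medium", "number": 5},
]
cube = data_cube(SALES, ["item_name", "color", "size"], "number")
print(cube[("skirt", "dark", ALL)])  # 8
print(cube[(ALL, ALL, ALL)])         # 13
```

This naive version rescans the rows once per grouping; real systems usually derive coarser groupings from finer ones instead.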
Online Analytical Processing

- Pivoting: changing the dimensions used in a cross-tab.
- Slicing: creating a cross-tab for fixed values of some dimension attributes only. This is sometimes called dicing, particularly when values for multiple dimensions are fixed.
- Rollup: moving from finer-granularity data to coarser-granularity data.
- Drill down: the opposite operation, moving from coarser-granularity data to finer-granularity data.
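The four operations map onto simple manipulations of grouped data. In this sketch the helpers group_sum, slice_ and pivot and the sample rows are hypothetical; slice_ carries a trailing underscore only to avoid shadowing Python's built-in slice.

```python
from collections import defaultdict

def group_sum(rows, dims, measure="number"):
    """Aggregate `measure` over the given dimension attributes."""
    out = defaultdict(int)
    for r in rows:
        out[tuple(r[d] for d in dims)] += r[measure]
    return dict(out)

def slice_(rows, **fixed):
    """Slicing (dicing when several dimensions are fixed): keep matching rows."""
    return [r for r in rows if all(r[d] == v for d, v in fixed.items())]

def pivot(table):
    """Swap the row and column dimensions of a (row, col) -> value table."""
    return {(c, r): v for (r, c), v in table.items()}

# Hypothetical sales tuples; the counts are invented.
SALES = [
    {"item_name": "skirt", "color": "dark", "size": "small", "number": 2},
    {"item_name": "skirt", "color": "dark", "size": "large", "number": 6},
    {"item_name": "skirt", "color": "pastel", "size": "small", "number": 3},
    {"item_name": "dress", "color": "white", "size": "medium", "number": 5},
]

fine = group_sum(SALES, ["item_name", "color"])                    # item x color
coarse = group_sum(SALES, ["item_name"])                           # rollup: drop color
finer = group_sum(SALES, ["item_name", "color", "size"])           # drill down: add size
small = group_sum(slice_(SALES, size="small"), ["item_name", "color"])  # slice on size
print(coarse, small, pivot(fine), sep="\n")
```

Drill down only works if the finer-granularity data (or a finer aggregate) is still available; it cannot be recovered from coarse totals alone.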
Hierarchies on Dimensions

A hierarchy on dimension attributes lets dimensions be viewed at different levels of detail.
- E.g., the dimension DateTime can be used to aggregate by hour of day, date, day of week, month, quarter, or year.

[Figure: (a) time hierarchy: DateTime rolls up to Hour of day and to Date; Date rolls up to Day of week and to Month; Month → Quarter → Year. (b) location hierarchy: City → State → Country → Region.]
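Rolling a DateTime dimension up to a chosen level amounts to mapping each timestamp to its value at that level and regrouping. The sketch below does this with Python's datetime module; TIME_LEVELS, roll_up and the sample rows are hypothetical. Note that the hierarchy is not a single chain: day of week, for example, cannot be rolled up further to month.

```python
from collections import defaultdict
from datetime import datetime

# Levels of the time hierarchy, each computed from a DateTime value.
TIME_LEVELS = {
    "hour_of_day": lambda t: t.hour,
    "date": lambda t: t.date(),
    "day_of_week": lambda t: t.weekday(),   # 0 = Monday ... 6 = Sunday
    "month": lambda t: (t.year, t.month),
    "quarter": lambda t: (t.year, (t.month - 1) // 3 + 1),
    "year": lambda t: t.year,
}

def roll_up(rows, level):
    """Sum (timestamp, amount) rows at the chosen level of the time hierarchy."""
    key_of = TIME_LEVELS[level]
    totals = defaultdict(int)
    for when, amount in rows:
        totals[key_of(when)] += amount
    return dict(totals)

# Hypothetical timestamped sales amounts.
SALES = [
    (datetime(2005, 8, 26, 9, 15), 3),
    (datetime(2005, 8, 26, 17, 40), 4),
    (datetime(2005, 9, 2, 9, 5), 2),
    (datetime(2005, 11, 1, 12, 0), 6),
]
print(roll_up(SALES, "quarter"))       # {(2005, 3): 9, (2005, 4): 6}
print(roll_up(SALES, "hour_of_day"))   # {9: 5, 17: 4, 12: 6}
```

A location hierarchy (City → State → Country → Region) would be handled the same way, with lookup tables in place of the datetime functions.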
Cross Tabulation With Hierarchy

Cross-tabs can be easily extended to deal with hierarchies.
- We can drill down or roll up on a hierarchy.

category    item_name  dark  pastel  white  total
womenswear  skirt         8      35     10     53
            dress        20      10      5     35
            subtotal     28      45     15     88
menswear    shirt        14       7     28     49
            pant         20       2      5     27
            subtotal     34       9     33     76
total                    62      54     48    164
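The subtotal rows come from rolling item_name up one level of its category hierarchy while keeping the item rows for drill down. This sketch uses the womenswear/menswear grouping from the table and the base values of the sales cross-tab; hierarchical_totals and the other names are hypothetical.

```python
from collections import defaultdict

# Category hierarchy on item_name: womenswear and menswear.
CATEGORY = {"skirt": "womenswear", "dress": "womenswear",
            "shirt": "menswear", "pant": "menswear"}
COLUMNS = ["dark", "pastel", "white", "total"]

# (item_name, color, number) with size = all, from the sales cross-tab.
SALES = [
    ("skirt", "dark", 8), ("skirt", "pastel", 35), ("skirt", "white", 10),
    ("dress", "dark", 20), ("dress", "pastel", 10), ("dress", "white", 5),
    ("shirt", "dark", 14), ("shirt", "pastel", 7), ("shirt", "white", 28),
    ("pant", "dark", 20), ("pant", "pastel", 2), ("pant", "white", 5),
]

def hierarchical_totals(rows):
    """Return per-item totals, per-category subtotals and grand totals by color."""
    items = defaultdict(lambda: defaultdict(int))
    categories = defaultdict(lambda: defaultdict(int))
    grand = defaultdict(int)
    for item, color, number in rows:
        # one base value feeds its item row, its category subtotal and the grand total
        for counts in (items[item], categories[CATEGORY[item]], grand):
            counts[color] += number
            counts["total"] += number
    return items, categories, grand

items, categories, grand = hierarchical_totals(SALES)
for cat in dict.fromkeys(CATEGORY.values()):          # womenswear, then menswear
    for item in (i for i, c in CATEGORY.items() if c == cat):
        print(f"{cat:11}{item:9}" + "".join(f"{items[item][k]:>7}" for k in COLUMNS))
    print(f"{cat:11}{'subtotal':9}" + "".join(f"{categories[cat][k]:>7}" for k in COLUMNS))
print(f"{'total':20}" + "".join(f"{grand[k]:>7}" for k in COLUMNS))
```

Printing only the subtotal rows gives the rolled-up view; printing the item rows as well is the drilled-down view.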