CHAPTER 3 Descriptive analytics Ii Business Intelligence and Data Warehousing Learning Objectives for Chapter 3 Understand the basic definitions and concepts of data warehousing Understand data warehousing architectures Describe the processes used in developing and managing data warehouses Explain data warehousing operations Explain the role of data warehouses in decision support Explain data integration and the extraction, transformation, and load (etL) processes Understand the essence of business performance management (BPm) Learn balanced scorecard and Six Sigma as performance Copyright C2018 Pearson Education, Inc
1 Copyright © 2018Pearson Education, Inc. Descriptive Analytics II: Business Intelligence and Data Warehousing Learning Objectives for Chapter 3 ▪ Understand the basic definitions and concepts of data warehousing ▪ Understand data warehousing architectures ▪ Describe the processes used in developing and managing data warehouses ▪ Explain data warehousing operations ▪ Explain the role of data warehouses in decision support ▪ Explain data integration and the extraction, transformation, and load (ETL) processes ▪ Understand the essence of business performance management (BPM) ▪ Learn balanced scorecard and Six Sigma as performance measurement systems CHAPTER 3
CHAPTER OVERVIEW The concept of data warehousing has been around since the late 1980s. This chapter provides the foundation for an important type of database, called a data warehouse, which is primarily used for decision support and provides improved analytical capabilities. We discuss data warehousing in the following sections CHAPTER OUTLINE 3. 1 Opening Vignette: Targeting Tax Fraud with Business Intelligence and Data Warehousing 3.2 Business Intelligence and Data Warehousing 3.3 Data Warehousing Process 3.4 Data Warehousing Architectures 3.5 Data Integration and the Extraction, Transformation, and Load (etl) Processes 3.6 Data Warehouse Development 3.7 Data Warehousin ng Implementation Issues 3.8 Data Warehouse Administration, Security Issues, and Future Trends 3.9 Business Performance Management 3.10 Performance Measurement 3.11 Balanced Scorecards 3.12 Six Sigma as a Performance Measurement System ANSWERS TO END OF SECTION REVIEW QUEST|ONs···· Section 3. 1 Review Questions 1. Why is it important for IRS and for U. S state governments to use data warehousing and business intelligence(Bi) tools in managing state revenues? Copyright C2018 Pearson Education, Inc
2 Copyright © 2018Pearson Education, Inc. CHAPTER OVERVIEW The concept of data warehousing has been around since the late 1980s. This chapter provides the foundation for an important type of database, called a data warehouse, which is primarily used for decision support and provides improved analytical capabilities. We discuss data warehousing in the following sections: CHAPTER OUTLINE 3.1 Opening Vignette: Targeting Tax Fraud with Business Intelligence and Data Warehousing 3.2 Business Intelligence and Data Warehousing 3.3 Data Warehousing Process 3.4 Data Warehousing Architectures 3.5 Data Integration and the Extraction, Transformation, and Load (ETL) Processes 3.6 Data Warehouse Development 3.7 Data Warehousing Implementation Issues 3.8 Data Warehouse Administration, Security Issues, and Future Trends 3.9 Business Performance Management 3.10 Performance Measurement 3.11 Balanced Scorecards 3.12 Six Sigma as a Performance Measurement System ANSWERS TO END OF SECTION REVIEW QUESTIONS Section 3.1 Review Questions 1. Why is it important for IRS and for U.S. state governments to use data warehousing and business intelligence (BI) tools in managing state revenues?
Revenues are complex and have many sources. This variety and detail make understand ing the data difficult, hampering efficiency. The use of bi tools allows for better analysis, understand ing, and governance What were the challenges the state of Maryland was facing with regard to tax fraud? The state was facing tax fraud from fraudulent returns as other states were. and the process of detecting and investigating potential fraud was time consuming 3. What was the solution they adopted? Do you agree with their approach? Why? The state implemented a data warehouse from Teradata that allowed them to examine data and identify/flag traits that were consistent with fraudulent return 4. What were the results that they obtained? Did the investment in BI and data warehousing pay off? The team was able to flag a smaller number of potentially fraudulent returns, but those that they did identify were significantly more likely to be false. this allowed the state to recover $7 million more, making the investment pay off. 5. What other problems and challenges do you think federal and state governments are having that can benefit from BI and data warehousing? Student responses will vary but could include ideas relating to voter fraud med ical use. and other tax issues Section 3. 2 Review Questions What is a data warehouse? a data warehouse is defined in this section as"a pool of data produced to support decision making " This focuses on the essentials. leaving out characteristics that may vary from one dw to another but are not essential to the basic concept The same paragraph gives another definition: a subject-oriented, integrated time-variant, nonvolatile collection of data in support of managements decision making process. This definition adds more specifics, but in every case appropriately: it is hard, if not impossible, to conceive of a data warehouse that would not be subject-oriented, integrated, etc How does a data warehouse differ from a database? echnically a data warehouse is a database albeit with certain characteristics to facilitate its role in decision support. Specifically, however, it is(see previous Copyright C2018 Pearson Education, Inc
3 Copyright © 2018Pearson Education, Inc. Revenues are complex and have many sources. This variety and detail make understanding the data difficult, hampering efficiency. The use of BI tools allows for better analysis, understanding, and governance. 2. What were the challenges the state of Maryland was facing with regard to tax fraud? The state was facing tax fraud from fraudulent returns as other states were, and the process of detecting and investigating potential fraud was time consuming. 3. What was the solution they adopted? Do you agree with their approach? Why? The state implemented a data warehouse from Teradata that allowed them to examine data and identify/flag traits that were consistent with fraudulent return. 4. What were the results that they obtained? Did the investment in BI and data warehousing pay off? The team was able to flag a smaller number of potentially fraudulent returns, but those that they did identify were significantly more likely to be false. This allowed the state to recover $7 million more, making the investment pay off. 5. What other problems and challenges do you think federal and state governments are having that can benefit from BI and data warehousing? Student responses will vary but could include ideas relating to voter fraud, medical use, and other tax issues. Section 3.2 Review Questions 1. What is a data warehouse? A data warehouse is defined in this section as “a pool of data produced to support decision making.” This focuses on the essentials, leaving out characteristics that may vary from one DW to another but are not essential to the basic concept. The same paragraph gives another definition: “a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decisionmaking process.” This definition adds more specifics, but in every case appropriately: it is hard, if not impossible, to conceive of a data warehouse that would not be subject-oriented, integrated, etc. 2. How does a data warehouse differ from a database? Technically a data warehouse is a database, albeit with certain characteristics to facilitate its role in decision support. Specifically, however, it is (see previous
question) an"integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization. These characteristics, which are discussed further in the section just after the definition, are not necessarily true of databases in general--though each could apply individually to a given one As a practical matter most databases are highly normalized, in part to avoid update anomalies. Data warehouses are highly denormalized for performance reasons. This is acceptable because their content is never updated, just added to Historical data are static What is an ODS? Operational Data Store is the database from which a business operates on an on going basis Differentiate among a dM. an ODS and an EDw An ODs(Operational Data Store)is the database from which a business operates on an ongoing basis Both an EDW and a data mart(dm)are data warehouses. An EDW(Enterprise Data Warehouse) is an all-encompassing dw that covers all subject areas of interest to the entire organization. a data mart is a smaller dw designed around one problem, organizational function, topic, or other suitable focus area 5. Explain the importance of metadata Metadata, " data about data, are the means through which applications and user access the content of a data warehouse, through which its security is managed and through which organizational management manages, in the true sense of the word,its information assets. Most database management systems would be unable to function without at least some metadata. Indeed the use of metadata, which enable data access through names and logical relationships rather than physical locations, is fundamental to the very concept of a DBms Metadata are essential to any database, not just a data warehouse. ( See answer to Review Question 2 of this section above Section 3.3 Review Questions Describe the data warehousing process The data warehousing process consists of the following steps 1. Data are imported from various internal and external sources Data are cleansed and organized consistently with the organizations needs Copyright C2018 Pearson Education, Inc
4 Copyright © 2018Pearson Education, Inc. question) an “integrated, time-variant, nonvolatile, subject-oriented repository of detail and summary data used for decision support and business analytics within an organization.” These characteristics, which are discussed further in the section just after the definition, are not necessarily true of databases in general—though each could apply individually to a given one. As a practical matter most databases are highly normalized, in part to avoid update anomalies. Data warehouses are highly denormalized for performance reasons. This is acceptable because their content is never updated, just added to. Historical data are static. 3. What is an ODS? Operational Data Store is the database from which a business operates on an ongoing basis. 4. Differentiate among a DM, an ODS, and an EDW. An ODS (Operational Data Store) is the database from which a business operates on an ongoing basis. Both an EDW and a data mart (DM) are data warehouses. An EDW (Enterprise Data Warehouse) is an all-encompassing DW that covers all subject areas of interest to the entire organization. A data mart is a smaller DW designed around one problem, organizational function, topic, or other suitable focus area. 5. Explain the importance of metadata. Metadata, “data about data,” are the means through which applications and users access the content of a data warehouse, through which its security is managed, and through which organizational management manages, in the true sense of the word, its information assets. Most database management systems would be unable to function without at least some metadata. Indeed, the use of metadata, which enable data access through names and logical relationships rather than physical locations, is fundamental to the very concept of a DBMS. Metadata are essential to any database, not just a data warehouse. (See answer to Review Question 2 of this section above.) Section 3.3 Review Questions 1. Describe the data warehousing process. The data warehousing process consists of the following steps: 1. Data are imported from various internal and external sources 2. Data are cleansed and organized consistently with the organization’s needs
Data are loaded into the enterprise data warehouse, or b. Data are loaded into data marts If desired. data marts are created as subsets of the edw. or b The data marts are consolidated into the edw Analyses are performed as needed 2. Describe the major components of a data warehouse Data sources. Data are sourced from operational systems and possibly from external data sources Data extraction and transformation. Data are extracted and properl transformed using custom-written or commercial software called ETL Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse Comprehensive database. This is the edw that supports decision analysis by provid ing relevant summarized and detailed information Metadata. Metad ata are maintained for access by It personnel and users Metadata include rules for organizing data summaries that are easy to index and search Middleware tools. Midd leware tools enable access to the data warehouse from a variety of front-end applications 3. Identify and discuss the role of middleware tools Mid d leware tools enable access to the data warehouse. power users such analysts may write their own SQL queries. Others may access data through managed query environment. There are many front-end applications that business users can use to interact with data stored in the data repositories, including data mining, OLAP, reporting tools, and data visualization tools. all these have their own data access requirements. Those may not match with how a given data warehouse must be accessed. Midd leware translates between the two Section 3. 4 Review Questions What are the key similarities and differences between a two-tiered architecture and a three-tiered architecture? Both provide the same user visibil ity through a client system that accesses a DSS/BI application remotely. The difference is behind the scenes and is invisible to the user: in a two-tiered architecture, the application and data warehouse reside on the same machine; in a three-tiered architecture, they are on separate machines How has the Web influenced data warehouse design? Copyright C2018 Pearson Education, Inc
5 Copyright © 2018Pearson Education, Inc. 3. a. Data are loaded into the enterprise data warehouse, or b. Data are loaded into data marts. 4. a. If desired, data marts are created as subsets of the EDW, or b. The data marts are consolidated into the EDW 5. Analyses are performed as needed 2. Describe the major components of a data warehouse. • Data sources. Data are sourced from operational systems and possibly from external data sources. • Data extraction and transformation. Data are extracted and properly transformed using custom-written or commercial software called ETL. • Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse. • Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information. • Metadata. Metadata are maintained for access by IT personnel and users. Metadata include rules for organizing data summaries that are easy to index and search. • Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications. 3. Identify and discuss the role of middleware tools. Middleware tools enable access to the data warehouse. Power users such as analysts may write their own SQL queries. Others may access data through a managed query environment. There are many front-end applications that business users can use to interact with data stored in the data repositories, including data mining, OLAP, reporting tools, and data visualization tools. All these have their own data access requirements. Those may not match with how a given data warehouse must be accessed. Middleware translates between the two. Section 3.4 Review Questions 1. What are the key similarities and differences between a two-tiered architecture and a three-tiered architecture? Both provide the same user visibility through a client system that accesses a DSS/BI application remotely. The difference is behind the scenes and is invisible to the user: in a two-tiered architecture, the application and data warehouse reside on the same machine; in a three-tiered architecture, they are on separate machines. 2. How has the Web influenced data warehouse design?
Primarily by making Web-based dataw ng possible 3. List the alternative data warehousing architectures discussed in this section Independent data marts architecture Data mart bus architecture with linked d imensional data marts Hub-and-spoke architecture(corporate information factor Centralized data warehouse architecture Federated architecture 4. What issues should be considered when deciding which architecture to use in data warehouse? List the 10 most Information interdependence between organizational units Upper managements information needs 3. Urgency of need for a data warehouse Nature of end-user tasks Constraints on resources 6. Strategic view of the data warehouse prior to implementation 7. Compatibility with existing systems 8. Perceived ability of the in-house IT staff 9. Technical issues 10. Social/political factors (This list from the text, while clearly intended by the authors as the answer to this review question, does not explicitly say that these are the ten most important factors. Students may choose others. 5 Which data warehousing architecture is the best? Why? See table 3. 1 Average assessment Scores for the Success of the Architectures What is interesting is the similarity of the averages for the bus, hub-and-spoke, and centralized architectures. The differences are sufficiently small that no claims can be made for a particular architectures superiority over the others, at least based on a simple comparison of these success measures Section 3.5 Review Questions Data integration is an umbrella term that covers three processes that combine to move data from multiple data warehouse: accessing the data, combining different views of the data, and capturing changes to the data 2. Describe the three steps of the etl process 6 Copyright C2018 Pearson Education, Inc
6 Copyright © 2018Pearson Education, Inc. Primarily by making Web-based data warehousing possible. 3. List the alternative data warehousing architectures discussed in this section. • Independent data marts architecture • Data mart bus architecture with linked dimensional data marts • Hub-and-spoke architecture (corporate information factory) • Centralized data warehouse architecture • Federated architecture 4. What issues should be considered when deciding which architecture to use in developing a data warehouse? List the 10 most important factors. 1. Information interdependence between organizational units 2. Upper management’s information needs 3. Urgency of need for a data warehouse 4. Nature of end-user tasks 5. Constraints on resources 6. Strategic view of the data warehouse prior to implementation 7. Compatibility with existing systems 8. Perceived ability of the in-house IT staff 9. Technical issues 10. Social/political factors (This list from the text, while clearly intended by the authors as the answer to this review question, does not explicitly say that these are the ten most important factors. Students may choose others.) 5. Which data warehousing architecture is the best? Why? See Table 3.1 Average Assessment Scores for the Success of the Architectures. What is interesting is the similarity of the averages for the bus, hub-and-spoke, and centralized architectures. The differences are sufficiently small that no claims can be made for a particular architecture’s superiority over the others, at least based on a simple comparison of these success measures. Section 3.5 Review Questions 1. Describe data integration. Data integration is an umbrella term that covers three processes that combine to move data from multiple sources into a data warehouse: accessing the data, combining different views of the data, and capturing changes to the data. 2. Describe the three steps of the ETL process
Extraction: selecting data from one or more sources and read ing the selected data Transformation: converting data from their original form to whatever form the DW needs. This step often also includes cleansing of the data to remove as many errors as possible Load: putting the converted(transformed) data into the Dw 3. Why is the etl process so important for data warehousing efforts? Since etl is the process through which data are loaded into a data warehouse, a DW could not exist without it. The etl process also contributes to the quality of the data in a dw Section 3.6 Review Questions 1. List the benefits of data warehouses Direct benefits include Allowing end users to perform extensive analysis in numerous ways a consolidated view of corporate data(i.e, a single version of the truth) Better and more timely information. a data warehouse permits information processing to be offloaded from costly operational systems onto low-cost servers; therefore, end-user information requests can be cessed more qui Enhanced system performance. a data warehouse frees production processing because some operational system reporting requirements are moved to dss Simplification of data access Indirect benefits arise when end users take ad vantage of these direct benefits List several criteria for selecting a data warehouse vendor, and describe why they Six important criteria are: financial strength, ERP linkages, qualified consultants, market share, industry experience, and established partnerships. These are important to indicate that a vendor is likely to be in business for the long term, to have the support capabilities its customers need, and to provide products that interoperate with other products the potential user has or may obtain One could add others, such as product functionality(Does it do what we need?) vendor strategic vision(Does their direction make sense for our future plans Copyright C2018 Pearson Education, Inc
7 Copyright © 2018Pearson Education, Inc. Extraction: selecting data from one or more sources and reading the selected data. Transformation: converting data from their original form to whatever form the DW needs. This step often also includes cleansing of the data to remove as many errors as possible. Load: putting the converted (transformed) data into the DW. 3. Why is the ETL process so important for data warehousing efforts? Since ETL is the process through which data are loaded into a data warehouse, a DW could not exist without it. The ETL process also contributes to the quality of the data in a DW. Section 3.6 Review Questions 1. List the benefits of data warehouses. Direct benefits include: • Allowing end users to perform extensive analysis in numerous ways. • A consolidated view of corporate data (i.e., a single version of the truth). • Better and more timely information. A data warehouse permits information processing to be offloaded from costly operational systems onto low-cost servers; therefore, end-user information requests can be processed more quickly. • Enhanced system performance. A data warehouse frees production processing because some operational system reporting requirements are moved to DSS. • Simplification of data access. Indirect benefits arise when end users take advantage of these direct benefits. 2. List several criteria for selecting a data warehouse vendor, and describe why they are important. Six important criteria are: financial strength, ERP linkages, qualified consultants, market share, industry experience, and established partnerships. These are important to indicate that a vendor is likely to be in business for the long term, to have the support capabilities its customers need, and to provide products that interoperate with other products the potential user has or may obtain. One could add others, such as product functionality (Does it do what we need?), vendor strategic vision (Does their direction make sense for our future plans
and/or is it consistent with industry trends? )and quality of customer references (What do their existing customers think of them? 3. What is olap and how does it differ from oltp? Data stored in a data warehouse can be analyzed using techniques referred to as OLAP OLAP is one of the most commonly used data analysis techniques in data warehouses. OLAP is an approach to quickly answer ad hoc questions that require data analysis OLTP is concerned with the capture and storage of data. OLAP is concerned with the analysis of that data 4. What is a cube? What do drill down, roll up and slice and dice mean? The main operational structure in OlaP is based on a concept called cube cube in olap is a multid imensional data structure(actual or virtual) that allows fast analysis of data Using OlaP, an analyst can navigate through the database and screen for a orientations and defining analytical calculations. These types of user-immtatley g particular subset of the data(and its progression over time) by changing the data's navigation of data through the specification of slices(via rotations )and drill down/up(via aggregation and disaggregation) are sometimes called"slice and dice. Commonly used OLAP operations include slice and dice, drill down, roll up, and pivot Slice: A slice is a subset of a multid imensional array(usually a two- dimensional representation)correspond ing to a single value set for one(or more)of the d imensions not in the subset Dice: The dice operation is a slice on more than two dimensions of a dat Drill Down/Up: Drilling down or up is a specific OLAP technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed(down) Section 3. 7 Review Questions 1. What are the major Dw implementation tasks that can be performed in parallel? Reeves(2009)and Solomon(2005) provided some guidelines regard ing the critical questions that must be asked, some risks that should be weighted, and some processes that can be followed to help ensure a successful data warehouse Copyright C2018 Pearson Education, Inc
8 Copyright © 2018Pearson Education, Inc. and/or is it consistent with industry trends?) and quality of customer references (What do their existing customers think of them?). 3. What is OLAP and how does it differ from OLTP? Data stored in a data warehouse can be analyzed using techniques referred to as OLAP. OLAP is one of the most commonly used data analysis techniques in data warehouses. OLAP is an approach to quickly answer ad hoc questions that require data analysis. OLTP is concerned with the capture and storage of data. OLAP is concerned with the analysis of that data. 4. What is a cube? What do drill down, roll up, and slice and dice mean? The main operational structure in OLAP is based on a concept called cube. A cube in OLAP is a multidimensional data structure (actual or virtual) that allows fast analysis of data. Using OLAP, an analyst can navigate through the database and screen for a particular subset of the data (and its progression over time) by changing the data’s orientations and defining analytical calculations. These types of user-initiated navigation of data through the specification of slices (via rotations) and drill down/up (via aggregation and disaggregation) are sometimes called “slice and dice.” Commonly used OLAP operations include slice and dice, drill down, roll up, and pivot. • Slice: A slice is a subset of a multidimensional array (usually a twodimensional representation) corresponding to a single value set for one (or more) of the dimensions not in the subset. • Dice: The dice operation is a slice on more than two dimensions of a data cube. • Drill Down/Up: Drilling down or up is a specific OLAP technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down). Section 3.7 Review Questions 1. What are the major DW implementation tasks that can be performed in parallel? Reeves (2009) and Solomon (2005) provided some guidelines regarding the critical questions that must be asked, some risks that should be weighted, and some processes that can be followed to help ensure a successful data warehouse
nplementation. They compiled a list of 1 l major tasks that could be performed in paralle Establishment of service-level agreements and data-refresh requirements Identification of data sources and their governance policies Data quality planning Data model design ETL tool selection Relational database software and platform selection Data transport Data conversion Reconciliation process Purge and archive planning End-user support 2. List and discuss the most pronounced dw implementation guidelines Senior management must support development of the data warehouse. The Dw needs a project champion at a high position in the organization chart Benefits of a Dw project may be difficult to measure, so management support makes it more likely the project will receive funding Web-based data warehouses may need special security requirements These ensure that only authorized users have access to the data Users should participate in the development process. Their participation essential for data modeling and access modeling. User participation ensures that the dw includes the needed data and that decision makers can retrieve the data they need DW implementation requires certain skills from members of the development team: in-depth knowledge of database technology and the development tools used 3. When developing a successful data warehouse, what are the most important risks and issues to consider and potentially avoid? Starting with the wrong sponsorship chain Setting expectations that you cannot meet Copyright C2018 Pearson Education, Inc
9 Copyright © 2018Pearson Education, Inc. implementation. They compiled a list of 11 major tasks that could be performed in parallel: Establishment of service-level agreements and data-refresh requirements Identification of data sources and their governance policies Data quality planning Data model design ETL tool selection Relational database software and platform selection Data transport Data conversion Reconciliation process Purge and archive planning End-user support 2. List and discuss the most pronounced DW implementation guidelines. • Senior management must support development of the data warehouse. The DW needs a project champion at a high position in the organization chart. Benefits of a DW project may be difficult to measure, so management support makes it more likely the project will receive funding. • Web-based data warehouses may need special security requirements. These ensure that only authorized users have access to the data. • Users should participate in the development process. Their participation is essential for data modeling and access modeling. User participation ensures that the DW includes the needed data and that decision makers can retrieve the data they need. • DW implementation requires certain skills from members of the development team: in-depth knowledge of database technology and the development tools used. 3. When developing a successful data warehouse, what are the most important risks and issues to consider and potentially avoid? • Starting with the wrong sponsorship chain • Setting expectations that you cannot meet
Engaging in politically naive behavior Loading the data warehouse with information just because it is available Believing that data warehousing database design is the same as transactional database design Choosing a data warehouse manager who is technology oriented rather than user oriented Focusing on trad itional internal record-oriented data and ignoring the value of external data and of text, images, and, perhaps, sound and video Delivering data with overlapping and confusing definitions Believing promises of performance, capacity, and scalability Believing that your problems are over when the data warehouse is up and running Focusing on ad hoc data mining and period ic reporting instead of alerts 4. What is scalabil ity? How does it apply to Dw? Scalability refers to the degree to which a system can adjust to changes in demand without major additional changes or investments. Dw scalability issues are the amount of data in the warehouse, how quickly the warehouse is expected to grow, the number of concurrent users, and the complexity of user queries. a data warehouse must scale both horizontally and vertically. The warehouse will grow as a function of data growth and the need to expand the warehouse to support new business functionality. Data growth may be a result of the addition of current cycle data(e. g, this month's results) and/or historical data Section 3. 8 Review Questions What steps can an organization take to ensure the security and confidentiality of customer data in its data warehouse? Effective security in a data warehouse should focus on four main areas Step 1. Establishing effective corporate and security policies and procedures. An effective security policy should start at the top and be communicated to everyone in the organization Step 2. Implementing logical security procedures and techniques to restrict access This includes user authentication, access controls, and encryption Step 3. Limiting physical access to the data center environment Copyright C2018 Pearson Education, Inc
10 Copyright © 2018Pearson Education, Inc. • Engaging in politically naive behavior • Loading the data warehouse with information just because it is available • Believing that data warehousing database design is the same as transactional database design • Choosing a data warehouse manager who is technology oriented rather than user oriented • Focusing on traditional internal record-oriented data and ignoring the value of external data and of text, images, and, perhaps, sound and video • Delivering data with overlapping and confusing definitions • Believing promises of performance, capacity, and scalability • Believing that your problems are over when the data warehouse is up and running • Focusing on ad hoc data mining and periodic reporting instead of alerts 4. What is scalability? How does it apply to DW? Scalability refers to the degree to which a system can adjust to changes in demand without major additional changes or investments. DW scalability issues are the amount of data in the warehouse, how quickly the warehouse is expected to grow, the number of concurrent users, and the complexity of user queries. A data warehouse must scale both horizontally and vertically. The warehouse will grow as a function of data growth and the need to expand the warehouse to support new business functionality. Data growth may be a result of the addition of current cycle data (e.g., this month’s results) and/or historical data. Section 3.8 Review Questions 1. What steps can an organization take to ensure the security and confidentiality of customer data in its data warehouse? Effective security in a data warehouse should focus on four main areas: Step 1. Establishing effective corporate and security policies and procedures. An effective security policy should start at the top and be communicated to everyone in the organization. Step 2. Implementing logical security procedures and techniques to restrict access. This includes user authentication, access controls, and encryption. Step 3. Limiting physical access to the data center environment