Copyright © 2018 Pearson Education, Inc.

Business Intelligence, Analytics, and Data Science: A Managerial Perspective (4th Edition), Instructor's Manual, Chapter 7: Big Data Concepts and Tools


2. What are the use cases for Big Data and Hadoop?
In terms of its use cases, Hadoop is differentiated in two ways: first, as the repository and refinery of raw data, and second, as an active archive of historical data. Hadoop, with its distributed file system and flexibility of data formats (allowing both structured and unstructured data), is advantageous when working with information commonly found on the Web, including social media, multimedia, and text. Also, because it can handle such huge volumes of data (and because storage costs are minimized due to the distributed nature of the file system), historical (archive) data can be managed easily with this approach.

3. What are the use cases for data warehousing and RDBMS?
Three main use cases for data warehousing are performance, integration, and the availability of a wide variety of BI tools. The relational data warehouse approach is quite mature, and database vendors are constantly adding new index types, partitioning, statistics, and optimizer features. These features allow complex queries to run quickly, which is a must for any BI application. Data warehousing and the ETL (extract, transform, load) process provide a robust mechanism for collecting, cleaning, and integrating data. In addition, it is increasingly easy for end users to create reports, graphs, and visualizations of the data.
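To make the ETL mechanism in the answer to question 3 concrete, here is a minimal sketch in Python. It assumes two small hypothetical source extracts (`CRM_ROWS`, `SALES_ROWS`) and uses the standard-library `sqlite3` module as a stand-in for a relational data warehouse. The script extracts the rows, cleans them (trims whitespace, normalizes case, drops rows with a missing key or a non-numeric amount), integrates the two sources into one schema, and adds an index on the join column. All table, column, and function names are illustrative assumptions, not taken from the text.

```python
import sqlite3

# Hypothetical raw extracts from two source systems (illustrative data only).
CRM_ROWS = [
    {"customer_id": " C001 ", "region": "east"},
    {"customer_id": "C002", "region": "WEST "},
    {"customer_id": "", "region": "north"},  # dirty: missing key
]
SALES_ROWS = [
    ("C001", "2017-01-05", "120.50"),
    ("C002", "2017-01-06", "80.00"),
    ("C001", "2017-02-11", "n/a"),  # dirty: non-numeric amount
]


def clean_customers(rows):
    """Transform: trim whitespace, normalize region case, drop rows without a key."""
    cleaned = []
    for row in rows:
        cid = row["customer_id"].strip()
        if cid:
            cleaned.append((cid, row["region"].strip().upper()))
    return cleaned


def clean_sales(rows):
    """Transform: keep only rows whose amount parses as a number."""
    cleaned = []
    for cid, day, amount in rows:
        try:
            cleaned.append((cid.strip(), day, float(amount)))
        except ValueError:
            continue  # reject the dirty row
    return cleaned


def load(conn, customers, sales):
    """Load: integrate both cleaned sources into one relational schema."""
    conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, region TEXT)")
    conn.execute(
        "CREATE TABLE sale (customer_id TEXT REFERENCES customer(customer_id), "
        "sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", sales)
    # An index on the join column can give the query planner a faster access path.
    conn.execute("CREATE INDEX idx_sale_customer ON sale (customer_id)")


def revenue_by_region(conn):
    """A typical BI-style query over the integrated data."""
    return conn.execute(
        "SELECT c.region, SUM(s.amount) FROM sale s "
        "JOIN customer c ON c.customer_id = s.customer_id "
        "GROUP BY c.region ORDER BY c.region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, clean_customers(CRM_ROWS), clean_sales(SALES_ROWS))
    print(revenue_by_region(conn))  # [('EAST', 120.5), ('WEST', 80.0)]
```

Only the rows that survive cleaning reach the query. A production warehouse would use dedicated ETL tooling and far richer cleansing rules; the sketch only shows the extract, transform, load sequence and where an index fits in.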

4. In what scenarios can Hadoop and RDBMS coexist?
There are several possible scenarios under which using a combination of Hadoop and relational DBMS-based data warehousing technologies makes sense. For example, you can use Hadoop for storing and archiving multi-structured data, with a connector to a relational DBMS that extracts required data from Hadoop for analysis by the relational DBMS. Hadoop can also be used to filter and transform multi-structured data for transport to a data warehouse, and to analyze multi-structured data for publishing into the data warehouse environment. Combining SQL and MapReduce query functions enables data scientists to analyze both structured and unstructured data. In addition, front-end query tools are available for both platforms.
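The Hadoop-to-warehouse pattern described in the answer to question 4 can be sketched, under heavy simplification, as a single-process Python program. A map step parses multi-structured raw records (JSON events and free-text log lines), a shuffle step groups values by key, a reduce step aggregates them into structured rows, and the rows are then published to a relational table that SQL can query. This only imitates the MapReduce phases: it does not use Hadoop or its Java API, SQLite stands in for the data warehouse, and the record formats, the `user_purchases` table, and all function names are hypothetical.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical multi-structured raw input, as it might sit in a Hadoop store:
# some records are JSON events, others are free-text log lines.
RAW_LINES = [
    '{"user": "u1", "event": "purchase", "amount": 30}',
    '{"user": "u2", "event": "view"}',
    "2017-03-01 u1 purchase 20",
    "malformed line",
    '{"user": "u2", "event": "purchase", "amount": 15}',
]


def map_phase(line):
    """Map: parse one raw record and emit (user, amount) pairs for purchases only."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        rec = None
    if isinstance(rec, dict):
        if rec.get("event") == "purchase" and "amount" in rec:
            yield rec["user"], float(rec["amount"])
        return
    # Fall back to the assumed text layout: "<date> <user> <event> <amount>".
    parts = line.split()
    if len(parts) == 4 and parts[2] == "purchase":
        try:
            yield parts[1], float(parts[3])
        except ValueError:
            pass  # skip records whose amount is not numeric


def shuffle(pairs):
    """Shuffle: group mapped values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate one user's purchases into a single structured row."""
    return key, len(values), sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return [reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items())]


def publish_to_warehouse(conn, rows):
    """Load the refined rows into a relational table for SQL-based BI tools."""
    conn.execute("CREATE TABLE user_purchases (user_id TEXT, n_orders INTEGER, total REAL)")
    conn.executemany("INSERT INTO user_purchases VALUES (?, ?, ?)", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    publish_to_warehouse(conn, run_job(RAW_LINES))
    query = "SELECT user_id, total FROM user_purchases WHERE total > 20 ORDER BY user_id"
    print(conn.execute(query).fetchall())  # [('u1', 50.0)]
```

Keeping the parsing in the map step lets the raw data stay in its original form until it is needed, while the warehouse receives only clean, aggregated rows that existing SQL-based BI tools can query.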

Section 7.6 Review Questions

1. What is special about the Big Data vendor landscape? Who are the big players?
The Big Data vendor landscape is developing very rapidly. It is in a special period of evolution in which entrepreneurial startup firms bring innovative solutions to the marketplace. Cloudera is a market leader in the Hadoop space. MapR and Hortonworks are two other Hadoop startups. DataStax is an example of a NoSQL vendor. Informatica, Pervasive Software, Syncsort, and MicroStrategy are also players. Most of the growth in the industry is with Hadoop and NoSQL distributors and analytics providers. There is still very little in terms of Big Data
