Copyright © 2018 Pearson Education, Inc.

Business Intelligence, Analytics, and Data Science: A Managerial Perspective (4th edition), teaching resources (exercises). Chapter 5, Predictive Analytics II: Text, Web, and Social Media Analytics


65) Identify, with a brief description, each of the four steps in the sentiment analysis process.
Answer:
1. Sentiment Detection: Here the goal is to differentiate between a fact and an opinion, which may be viewed as classification of text as objective or subjective.
2. N-P Polarity Classification: Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities, or to locate its position on the continuum between these two polarities.
3. Target Identification: The goal of this step is to accurately identify the target of the expressed sentiment.
4. Collection and Aggregation: In this step all text data points in the document are aggregated and converted to a single sentiment measure for the whole document.
Diff: 2 Page Ref: 282-284

66) In what ways does the Web pose great challenges for effective and efficient knowledge discovery through data mining?
Answer:
• The Web is too big for effective data mining. The Web is so large and growing so rapidly that it is difficult to even quantify its size. Because of the sheer size of the Web, it is not feasible to set up a data warehouse to replicate, store, and integrate all of the data on the Web, making data collection and integration a challenge.
• The Web is too complex. The complexity of a Web page is far greater than that of a page in a traditional text document collection. Web pages lack a unified structure. They contain far more authoring style and content variation than any set of books, articles, or other traditional text-based documents.
• The Web is too dynamic. The Web is a highly dynamic information source. Not only does the Web grow rapidly, but its content is constantly being updated. Blogs, news stories, stock market results, weather reports, sports scores, prices, company advertisements, and numerous other types of information are updated regularly on the Web.
• The Web is not specific to a domain. The Web serves a broad diversity of communities and connects billions of workstations. Web users have very different backgrounds, interests, and usage purposes. Most users may not have good knowledge of the structure of the information network and may not be aware of the heavy cost of a particular search that they perform.
• The Web has everything. Only a small portion of the information on the Web is truly relevant or useful to someone (or some task). Finding the portion of the Web that is truly relevant to a person and the task being performed is a prominent issue in Web-related research.
Diff: 2 Page Ref: 287-288
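
The four steps named in the answer to question 65 can be illustrated with a small sketch. This is not the textbook's method: it is a minimal lexicon-based toy in Python, and the word lists (`POLARITY`, `TARGETS`), the function names, and the choice to average sentence scores are all assumptions made for illustration. Real systems typically use trained classifiers for each step.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
POLARITY = {"love": 1.0, "great": 1.0, "excellent": 1.0, "good": 0.5,
            "bad": -0.5, "hate": -1.0, "terrible": -1.0, "awful": -1.0}
TARGETS = {"battery", "screen", "camera"}

def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def is_opinion(tokens):
    """Step 1, sentiment detection: treat a sentence as subjective
    if it contains at least one opinion word from the lexicon."""
    return any(t in POLARITY for t in tokens)

def polarity(tokens):
    """Step 2, N-P polarity classification: place the sentence on a
    continuum from -1 (negative) to +1 (positive)."""
    scores = [POLARITY[t] for t in tokens if t in POLARITY]
    return sum(scores) / len(scores)

def find_target(tokens):
    """Step 3, target identification: return the first known target
    mentioned in the sentence, or None if there is none."""
    return next((t for t in tokens if t in TARGETS), None)

def analyze(document):
    """Step 4, collection and aggregation: combine the sentence-level
    scores into a single sentiment measure for the whole document."""
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    scored = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if is_opinion(tokens):  # factual sentences are skipped
            scored.append((find_target(tokens), polarity(tokens)))
    overall = sum(p for _, p in scored) / len(scored) if scored else 0.0
    return scored, overall

review = ("The phone has a 6 inch screen. I love the camera. "
          "The battery is terrible and bad.")
print(analyze(review))
# → ([('camera', 1.0), ('battery', -0.75)], 0.125)
```

The first sentence is treated as a fact and dropped at step 1, so only the two opinionated sentences contribute to the document-level score.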
