CHAPTER 5
Predictive Analytics II: Text, Web, and Social Media Analytics

Copyright © 2018 Pearson Education, Inc.

Learning Objectives for Chapter 5
▪ Describe text analytics and understand the need for text mining
▪ Differentiate among text analytics, text mining, and data mining
▪ Understand the different application areas for text mining
▪ Know the process of carrying out a text mining project
▪ Appreciate the different methods to introduce structure to text-based data
▪ Describe sentiment analysis
▪ Develop familiarity with popular applications of sentiment analysis
▪ Learn the common methods for sentiment analysis
▪ Become familiar with speech analytics as it relates to sentiment analysis

CHAPTER OVERVIEW

This chapter provides a comprehensive overview of text analytics/mining and Web analytics/mining along with their popular application areas such as search engines, sentiment analysis, and social network/media analytics. As we have been witnessing in recent years, the unstructured data generated over the Internet of Things (Web, sensor networks, RFID-enabled supply chain systems, surveillance networks, etc.) is increasing at an exponential pace, and there is no indication of its slowing down. This changing nature of data is forcing organizations to make text and Web analytics a critical part of their business intelligence/analytics infrastructure.
CHAPTER OUTLINE
5.1 Opening Vignette: Machine versus Men on Jeopardy!: The Story of Watson
5.2 Text Analytics and Text Mining Overview
5.3 Natural Language Processing (NLP)
5.4 Text Mining Applications
5.5 Text Mining Process
5.6 Sentiment Analysis
5.7 Web Mining Overview
5.8 Search Engines
5.9 Web Usage Mining (Web Analytics)
5.10 Social Analytics

ANSWERS TO END OF SECTION REVIEW QUESTIONS

Section 5.1 Review Questions

1. What is Watson? What is special about it?

Watson is a question-answering (QA) computer system developed by an IBM Research team as part of a project called DeepQA, and named after IBM's first president. What makes it special is that it is able to compete at the human-champion level in real time on the TV quiz show Jeopardy!; in fact, in 2011, it was able to defeat Ken Jennings, who held the record for the longest winning streak in the game. Like Deep Blue did with chess, Watson is showing that computer systems are getting quite good at demonstrating human-like intelligence.

2. What technologies were used in building Watson (both hardware and software)?

Watson is built on the DeepQA framework. The hardware for this system involves a massively parallel processing architecture. In terms of software, Watson uses a variety of AI-related QA technologies, including text mining, natural language processing, question classification and decomposition, automatic source acquisition and evaluation, entity and relation detection, logical form generation, and knowledge representation and reasoning.
3. What are the innovative characteristics of the DeepQA architecture that made Watson superior?

The DeepQA architecture involves massive parallelism, many experts, pervasive confidence estimation, and integration of the latest and greatest in text analytics, involving both shallow and deep semantic knowledge. As implemented in Watson, DeepQA brings together more than 100 different techniques for analyzing natural language, identifying sources, finding and generating hypotheses, finding and scoring evidence, and merging and ranking hypotheses. More important than any particular technique is the combination of overlapping approaches that can bring their strengths to bear and contribute to improvements in accuracy, confidence, and speed.

4. Why did IBM spend all that time and money to build Watson? Where is the ROI?

IBM's goal was to advance computer science by exploring new ways for computer technology to affect science, business, and society. The techniques IBM developed with DeepQA and Watson are relevant in a wide variety of domains central to IBM's mission. For example, IBM is currently working on a version of Watson to take on seemingly insurmountable problems in healthcare and medicine. If successful, this could give IBM a distinct competitive advantage in this important technological application area.

Section 5.2 Review Questions

1. What is text analytics? How does it differ from text mining?

Text analytics is a concept that includes information retrieval (e.g., searching and identifying relevant documents for a given set of key terms) as well as information extraction, data mining, and Web mining. By contrast, text mining is primarily focused on discovering new and useful knowledge from textual data sources. The overarching goal for both text analytics and text mining is to turn unstructured textual data into actionable information through the application of natural language processing (NLP) and analytics.
However, text analytics is a broader term because of its inclusion of information retrieval. You can think of text analytics as a combination of information retrieval plus text mining.

2. What is text mining? How does it differ from data mining?

Text mining is the application of data mining to unstructured, or less structured, text files. As the names indicate, text mining analyzes words, whereas data mining analyzes numeric data.
3. Why is the popularity of text mining as a BI tool increasing?

The popularity of text mining as a BI tool is increasing because of the rapid growth in text data and the availability of sophisticated BI tools. The benefits of text mining are obvious in areas where very large amounts of textual data are being generated, such as law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology (molecular interactions), technology (patent files), and marketing (customer comments).

4. What are some popular application areas of text mining?

• Information extraction. Identification of key phrases and relationships within text by looking for predefined sequences in text via pattern matching.
• Topic tracking. Based on a user profile and documents that a user views, text mining can predict other documents of interest to the user.
• Summarization. Summarizing a document to save time on the part of the reader.
• Categorization. Identifying the main themes of a document and then placing the document into a predefined set of categories based on those themes.
• Clustering. Grouping similar documents without having a predefined set of categories.
• Concept linking. Connecting related documents by identifying their shared concepts and, by doing so, helping users find information that they perhaps would not have found using traditional search methods.
• Question answering. Finding the best answer to a given question through knowledge-driven pattern matching.

Section 5.3 Review Questions

1. What is NLP?

Natural language processing (NLP) is an important component of text mining and is a subfield of artificial intelligence and computational linguistics.
It studies the problem of "understanding" natural human language, with the view of converting depictions of human language (such as textual documents) into more formal representations (in the form of numeric and symbolic data) that are easier for computer programs to manipulate.
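This conversion from free text into a numeric representation can be sketched with a small term-weighting routine. The following is a simplified, pure-Python illustration of term frequency weighted by inverse document frequency (TF-IDF); the function names and example sentences are invented for this sketch, and a real pipeline would add stemming, stop-word removal, and a proper tokenizer:

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Crude tokenizer: lowercase words only."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_matrix(docs: list[str]) -> list[dict[str, float]]:
    """Turn each document into a numeric vector: term frequency weighted
    by inverse document frequency (terms shared by all documents vanish)."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    matrix = []
    for tokens in token_lists:
        tf = Counter(tokens)
        matrix.append({term: count * math.log(n_docs / df[term])
                       for term, count in tf.items()})
    return matrix

docs = [
    "text mining extracts knowledge from text",
    "data mining extracts patterns from data",
]
vectors = tfidf_matrix(docs)
# "mining" appears in both documents, so its idf is log(2/2) = 0 and its weight vanishes
print(vectors[0]["mining"])   # 0.0
print(vectors[0]["text"] > 0)  # True: "text" is distinctive to document 0
```

The resulting dictionaries are a sparse version of the term–document matrix discussed later in the chapter; each consecutive step of the text mining process operates on representations like these rather than on raw strings.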
2. How does NLP relate to text mining?

Text mining uses natural language processing to induce structure into the text collection and then uses data mining algorithms such as classification, clustering, association, and sequence discovery to extract knowledge from it.

3. What are some of the benefits and challenges of NLP?

NLP moves beyond syntax-driven text manipulation (which is often called "word counting") to a true understanding and processing of natural language that considers grammatical and semantic constraints as well as the context. The challenges include:

• Part-of-speech tagging. It is difficult to mark up terms in a text as corresponding to a particular part of speech because the part of speech depends not only on the definition of the term but also on the context within which it is used.
• Text segmentation. Some written languages, such as Chinese, Japanese, and Thai, do not have single-word boundaries.
• Word sense disambiguation. Many words have more than one meaning. Selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used.
• Syntactic ambiguity. The grammar for natural languages is ambiguous; that is, multiple possible sentence structures often need to be considered. Choosing the most appropriate structure usually requires a fusion of semantic and contextual information.
• Imperfect or irregular input. Foreign or regional accents and vocal impediments in speech, and typographical or grammatical errors in texts, make the processing of the language an even more difficult task.
• Speech acts. A sentence can often be considered an action by the speaker. The sentence structure alone may not contain enough information to define this action.

4. What are the most common tasks addressed by NLP?

Following are among the most popular tasks:
• Question answering
• Automatic summarization
• Natural language generation
• Natural language understanding
• Machine translation
• Foreign language reading
• Foreign language writing
• Speech recognition
• Text-to-speech
• Text proofing
• Optical character recognition

Section 5.4 Review Questions

1. List and briefly discuss some of the text mining applications in marketing.

Text mining can be used to increase cross-selling and up-selling by analyzing the unstructured data generated by call centers. Text mining has become invaluable for customer relationship management. Companies can use text mining to analyze rich sets of unstructured text data, combined with the relevant structured data extracted from organizational databases, to predict customer perceptions and subsequent purchasing behavior.

2. How can text mining be used in security and counterterrorism?

Students may use the introductory case in this answer. In 2007, EUROPOL developed an integrated system capable of accessing, storing, and analyzing vast amounts of structured and unstructured data sources in order to track transnational organized crime. Another security-related application of text mining is in the area of deception detection.

3. What are some promising text mining applications in biomedicine?

As in any other experimental approach, it is necessary to analyze the vast amount of data in the context of previously known information about the biological entities under study. The literature is a particularly valuable source of information for experiment validation and interpretation. Therefore, the development of automated text mining tools to assist in such interpretation is one of the main challenges in current bioinformatics research.
Section 5.5 Review Questions

1. What are the main steps in the text mining process?

See Figure 5.6 (p. 222). Text mining entails three tasks:
• Establish the Corpus: Collect and organize the domain-specific unstructured data.
• Create the Term–Document Matrix: Introduce structure to the corpus.
• Extract Knowledge: Discover novel patterns from the term–document matrix (TDM).

2. What is the reason for normalizing word frequencies? What are the common methods for normalizing word frequencies?

The raw indices need to be normalized in order to have a more consistent TDM for further analysis. Common methods are log frequencies, binary frequencies, and inverse document frequencies.

3. What is SVD? How is it used in text mining?

Singular value decomposition (SVD), which is closely related to principal components analysis, reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower-dimensional space, where each consecutive dimension represents the largest degree of variability (between words and documents) possible.

4. What are the main knowledge extraction methods from the corpus?

The main categories of knowledge extraction methods are classification, clustering, association, and trend analysis.

Section 5.6 Review Questions

1. What is sentiment analysis? How does it relate to text mining?

Sentiment analysis tries to answer the question, "What do people feel about a certain topic?" by digging into the opinions of many using a variety of automated tools. It is also known as opinion mining, subjectivity analysis, and appraisal extraction. Sentiment analysis shares many characteristics and techniques with text mining. However, unlike text mining, which categorizes text by conceptual taxonomies of
topics, sentiment classification generally deals with two classes (positive versus negative), a range of polarity (e.g., star ratings for movies), or a range in strength of opinion.

2. What are the most popular application areas for sentiment analysis? Why?

Customer relationship management (CRM) and customer experience management are popular "voice of the customer" (VOC) applications. Other application areas include "voice of the market" (VOM) and "voice of the employee" (VOE).

3. What would be the expected benefits and beneficiaries of sentiment analysis in politics?

Opinions matter a great deal in politics. Because political discussions are dominated by quotes, sarcasm, and complex references to persons, organizations, and ideas, politics is one of the most difficult, and potentially fruitful, areas for sentiment analysis. By analyzing the sentiment on election forums, one may predict who is more likely to win or lose. Sentiment analysis can help understand what voters are thinking and can clarify a candidate's position on issues. Sentiment analysis can help political organizations, campaigns, and news analysts to better understand which issues and positions matter the most to voters. The technology was successfully applied by both parties to the 2008 and 2012 American presidential election campaigns.

4. What are the main steps in carrying out sentiment analysis projects?

The first step when performing sentiment analysis of a text document is called sentiment detection, during which text data is differentiated between fact and opinion (objective vs. subjective). This is followed by negative-positive (N-P) polarity classification, where a subjective text item is classified on a bipolar range. Following this comes target identification (identifying the person, product, event, etc., that the sentiment is about).
Finally comes collection and aggregation, in which the overall sentiment for the document is calculated based on the sentiments of the individual phrases and words from the first three steps.

5. What are the two common methods for polarity identification? Explain.

Polarity identification can be done via a lexicon (as a reference library) or by using a collection of training documents and inductive machine learning algorithms. The lexicon approach uses a catalog of words, their synonyms, and their meanings, combined with numerical ratings indicating the position on the N-P polarity associated with these words. In this way, affective, emotional, and attitudinal phrases can be classified according to their degree of positivity or negativity. By contrast, the training-document approach uses statistical analysis and machine learning algorithms, such as neural networks, clustering approaches,
and decision trees to ascertain the sentiment for a new text document based on patterns from previous "training" documents with assigned sentiment scores.

Section 5.7 Review Questions

1. What are some of the main challenges the Web poses for knowledge discovery?

• The Web is too big for effective data mining.
• The Web is too complex.
• The Web is too dynamic.
• The Web is not specific to a domain.
• The Web has everything.

2. What is Web mining? How does it differ from regular data mining or text mining?

Web mining is the discovery and analysis of interesting and useful information from the Web and about the Web, usually through Web-based tools. Text mining is less structured because it's based on words instead of numeric data.

3. What are the three main areas of Web mining?

The three main areas of Web mining are Web content mining, Web structure mining, and Web usage (or activity) mining.

4. What is Web content mining? How can it be used for competitive advantage?

Web content mining refers to the extraction of useful information from Web pages. The documents may be extracted in some machine-readable format so that automated techniques can generate some information about the Web pages. Collecting and mining Web content can be used for competitive intelligence (collecting intelligence about competitors' products, services, and customers), which can give your organization a competitive advantage.

5. What is Web structure mining? How does it differ from Web content mining?

Web structure mining is the process of extracting useful information from the links embedded in Web documents. By contrast, Web content mining involves analysis of the specific textual content of Web pages. So, Web structure mining is more related to navigation through a website, whereas Web content mining is more related to text mining and the document hierarchy of a particular Web page.
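The content-versus-structure distinction above can be seen directly in code: a structure-mining pass ignores the page text and keeps only the hyperlinks. Below is a minimal sketch using Python's standard-library `html.parser`; the sample page markup is made up for illustration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags -- the raw material of Web
    structure mining, as opposed to the visible text of the page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page fragment for illustration
page = '<p>See <a href="/docs">the docs</a> and <a href="https://example.com">a partner site</a>.</p>'

extractor = LinkExtractor()
extractor.feed(page)
print(extractor.links)  # → ['/docs', 'https://example.com']
```

Everything outside the `href` attributes, i.e., the visible text that Web content mining would analyze, is simply ignored here; collecting those links across many pages yields the link graph that structure-mining techniques operate on.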
Section 5.8 Review Questions

1. What is a search engine? Why are they important for today's businesses?

A search engine is a software program that searches for documents (Internet sites or files) based on the keywords (individual words, multi-word terms, or a complete sentence) that users have provided that have to do with the subject of their inquiry. This is the most prominent type of information retrieval system for finding relevant content on the Web.

Search engines have become the centerpiece of most Internet-based transactions and other activities. Because people use them extensively to learn about products and services, it is very important for companies to have prominent visibility on the Web; hence the major effort of companies to enhance their search engine optimization (SEO).

2. What is a Web crawler? What is it used for? How does it work?

A Web crawler (also called a spider or a Web spider) is a piece of software that systematically browses (crawls through) the World Wide Web for the purpose of finding and fetching Web pages. It starts with a list of "seed" URLs, goes to the pages of those URLs, and then follows each page's hyperlinks, adding them to the search engine's database. Thus, the Web crawler navigates through the Web in order to construct the database of websites.

3. What is "search engine optimization"? Who benefits from it?

Search engine optimization (SEO) is the intentional activity of affecting the visibility of an e-commerce site or a website in a search engine's natural (unpaid or organic) search results. It involves editing a page's content, HTML, metadata, and associated coding to both increase its relevance to specific keywords and to remove barriers to the indexing activities of search engines. In addition, SEO efforts include promoting a site to increase its number of inbound links. SEO primarily benefits companies with e-commerce sites by making their pages appear toward the top of search engine lists when users query.

4. What things can help Web pages rank higher in the search engine results?

Cross-linking between pages of the same website to provide more links to the most important pages may improve its visibility. Writing content that includes frequently searched keyword phrases, so as to be relevant to a wide variety of search queries, will tend to increase traffic. Updating content so as to keep search engines crawling back frequently can give additional weight to a site. Adding relevant keywords to a Web page's metadata, including the title tag and meta description, will tend to improve the relevancy of a site's search listings, thus increasing traffic. Normalizing the URLs of Web pages that are accessible via multiple URLs, and using canonical link elements and redirects, can help make sure that links to different versions of a URL all count toward the page's link popularity score.
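The crawling procedure described in question 2 (start from seed URLs, follow each page's hyperlinks, record every page discovered) is essentially a breadth-first graph traversal. Below is a minimal sketch against a simulated Web; the URLs and the link graph are hypothetical stand-ins for pages a real crawler would fetch over the network.

```python
from collections import deque

# A tiny simulated "Web": each URL maps to the hyperlinks found on its page.
# (Hypothetical data; a real crawler would fetch and parse live pages.)
SIMULATED_WEB = {
    "http://a.example": ["http://b.example", "http://c.example"],
    "http://b.example": ["http://c.example", "http://d.example"],
    "http://c.example": ["http://a.example"],
    "http://d.example": [],
}

def crawl(seed_urls, fetch_links):
    """Breadth-first crawl: visit the seeds, follow each page's links,
    and return the set of all pages discovered (the crawler's database)."""
    discovered = set(seed_urls)
    frontier = deque(seed_urls)
    while frontier:
        url = frontier.popleft()
        for link in fetch_links(url):
            if link not in discovered:  # skip pages we have already seen
                discovered.add(link)
                frontier.append(link)
    return discovered

pages = crawl(["http://a.example"], lambda url: SIMULATED_WEB.get(url, []))
print(sorted(pages))
```

A production crawler layers politeness delays, robots.txt checks, and URL normalization on top of this loop, but the underlying traversal that builds the search engine's database is the same.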