Copyright © 2018 Pearson Education, Inc.
Business Intelligence, 4e (Sharda/Delen/Turban)
Chapter 5 Predictive Analytics II: Text, Web, and Social Media Analytics

1) Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining.
Answer: FALSE
Diff: 2
Page Ref: 251

2) Categorization and clustering of documents during text mining differ only in the preselection of categories.
Answer: TRUE
Diff: 2
Page Ref: 252

3) Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out.
Answer: TRUE
Diff: 2
Page Ref: 253

4) In the car insurance case study, text mining was used to identify auto features that caused injuries.
Answer: FALSE
Diff: 2
Page Ref: 254-255

5) Regional accents present challenges for natural language processing.
Answer: TRUE
Diff: 2
Page Ref: 256

6) In the Tito's Vodka case study, trends in cocktails were studied to create a quarterly recipe for customers.
Answer: TRUE
Diff: 2
Page Ref: 306

7) In the Wimbledon case study, designers balanced the needs of mobile and desktop computer users.
Answer: TRUE
Diff: 2
Page Ref: 278

8) In text mining, if an association between two concepts has 7% support, it means that 7% of the documents had both concepts represented in the same document.
Answer: TRUE
Diff: 2
Page Ref: 272

9) In sentiment analysis, sentiment suggests a transient, temporary opinion reflective of one's feelings.
Answer: FALSE
Diff: 2
Page Ref: 276
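Questions 3 and 8 describe two basic text mining mechanics: filtering out low-value words such as articles and auxiliary verbs, and measuring the support of an association as the share of documents in which both concepts appear. The Python sketch below illustrates both under stated assumptions: the stop-word list, the function names `terms` and `support`, and the sample corpus are hypothetical, and real tools use much larger stop lists and richer concept extraction.

```python
import re

# Hypothetical stop-word list: articles and auxiliary verbs carry little value.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "be", "has", "have", "do", "does"}


def terms(document: str) -> set[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def support(corpus: list[str], concept_a: str, concept_b: str) -> float:
    """Fraction of documents in which both concepts appear."""
    both = sum(1 for doc in corpus if {concept_a, concept_b} <= terms(doc))
    return both / len(corpus)
```

For example, in the four-document corpus `["The engine is noisy", "An engine fire was reported", "The brakes are worn", "Engine noise and brakes squeal"]`, only the last document mentions both "engine" and "brakes", so `support(corpus, "engine", "brakes")` returns 0.25, that is, 25% support.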

10) Current use of sentiment analysis in voice of the customer applications allows companies to change their products or services in real time in response to customer sentiment.
Answer: TRUE
Diff: 2
Page Ref: 276

11) In sentiment analysis, it is hard to classify some subjects such as news as good or bad, but easier to classify others, e.g., movie reviews, in the same way.
Answer: TRUE
Diff: 2
Page Ref: 276

12) Search engines are only used in the context of the World Wide Web (WWW).
Answer: FALSE
Diff: 2
Page Ref: 291

13) Search engine optimization (SEO) techniques play a minor role in a Web site's search ranking because only well-written content matters.
Answer: FALSE
Diff: 2
Page Ref: 294-295

14) Clickstream analysis does not need users to enter their perceptions of the Web site or other feedback directly to be useful in determining their preferences.
Answer: TRUE
Diff: 2
Page Ref: 299

15) Since little can be done about visitor Web site abandonment rates, organizations have to focus their efforts on increasing the number of new visitors.
Answer: FALSE
Diff: 2
Page Ref: 303

16) Web-based media has nearly identical cost and scale structures as traditional media.
Answer: FALSE
Diff: 2
Page Ref: 309

17) Consistent high quality, higher publishing frequency, and longer time lag are all attributes of industrial publishing when compared to Web publishing.
Answer: FALSE
Diff: 2
Page Ref: 309-310

18) In the evolution of social media user engagement, the largest recent change is the growth of creators.
Answer: FALSE
Diff: 2
Page Ref: 310-311

19) Descriptive analytics for social media feature such items as your followers as well as the content in online conversations that help you to identify themes and sentiments.
Answer: FALSE
Diff: 2
Page Ref: 311

20) Companies understand that when their product goes "viral," the content of the online conversations about their product does not matter, only the volume of conversations.
Answer: FALSE
Diff: 3
Page Ref: 312

21) In the opening vignette, the architectural system that supported Watson used all the following elements EXCEPT
A) massive parallelism to enable simultaneous consideration of multiple hypotheses.
B) an underlying confidence subsystem that ranks and integrates answers.
C) a core engine that could operate seamlessly in another domain without changes.
D) integration of shallow and deep knowledge.
Answer: C
Diff: 3
Page Ref: 248-250

22) In text mining, tokenizing is the process of
A) categorizing a block of text in a sentence.
B) reducing multiple words to their base or root.
C) transforming the term-by-document matrix to a manageable size.
D) creating new branches or stems of recorded paragraphs.
Answer: A
Diff: 2
Page Ref: 253

23) All of the following are challenges associated with natural language processing EXCEPT
A) dividing up a text into individual words in English.
B) understanding the context in which something is said.
C) distinguishing between words that have more than one meaning.
D) recognizing typographical or grammatical errors in texts.
Answer: A
Diff: 3
Page Ref: 256

24) Natural language processing (NLP) is associated with which of the following areas?
A) text mining
B) artificial intelligence
C) computational linguistics
D) all of these
Answer: D
Diff: 2
Page Ref: 256

25) In the research literature case study, the researchers analyzing academic papers extracted information from which source?
A) the paper abstract
B) the paper keywords
C) the main body of the paper
D) the paper references
Answer: A
Diff: 1
Page Ref: 273-274

26) In sentiment analysis, which of the following is an implicit opinion?
A) The hotel we stayed in was terrible.
B) The customer service I got for my TV was laughable.
C) The cruise we went on last summer was a disaster.
D) Our new mayor is great for the city.
Answer: B
Diff: 3
Page Ref: 277

27) In the Wimbledon case study, the tournament used data for each match in real time to highlight
A) winners and losers.
B) player histories.
C) significant events.
D) advertiser content.
Answer: C
Diff: 2
Page Ref: 278-280

28) What do voice of the market (VOM) applications of sentiment analysis do?
A) They examine customer sentiment at the aggregate level.
B) They examine employee sentiment in the organization.
C) They examine the stock market for trends.
D) They examine the "market of ideas" in politics.
Answer: A
Diff: 3
Page Ref: 281

29) Sentiment analysis projects require a lexicon for use. If a project in English is undertaken, you must generally make sure to
A) use only the single, approved English lexicon.
B) use any general English lexicon.
C) use an English lexicon appropriate to the project at your discretion.
D) create an English lexicon for the project.
Answer: C
Diff: 3
Page Ref: 284-285

30) In text analysis, what is a lexicon?
A) a catalog of words, their synonyms, and their meanings
B) a catalog of customers, their words, and phrases
C) a catalog of letters, words, phrases, and sentences
D) a catalog of customers, products, words, and phrases
Answer: A
Diff: 3
Page Ref: 284

31) What types of documents are BEST suited to semantic labeling and aggregation to determine sentiment orientation?
A) medium- to large-sized documents
B) small- to medium-sized documents
C) large-sized documents
D) collections of documents
Answer: B
Diff: 3
Page Ref: 286

32) What does Web content mining involve?
A) analyzing the universal resource locator in Web pages
B) analyzing the unstructured content of Web pages
C) analyzing the pattern of visits to a Web site
D) analyzing the PageRank and other metadata of a Web page
Answer: B
Diff: 2
Page Ref: 289

33) Breaking up a Web page into its components to identify worthy words/terms and indexing them using a set of rules is called
A) preprocessing the documents.
B) document analysis.
C) creating the term-by-document matrix.
D) parsing the documents.
Answer: D
Diff: 3
Page Ref: 293

34) Search engine optimization (SEO) is a means by which
A) Web site developers can negotiate better deals for paid ads.
B) Web site developers can increase Web site search rankings.
C) Web site developers index their Web sites for search engines.
D) Web site developers optimize the artistic features of their Web sites.
Answer: B
Diff: 2
Page Ref: 294-295

35) What are the two main types of Web analytics?
A) old-school and new-school Web analytics
B) Bing and Google Web analytics
C) off-site and on-site Web analytics
D) data-based and subjective Web analytics
Answer: C
Diff: 3
Page Ref: 299
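Question 33 calls the step of breaking a Web page into components and indexing worthy terms by a set of rules "parsing the documents." As a minimal sketch, assuming a hypothetical rule set (lowercase alphabetic words of three or more letters, minus a tiny stop list), the Python below parses pages and builds an inverted index that maps each term to the pages containing it. The function names and sample pages are illustrative only and do not describe how any particular search engine works.

```python
import re
from collections import defaultdict

# Hypothetical indexing rules: lowercase, keep alphabetic words of 3+ letters,
# and skip a small illustrative stop-word list.
STOP_WORDS = {"the", "and", "for", "with"}


def parse(page_text: str) -> list[str]:
    """Break a page into candidate index terms using the rules above."""
    words = re.findall(r"[a-z]+", page_text.lower())
    return [w for w in words if len(w) >= 3 and w not in STOP_WORDS]


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of page URLs that contain it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in parse(text):
            index[term].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every query term (a simple Boolean AND)."""
    query_terms = parse(query)
    if not query_terms:
        return set()
    results = set(index.get(query_terms[0], set()))  # copy so the index is not mutated
    for term in query_terms[1:]:
        results &= index.get(term, set())
    return results
```

With `pages = {"a.html": "Text mining for the Web", "b.html": "Web analytics and clickstream data", "c.html": "Mining social media text"}`, `search(build_index(pages), "text mining")` returns `{"a.html", "c.html"}`. Indexing once and answering many queries from the index is the reason the parsing step happens ahead of time.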

36) Web site usability may be rated poor if
A) the average number of page views on your Web site is large.
B) the time spent on your Web site is long.
C) Web site visitors download few of your offered PDFs and videos.
D) users fail to click on all pages equally.
Answer: C
Diff: 2
Page Ref: 300

37) Understanding which keywords your users enter to reach your Web site through a search engine can help you understand
A) the hardware your Web site is running on.
B) the type of Web browser being used by your Web site visitors.
C) most of your Web site visitors' wants and needs.
D) how well visitors understand your products.
Answer: D
Diff: 3
Page Ref: 301

38) Which of the following statements about Web site conversion statistics is FALSE?
A) Web site visitors can be classed as either new or returning.
B) Visitors who begin a purchase on most Web sites must complete it.
C) The conversion rate is the number of people who take action divided by the number of visitors.
D) Analyzing exit rates can tell you why visitors left your Web site.
Answer: B
Diff: 3
Page Ref: 302

39) What is one major way in which Web-based social media differs from traditional publishing media?
A) Most Web-based media are operated by the government and large firms.
B) They use different languages of publication.
C) They have different costs to own and operate.
D) Web-based media have a narrower range of quality.
Answer: C
Diff: 3
Page Ref: 310

40) What does advanced analytics for social media do?
A) It helps identify your followers.
B) It identifies links between groups.
C) It examines the content of online conversations.
D) It identifies the biggest sources of influence online.
Answer: C
Diff: 2
Page Ref: 311

41) IBM's Watson utilizes a massively parallel, text mining–focused, probabilistic evidence-based computational architecture called ________.
Answer: DeepQA
Diff: 2
Page Ref: 248
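Question 38 defines the conversion rate as the number of people who take action divided by the number of visitors, and notes that visitors can be classed as new or returning. The sketch below computes both from a hypothetical visit log. The `Visit` record, counting unique visitor IDs, and treating a visitor seen only once in the log as "new" are simplifying assumptions; real analytics tools typically identify returning visitors from cookies or account data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Visit:
    visitor_id: str  # hypothetical identifier, e.g., taken from a cookie
    converted: bool  # True if this visit ended in the target action (e.g., a purchase)


def conversion_rate(visits: list[Visit]) -> float:
    """Unique visitors who took the target action divided by unique visitors."""
    visitors = {v.visitor_id for v in visits}
    converters = {v.visitor_id for v in visits if v.converted}
    return len(converters) / len(visitors) if visitors else 0.0


def new_vs_returning(visits: list[Visit]) -> tuple[int, int]:
    """Count visitors seen once (new) versus more than once (returning) in this log."""
    per_visitor = Counter(v.visitor_id for v in visits)
    new = sum(1 for n in per_visitor.values() if n == 1)
    return new, len(per_visitor) - new
```

For a log of four visits from three visitors, where only visitor "u1" (seen twice) converts, the conversion rate is 1/3 and `new_vs_returning` returns `(2, 1)`.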

42) ________, also called homonyms, are syntactically identical words with different meanings.
Answer: Polysemes
Diff: 2
Page Ref: 253

43) When a word has more than one meaning, selecting the meaning that makes the most sense can only be accomplished by taking into account the context within which the word is used. This concept is known as ________.
Answer: word sense disambiguation
Diff: 3
Page Ref: 256

44) ________ is a technique used to detect favorable and unfavorable opinions toward specific products and services using large numbers of textual data sources.
Answer: Sentiment analysis
Diff: 2
Page Ref: 257

45) In the Mining for Lies case study, a text-based deception-detection method used by Fuller and others in 2008 was based on a process known as ________, which relies on elements of data and text mining techniques.
Answer: message feature mining
Diff: 2
Page Ref: 262-263

46) At a very high level, the text mining process can be broken down into three consecutive tasks, the first of which is to establish the ________.
Answer: corpus
Diff: 2
Page Ref: 269

47) Because the term-by-document matrix is often very large and rather sparse, an important optimization step is to reduce the ________ of the matrix.
Answer: dimensionality
Diff: 2
Page Ref: 270

48) ________ is mostly driven by sentiment analysis and is a key element of customer experience management initiatives, where the goal is to create an intimate relationship with the customer.
Answer: Voice of the customer (VOC)
Diff: 2
Page Ref: 280

49) When viewed as a binary feature, ________ classification is the binary classification task of labeling an opinionated document as expressing either an overall positive or an overall negative opinion.
Answer: polarity
Diff: 2
Page Ref: 282
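Question 47 notes that the term-by-document matrix is usually large and sparse, so its dimensionality is reduced. The Python sketch below builds a small matrix of raw term frequencies (rows are documents, columns are terms) and then applies one simple reduction: dropping terms that occur in fewer than a chosen number of documents. The threshold `min_docs` and the function names are assumptions for illustration; singular value decomposition (SVD) is another widely used reduction technique and is not shown here.

```python
import re
from collections import Counter


def term_document_matrix(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Rows are documents, columns are terms, cells are raw term frequencies."""
    counts = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


def prune_rare_terms(vocab: list[str], matrix: list[list[int]], min_docs: int = 2):
    """Drop term columns that appear in fewer than min_docs documents."""
    keep = [j for j in range(len(vocab))
            if sum(1 for row in matrix if row[j] > 0) >= min_docs]
    return [vocab[j] for j in keep], [[row[j] for j in keep] for row in matrix]
```

For the documents `["web mining web", "text mining", "social media text"]`, the full matrix has five term columns; pruning with `min_docs=2` keeps only "mining" and "text", giving `[[1, 0], [1, 1], [0, 1]]`.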

50) Web pages contain both unstructured information and ________, which are connections to other Web pages.
Answer: hyperlinks
Diff: 1
Page Ref: 290

51) Web ________ are used to automatically read through the contents of Web sites.
Answer: crawlers/spiders
Diff: 1
Page Ref: 289

52) A(n) ________ is one or more Web pages that provide a collection of links to authoritative Web pages.
Answer: hub
Diff: 1
Page Ref: 290

53) A(n) ________ engine is a software program that searches for Web sites or files based on keywords.
Answer: search
Diff: 1
Page Ref: 291

54) In the Lotte.com retail case, the company deployed SAS for Customer Experience Analytics to better understand the quality of customer traffic on their Web site, classify order rates, and see which ________ had the most visitors.
Answer: channels
Diff: 2
Page Ref: 297

55) ________ Web analytics refers to measurement and analysis of data relating to your company that takes place outside your Web site.
Answer: Off-site
Diff: 1
Page Ref: 299

56) A(n) ________ Web site contains links that send traffic directly to your Web site.
Answer: referral
Diff: 2
Page Ref: 301

57) ________ statistics help you understand whether your specific marketing objective for a Web page is being achieved.
Answer: Conversion
Diff: 1
Page Ref: 302

58) In the Tito's Vodka case, it was important that social media users all had a(n) ________ brand experience.
Answer: consistent
Diff: 2
Page Ref: 306
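Question 52 defines a hub as a page that collects links to authoritative pages. A well-known way to score hubs and authorities from link structure is Kleinberg's HITS algorithm: a page's authority score sums the hub scores of the pages linking to it, and its hub score sums the authority scores of the pages it links to. The sketch below is a simplified version with a fixed iteration count and Euclidean normalization; the function name and the link graph are hypothetical examples.

```python
import math


def hits(links: dict[str, list[str]], iterations: int = 20):
    """Simplified HITS: return (hub, authority) score dicts for every page in the graph."""
    # Every page that appears as a link source or a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalize(scores: dict[str, float]) -> dict[str, float]:
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {p: v / norm for p, v in scores.items()}

    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link to this page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        auth = normalize(new_auth)
        # Hub: sum of authority scores of the pages this page links to.
        hub = normalize({p: sum(auth[t] for t in links.get(p, [])) for p in pages})
    return hub, auth
```

In the graph `{"portal": ["a", "b", "c"], "news": ["a", "b"], "blog": ["a"]}`, "portal" links to the most well-cited pages and ends with the highest hub score, while "a", linked from every hub, ends with the highest authority score.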

59) ________ is a connections metric for social networks that measures the ties that actors in a network have with others that are geographically close.
Answer: Propinquity
Diff: 1
Page Ref: 308

60) ________ is a segmentation metric for social networks that measures the strength of the bonds between actors in a social network.
Answer: Cohesion
Diff: 1
Page Ref: 309

61) How would you describe information extraction in text mining?
Answer: Information extraction is the identification of key phrases and relationships within text by looking for predefined objects and sequences in text by way of pattern matching.
Diff: 2
Page Ref: 252

62) Natural language processing (NLP), a subfield of artificial intelligence and computational linguistics, is an important component of text mining. What is the definition of NLP?
Answer: NLP is a discipline that studies the problem of "understanding" the natural human language, with the view of converting depictions of human language into more formal representations in the form of numeric and symbolic data that are easier for computer programs to manipulate.
Diff: 2
Page Ref: 256

63) In the security domain, one of the largest and most prominent text mining applications is the highly classified ECHELON surveillance system. What is ECHELON assumed to be capable of doing?
Answer: Identifying the content of telephone calls, faxes, e-mails, and other types of data and intercepting information sent via satellites, public switched telephone networks, and microwave links.
Diff: 2
Page Ref: 261-262

64) Describe the query-specific clustering method as it relates to clustering.
Answer: This method employs a hierarchical clustering approach where the most relevant documents to the posed query appear in small tight clusters that are nested in larger clusters containing less similar documents, creating a spectrum of relevance levels among the documents.
Diff: 3
Page Ref: 272

65) Identify, with a brief description, each of the four steps in the sentiment analysis process.
Answer:
1. Sentiment Detection: Here the goal is to differentiate between a fact and an opinion, which may be viewed as classification of text as objective or subjective.
2. N-P Polarity Classification: Given an opinionated piece of text, the goal is to classify the opinion as falling under one of two opposing sentiment polarities, or locate its position on the continuum between these two polarities.
3. Target Identification: The goal of this step is to accurately identify the target of the expressed sentiment.
4. Collection and Aggregation: In this step all text data points in the document are aggregated and converted to a single sentiment measure for the whole document.
Diff: 2
Page Ref: 282-284

66) In what ways does the Web pose great challenges for effective and efficient knowledge discovery through data mining?
Answer:
• The Web is too big for effective data mining. The Web is so large and growing so rapidly that it is difficult to even quantify its size. Because of the sheer size of the Web, it is not feasible to set up a data warehouse to replicate, store, and integrate all of the data on the Web, making data collection and integration a challenge.
• The Web is too complex. The complexity of a Web page is far greater than a page in a traditional text document collection. Web pages lack a unified structure. They contain far more authoring style and content variation than any set of books, articles, or other traditional text-based documents.
• The Web is too dynamic. The Web is a highly dynamic information source. Not only does the Web grow rapidly, but its content is constantly being updated. Blogs, news stories, stock market results, weather reports, sports scores, prices, company advertisements, and numerous other types of information are updated regularly on the Web.
• The Web is not specific to a domain. The Web serves a broad diversity of communities and connects billions of workstations. Web users have very different backgrounds, interests, and usage purposes. Most users may not have good knowledge of the structure of the information network and may not be aware of the heavy cost of a particular search that they perform.
• The Web has everything. Only a small portion of the information on the Web is truly relevant or useful to someone (or some task). Finding the portion of the Web that is truly relevant to a person and the task being performed is a prominent issue in Web-related research.
Diff: 2
Page Ref: 287-288
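The four-step sentiment analysis process in question 65, together with the lexicon in question 30 and polarity classification in question 49, can be illustrated with a deliberately crude lexicon-based sketch. Everything here is an assumption for illustration: the toy `LEXICON` scores, the fixed `TARGETS` set, treating sentences with no lexicon word as objective, and summing scores without handling negation or sarcasm (compare the implicit opinion in question 26). A real project would use a lexicon appropriate to its domain.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (positive > 0, negative < 0).
LEXICON = {"great": 1.0, "good": 0.5, "love": 1.0,
           "terrible": -1.0, "bad": -0.5, "disaster": -1.0}
# Hypothetical set of sentiment targets to look for.
TARGETS = {"hotel", "service", "cruise"}


def analyze(text: str) -> tuple[float, str, dict[str, float]]:
    """Return (document score, document polarity, per-target scores)."""
    by_target: dict[str, float] = {}
    total = 0.0
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z]+", sentence.lower())
        # 1. Sentiment detection: no lexicon word -> treat as objective and skip.
        if not any(w in LEXICON for w in words):
            continue
        # 2. N-P polarity: the sign of the summed lexicon scores for this sentence.
        score = sum(LEXICON.get(w, 0.0) for w in words)
        # 3. Target identification: credit the score to known targets in the sentence.
        for w in words:
            if w in TARGETS:
                by_target[w] = by_target.get(w, 0.0) + score
        # 4. Collection and aggregation: accumulate one document-level measure.
        total += score
    polarity = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return total, polarity, by_target
```

For "The hotel was great. The room faced the parking lot. The service was terrible and bad!", the middle sentence is skipped as objective, and the function returns `(-0.5, "negative", {"hotel": 1.0, "service": -1.5})`: the document leans negative overall even though one target was praised, which is why target identification matters.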