Copyright © 2018 Pearson Education, Inc.
Section 5.5 Review Questions
1. What are the main steps in the text mining process?
See Figure 5.6 (p. 222).
Text mining entails three tasks:
• Establish the Corpus: Collect and organize the domain-specific unstructured data
• Create the Term–Document Matrix: Introduce structure to the corpus
• Extract Knowledge: Discover novel patterns from the T-D matrix
2. What is the reason for normalizing word frequencies? What are the common methods for normalizing word frequencies?
The raw indices need to be normalized in order to have a more consistent TDM for further analysis. Common methods are log frequencies, binary frequencies, and inverse document frequencies.
3. What is SVD? How is it used in text mining?
Singular value decomposition (SVD), which is closely related to principal components analysis, reduces the overall dimensionality of the input matrix (number of input documents by number of extracted terms) to a lower dimensional space, where each consecutive dimension represents the largest degree of variability (between words and documents) possible.
4. What are the main knowledge extraction methods from corpus?
The main categories of knowledge extraction methods are classification, clustering, association, and trend analysis.
Section 5.6 Review Questions
1. What is sentiment analysis? How does it relate to text mining?
Sentiment analysis tries to answer the question, “What do people feel about a certain topic?” by digging into opinions of many using a variety of automated tools. It is also known as opinion mining, subjectivity analysis, and appraisal extraction. Sentiment analysis shares many characteristics and techniques with text mining. However, unlike text mining, which categorizes text by conceptual taxonomies of