Business Intelligence, Analytics, and Data Science: A Managerial Perspective
Fourth Edition
Ramesh Sharda, Dursun Delen, Efraim Turban
Chapter 5
Predictive Analytics II: Text, Web, and Social Media Analytics
…
Copyright © 2018, 2014, 2011 Pearson Education, Inc. All Rights Reserved
Learning Objectives (1 of 2)
5.1 Describe text mining and understand the need for text mining
5.2 Differentiate among text analytics, text mining, and data mining
5.3 Understand the different application areas for text mining
5.4 Know the process of carrying out a text mining project
5.5 Appreciate the different methods to introduce structure to text-based data
Learning Objectives (2 of 2)
5.6 Describe sentiment analysis
5.7 Develop familiarity with popular applications of sentiment analysis
5.8 Learn the common methods for sentiment analysis
5.9 Become familiar with speech analytics as it relates to sentiment analysis
Opening Vignette (1 of 3)
Machine Versus Men on Jeopardy!: The Story of Watson
• IBM Watson going head-to-head with the best of the best in Jeopardy!
Opening Vignette (2 of 3)
• IBM Watson – How does it do it?
[Figure: Watson processing pipeline. Recoverable labels: question; question analysis; decomposition; primary search; candidate generation; hypotheses 1, 2, ... each with soft filtering and evidence scoring; supporting evidence retrieval; deep evidence scoring; synthesis (combining); merging and ranking; answer and confidence]
Opening Vignette (3 of 3)
Discussion Questions for the Opening Vignette
1. What is Watson? What is special about it?
2. What technologies were used in building Watson (both hardware and software)?
3. What are the innovative characteristics of DeepQA architecture that made Watson superior?
4. Why did IBM spend all that time and money to build Watson? Where is the return on investment (ROI)?
Text Analytics and Text Mining (1 of 2)
• Text Analytics versus Text Mining
• Text Analytics =
  – Information Retrieval +
  – Information Extraction +
  – Data Mining +
  – Web Mining
• or simply: Text Analytics = Information Retrieval + Text Mining
Text Analytics and Text Mining (2 of 2)
• Figure 5.2 Text Analytics, Related Application Areas, and Enabling Disciplines
[Figure 5.2. Recoverable labels: Text Analytics; Information Retrieval; Text Mining; Document Matching; Link Analysis; Search Engines; Web Content Mining; Web Structure Mining; Web Usage Mining; "Knowledge Discovery in Textual …"; enabling disciplines: Statistics, Management Science, Artificial Intelligence, Computer Science, Other Disciplines]
Text Mining Concepts (1 of 2)
• 85-90 percent of all corporate data is in some kind of unstructured form (e.g., text)
• Unstructured corporate data is doubling in size every 18 months
• Tapping into these information sources is not optional; it is necessary to stay competitive
• Answer: text mining
  – A semi-automated process of extracting knowledge from unstructured data sources
  – a.k.a. text data mining or knowledge discovery in textual databases
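To make the "doubling every 18 months" claim concrete, the sketch below projects data volume under that growth rate. The function name `projected_volume` and the 100 TB starting figure are hypothetical, chosen only for illustration; the slide gives the doubling period but no actual volumes.

```python
def projected_volume(initial_tb: float, months: float,
                     doubling_months: float = 18.0) -> float:
    """Volume after `months`, assuming it doubles every `doubling_months`."""
    # Exponential growth: one doubling per elapsed doubling period.
    return initial_tb * 2 ** (months / doubling_months)


if __name__ == "__main__":
    start_tb = 100.0  # hypothetical starting volume, not from the source
    for years in (0, 1.5, 3, 5):
        volume = projected_volume(start_tb, years * 12)
        print(f"after {years:>3} years: {volume:,.0f} TB")
```

Under this assumption the volume grows roughly tenfold in five years, which is the practical point behind the slide's claim that tapping these sources is a necessity.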
Data Mining Versus Text Mining
• Both seek novel and useful patterns
• Both are semi-automated processes
• The difference is the nature of the data:
  – Structured versus unstructured data
  – Structured data: in databases
  – Unstructured data: Word documents, PDF files, text excerpts, XML files, and so on
• To perform text mining: first impose structure on the data, then mine the structured data
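The two-step idea of imposing structure and then mining it can be sketched as follows. This is a minimal illustration, not the textbook's method: the three-document corpus, the stop-word list, and the choice of a term-document count matrix with cosine similarity are all assumptions made for the example.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus of customer comments (not from the source).
docs = [
    "The delivery was late and the package was damaged.",
    "Great service, fast delivery, very happy.",
    "Package arrived damaged; service was slow to respond.",
]

# Illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "and", "was", "to", "very", "a", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOP_WORDS]


# Step 1: impose structure. Each document becomes a row of term counts.
counts = [Counter(tokenize(d)) for d in docs]
vocabulary = sorted(set().union(*counts))
matrix = [[c[term] for term in vocabulary] for c in counts]


# Step 2: mine the structured data. Here: find the most similar documents.
def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


most_similar = max(combinations(range(len(docs)), 2),
                   key=lambda pair: cosine(matrix[pair[0]], matrix[pair[1]]))

if __name__ == "__main__":
    print("vocabulary:", vocabulary)
    print("most similar pair of documents:", most_similar)
```

Once the text is in matrix form, any standard data mining technique (clustering, classification, association) can be applied to the rows; similarity search is only one simple choice.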