Chapter 5: Text and Web Analytics Business Intelligence: A Managerial Perspective on Analytics (3rd Edition)
Copyright © 2014 Pearson Education, Inc.
Slide 5-2: Learning Objectives
▪ Describe text mining and understand the need for text mining
▪ Differentiate between text mining, Web mining, and data mining
▪ Understand the different application areas for text mining
▪ Know the process of carrying out a text mining project
▪ Understand the different methods to introduce structure to text-based data
(Continued…)
Slide 5-3: Learning Objectives
▪ Describe Web mining, its objectives, and its benefits
▪ Understand the three different branches of Web mining
  ▪ Web content mining
  ▪ Web structure mining
  ▪ Web usage mining
▪ Understand the applications of these three mining paradigms
Slide 5-4: Opening Vignette
Machine Versus Men on Jeopardy!: The Story of Watson
▪ Situation
▪ Problem
▪ Solution
▪ Results
▪ Answer & discuss the case questions.
Watch it on YouTube! https://www.youtube.com/watch?v=YLR1byL0U8M
Slide 5-5: Questions for the Opening Vignette
1. What is Watson? What is special about it?
2. What technologies were used in building Watson (both hardware and software)?
3. What are the innovative characteristics of DeepQA architecture that made Watson superior?
4. Why did IBM spend all that time and money to build Watson? Where is the ROI?
Slide 5-6: A High-Level Depiction of IBM Watson’s DeepQA Architecture
[Figure: flow diagram. Main path: Question → Question analysis → Query decomposition → Hypothesis generation → Soft filtering → Hypothesis and evidence scoring → Synthesis → Final merging and ranking → Answer and confidence. The hypothesis generation, soft filtering, and hypothesis and evidence scoring steps are drawn as repeated parallel branches. Other labeled elements: Answer sources, Primary search, Candidate answer generation, Evidence sources, Support evidence retrieval, Deep evidence scoring, Trained models.]
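The staged flow in the figure (analyze the question, generate candidate answers, score each against evidence, then rank and report a confidence) can be illustrated with a toy sketch. This is not IBM's implementation. Every function name, the keyword-overlap scoring, and the two small dictionaries are assumptions made up for illustration, and the query decomposition, soft filtering, and trained-model steps are left out entirely.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "is", "what", "which", "who", "in", "was"}

def words(text):
    """Split text into a set of lowercase alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def analyze_question(question):
    """Question analysis (toy): reduce the question to its non-stopword keywords."""
    return words(question) - STOPWORDS

def generate_hypotheses(keywords, answer_sources):
    """Hypothesis generation (toy): any entry sharing a keyword becomes a candidate."""
    return [answer for answer, text in answer_sources.items() if keywords & words(text)]

def score_hypothesis(answer, keywords, evidence_sources):
    """Evidence scoring (toy): count keyword matches in the candidate's passages."""
    return sum(len(keywords & words(p)) for p in evidence_sources.get(answer, []))

def answer_question(question, answer_sources, evidence_sources):
    """Final merging and ranking (toy): return (best answer, confidence)."""
    keywords = analyze_question(question)
    scored = {a: score_hypothesis(a, keywords, evidence_sources)
              for a in generate_hypotheses(keywords, answer_sources)}
    total = sum(scored.values())
    if total == 0:
        return None, 0.0
    best = max(scored, key=scored.get)
    return best, scored[best] / total

# Hypothetical miniature knowledge sources, invented for this example.
answer_sources = {
    "Paris": "capital city of France",
    "Berlin": "capital city of Germany",
}
evidence_sources = {
    "Paris": ["Paris is the capital of France.", "France has its government in Paris."],
    "Berlin": ["Berlin is the capital of Germany."],
}

answer, confidence = answer_question(
    "What is the capital of France?", answer_sources, evidence_sources
)
print(answer, confidence)
```

The point of the sketch is the shape of the pipeline: several candidates are kept alive and scored independently, and the final answer is returned together with a confidence value rather than as a bare string.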
Slide 5-7: Text Mining Concepts
▪ 85-90 percent of all corporate data is in some kind of unstructured form (e.g., text)
▪ Unstructured corporate data is doubling in size every 18 months
▪ Tapping into these information sources is not an option, but a need to stay competitive
▪ Answer: text mining
  ▪ A semi-automated process of extracting knowledge from unstructured data sources
  ▪ a.k.a. text data mining or knowledge discovery in textual databases
Slide 5-8: Data Mining versus Text Mining
▪ Both seek novel and useful patterns
▪ Both are semi-automated processes
▪ Difference is the nature of the data:
  ▪ Structured versus unstructured data
  ▪ Structured data: in databases
  ▪ Unstructured data: Word documents, PDF files, text excerpts, XML files, and so on
▪ Text mining: first impose structure on the data, then mine the structured data
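The two-step idea on this slide (impose structure, then mine) can be illustrated with a minimal sketch. It assumes a simple bag-of-words representation, which is one common way to impose structure and not necessarily the only one the chapter has in mind. The function names and the two sample sentences are invented for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs):
    """Step 1: impose structure.

    Returns (vocabulary, rows), where rows[i][j] is the number of times
    vocabulary[j] occurs in docs[i].
    """
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows

# Hypothetical unstructured input: two free-text customer comments.
docs = [
    "The customer praised the fast delivery.",
    "Delivery was slow and the customer complained.",
]
vocabulary, rows = term_document_matrix(docs)

# Step 2: mine the structured data. Here, simply total each column to find
# the terms that occur most often across the collection.
totals = {term: sum(row[j] for row in rows) for j, term in enumerate(vocabulary)}

print(vocabulary)
for row in rows:
    print(row)
print(totals)
```

Once the text is in matrix form, any ordinary data mining method that works on numeric tables (classification, clustering, association analysis) can be applied to it.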
Slide 5-9: Text Mining Concepts
▪ Benefits of text mining are obvious, especially in text-rich data environments
  ▪ e.g., law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology (molecular interactions), technology (patent files), marketing (customer comments), etc.
▪ Electronic communication records (e.g., Email)
  ▪ Spam filtering
  ▪ Email prioritization and categorization
  ▪ Automatic response generation
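Spam filtering is a text categorization task, and a small sketch may make it concrete. The slide does not name a method. The version below assumes a multinomial naive Bayes classifier with add-one smoothing, a common textbook choice, and trains it on four invented messages. All names and data are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability under naive Bayes."""
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior: share of training messages carrying this label.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training messages, invented for this example.
examples = [
    ("Win a free prize now", "spam"),
    ("Free money, claim your prize", "spam"),
    ("Meeting agenda for Monday", "ham"),
    ("Please review the quarterly report", "ham"),
]
word_counts, label_counts = train(examples)

print(classify("Claim your free prize", word_counts, label_counts))
print(classify("Agenda for the quarterly meeting", word_counts, label_counts))
```

The same pattern extends to email prioritization and categorization by training on more than two labels, although a realistic filter would need far more training data than this.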
Slide 5-10: Text Mining Application Areas
▪ Information extraction
▪ Topic tracking
▪ Summarization
▪ Categorization
▪ Clustering
▪ Concept linking
▪ Question answering