Introduction to Text Mining
Thanks to Hongning Wang@UVa for the slides from his Text Mining course. The slides have been slightly modified by Lei Chen.
What is “Text Mining”?
• “Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text.” - Wikipedia
• “Another way to view text data mining is as a process of exploratory data analysis that leads to heretofore unknown information, or to answers for questions for which the answer is not currently known.” - Hearst, 1999
CS@UVa CS6501: Text Mining
Two different definitions of mining
• Goal-oriented (effectiveness driven)
  – Any process that generates useful results that are non-obvious is called “mining”
  – Keywords: “useful” + “non-obvious”
  – Data isn’t necessarily massive
• Method-oriented (efficiency driven)
  – Any process that involves extracting information from massive data is called “mining”
  – Keywords: “massive” + “pattern”
  – Patterns aren’t necessarily useful
Text mining around us
• Sentiment analysis
Text mining around us
• Document summarization
Text mining around us
• Restaurant/hotel recommendation
[Figure: review pages for Bodo's Bagels and Hilton Times Square, including reviews from the TripAdvisor community]
Text mining around us
• Text analytics in financial services
[Figure: Facebook sentiment plotted against Facebook stock price, May to August, following the Facebook IPO]
How to perform text mining?
• As computer scientists, we view it as
  – Text Mining = Data Mining + Text Data
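The equation "Text Mining = Data Mining + Text Data" can be illustrated with a minimal sketch. It is not taken from the slides: the bag-of-words representation, the cosine-similarity step, the function names, and the three example sentences are all assumptions chosen for illustration. Raw text is first turned into term-count vectors, and a generic data-mining operation (finding the most similar pair of items) is then applied to those vectors.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Text data -> structured data: a sparse term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(docs):
    """Generic data-mining step: return the index pair with the highest similarity."""
    vectors = [bag_of_words(d) for d in docs]
    return max(
        combinations(range(len(docs)), 2),
        key=lambda pair: cosine(vectors[pair[0]], vectors[pair[1]]),
    )


# Hypothetical example documents, made up for this sketch.
docs = [
    "The hotel room was clean and the staff was friendly",
    "Friendly staff and a clean room at this hotel",
    "The stock price dropped after the IPO",
]

print(most_similar_pair(docs))  # the two hotel reviews: (0, 1)
```

Only `bag_of_words` depends on the data being text. Everything after it operates on numeric vectors and would work unchanged on non-textual data.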
Text mining vs. NLP, IR, DM…
• How does it relate to data mining in general?
• How does it relate to computational linguistics?
• How does it relate to information retrieval?

                   Finding Patterns            Finding “Nuggets”
                                               Novel                       Non-Novel
Non-textual data   General data-mining         Exploratory data analysis   Database queries
Textual data       Computational linguistics   Text mining                 Information retrieval
Text mining in general
• Access: filter information (serves IR applications)
• Mining: discover knowledge (sub-area of DM research)
• Organization: add structure/annotations (based on NLP/ML techniques)