MoodLens: An Emoticon-Based Sentiment Analysis System for Chinese Tweets

Jichang Zhao∗ Li Dong∗ Junjie Wu† Ke Xu∗‡
zhaojichang@nlsde.buaa.edu.cn donglixp@gmail.com wujj@buaa.edu.cn kexu@nlsde.buaa.edu.cn
∗State Key Lab of Software Development Environment, Beihang University
†Beijing Key Laboratory of Emergency Support Simulation Technologies for City Operations, School of Economics and Management, Beihang University
‡Corresponding author

ABSTRACT
Recent years have witnessed the explosive growth of online social media. Weibo, a Twitter-like online social network in China, has attracted more than 300 million users in less than three years, with more than 1000 tweets generated every second. These tweets not only convey factual information but also reflect the emotional states of their authors, which are very important for understanding user behaviors. However, a tweet in Weibo is extremely short and the words it contains evolve extraordinarily fast. Moreover, the Chinese corpus of sentiments is still very small, which prevents the conventional keyword-based methods from being used.
In light of this, we build a system called MoodLens, which to the best of our knowledge is the first system for sentiment analysis of Chinese tweets in Weibo. In MoodLens, 95 emoticons are mapped into four categories of sentiments, i.e., angry, disgusting, joyful, and sad, which serve as the class labels of tweets. We then collect over 3.5 million labeled tweets as the corpus and train a fast Naïve Bayes classifier, with an empirical precision of 64.3%. MoodLens also implements an incremental learning method to tackle the problem of sentiment shift and the generation of new words. Applying MoodLens to real-time tweets obtained from Weibo, we observe several interesting temporal and spatial patterns. Sentiment variations are also well captured by MoodLens, which allows it to effectively detect abnormal events in China. Finally, by using the highly efficient Naïve Bayes classifier, MoodLens is capable of online real-time sentiment monitoring. The demo of MoodLens can be found at http://goo.gl/8DQ65.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—Data Mining; H.3.3 [Information Search and Retrieval]: [Text Mining]; J.4 [Social and Behavioral Sciences]: [Miscellaneous]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD’12, August 12–16, 2012, Beijing, China.
Copyright 2012 ACM 978-1-4503-1462-6/12/08 ...$5.00.

General Terms
Measurement, Experimentation

Keywords
Sentiment Analysis; Chinese Short Text; Online Social Media; Weibo

1. INTRODUCTION
The development of online social networks has attracted an enormous number of Internet users in this decade.
They are becoming the mainstream online social media for information sharing. Twitter (www.twitter.com), a microblog site launched in 2006, has over 300 million registered users, with over 140 million microblog posts, known as tweets, being published every day. In China, Weibo (www.weibo.com), a Twitter-like service launched in 2009, has accumulated more than 300 million users in less than three years. Every second, more than 1000 Chinese tweets are posted in Weibo.

Each user in the network can be viewed as a social sensor, which publishes and propagates information through tweets. Therefore, the huge volume of tweets conveys complicated signals about the authors and real-world events, of which sentiment is an essential part. In [3], the authors argued that events in the social, political and cultural fields did have a significant effect on the users’ mood, which could be detected from their tweets. It was also claimed in [2] that even the stock market could be predicted by sentiment analysis of Twitter users.

Disclosing the emotions in tweets therefore plays a key role in understanding user behaviors in social media. However, both Twitter and Weibo only allow users to post messages of up to 140 characters, which makes the tweets extremely short, and sentiment analysis therefore becomes a very challenging task. In particular, little work has been done to reveal how to perform sentiment analysis for Chinese tweets in Weibo.

In light of this, we propose a system called MoodLens to perform sentiment analysis for Chinese Weibo. The main contributions lie in the following aspects: (a) MoodLens employs an emoticon-based method for sentiment classification, which helps to tackle the longstanding sparsity problem of short texts; (b) MoodLens can detect four types of sentiments: angry, disgusting, joyful, and sad, which goes beyond the traditional binary sentiment (positive vs. negative) analysis studies, and is crucial for unveiling the abundant