Weibo grants us application-level access to its APIs. Because the number of API requests is limited, it is necessary to select some probe users from Weibo and then collect data from them.
From a large-scale user pool collected before 2011, which contains more than 2.2 million users, MoodLens randomly selects 6,800 active users, 200 for each province or region in China. Here “active users” means genuine users rather than spam accounts. A simple rule is used to filter the spam accounts out: MoodLens only chooses users who have more than 200 but fewer than 3,000 followers and who have published fewer than 3,000 tweets.

[Figure 3: Temporal sentiment patterns. (a) Hourly pattern. (b) Weekly pattern. (c) Monthly pattern. Each panel plots the fraction of Angry, Disgusting, Joyful and Sad tweets against the hour, the day of the week and the month, respectively.]

Sentiment Patterns The hourly pattern of the sentiment is shown in Figure 3(a). The period from 6:00 AM to 8:00 AM is the saddest of the day, which differs from a recent study of Twitter [5], whose data set showed that people tend to be positive in the early morning. On Weibo, this period is also the angriest of the day for most users. Surprised by this difference, we carefully investigated the tweets published on Weibo from 6:00 AM to 8:00 AM and extracted the commonly used words. The results show that the “sad” mood is generally caused by the following reasons. First, some people hate getting up early but have to. Second, some users do not want to work at this time. Third, some may have had nightmares and slept badly the night before. After 10:00 AM, Weibo users gradually become more joyful, and the fraction of joyful tweets peaks at 20:00. The weekly pattern of the sentiment is shown in Figure 3(b).
As can be seen, people become happier from Friday onward; joy peaks on Saturday and then declines as Sunday begins. The monthly pattern of the sentiment is shown in Figure 3(c). There are several outliers. For instance, in March 2011 Weibo users were sad and angry, which might have been caused by the earthquake in Japan and the rumor that iodized salt in China was also contaminated by nuclear radiation. Another outlier is in July, when the fraction of angry tweets reaches its peak, mainly because of the bullet train accident. We also find the days of extreme sentiment in 2011: Jan. 1 is the most joyful day, Mar. 11 is the saddest day, Jul. 13 is the most disgusting day and Jul. 24 is the angriest day (refer to [6] for the link). MoodLens also draws the distribution graph of daily sentiment for each region of China and shows how the sentiment evolves day by day (refer to [7] for the link).

Abnormal Event Detection Intuitively, abnormal events in the real world affect people’s emotions, and the emotional change is then reflected in the tweets people publish. The basic idea of the detection method is first to find the turning points in the variation of the sentiment and then to extract information about the event from the tweets. MoodLens defines a sequence of fractions for the sentiment $c_j$ as $\{S^{c_j}_t\}$, where $t$ is the observation time, typically measured in days or hours. Assuming we observe the variation of the sentiment from $t = t_1$ to $t = t_2$, the average fraction of tweets in $c_j$ is defined as

$$\langle S^{c_j}_{t_1 \to t_2} \rangle = \frac{1}{t_2 - t_1} \sum_{t=t_1}^{t_2} S^{c_j}_t, \qquad (1)$$

where $\Delta t = t_2 - t_1$ is the time window of observation. Hence, MoodLens obtains the sequence of relative variation for $c_j$ as

$$V^{c_j}_t = \frac{S^{c_j}_t - \langle S^{c_j}_{t_1 \to t_2} \rangle}{\langle S^{c_j}_{t_1 \to t_2} \rangle}. \qquad (2)$$

Then MoodLens defines the sequence of sentiment variation as $\{\sum_{j=1}^{4} |V^{c_j}_t|\}$.
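As an illustration of Eqs. (1) and (2), the following minimal Python sketch computes the sentiment variation sequence for a toy series. It is not the MoodLens implementation: the function names and the data are hypothetical, and the window average is taken as the plain arithmetic mean over all samples in the window, which is an assumption (Eq. (1) as printed normalizes by $t_2 - t_1$).

```python
from typing import Dict, List

# The four sentiment classes plotted in Figure 3.
SENTIMENTS = ("angry", "disgusting", "joyful", "sad")


def relative_variation(fractions: List[float]) -> List[float]:
    """Relative deviation of each observation from the window average (Eq. 2)."""
    # Assumption: the window average is the arithmetic mean of the samples.
    mean = sum(fractions) / len(fractions)
    return [(s - mean) / mean for s in fractions]


def sentiment_variation(series: Dict[str, List[float]]) -> List[float]:
    """Sum of |V| over the four sentiment classes at each time step."""
    per_class = [relative_variation(series[c]) for c in SENTIMENTS]
    steps = len(per_class[0])
    return [sum(abs(v[t]) for v in per_class) for t in range(steps)]


# Toy data (made up): four time steps with a disturbance at index 2.
toy = {
    "angry":      [0.20, 0.20, 0.40, 0.20],
    "disgusting": [0.20, 0.20, 0.20, 0.20],
    "joyful":     [0.40, 0.40, 0.10, 0.40],
    "sad":        [0.20, 0.20, 0.30, 0.20],
}
variation = sentiment_variation(toy)
# Index of the time step with the largest total variation.
peak = max(range(len(variation)), key=variation.__getitem__)
```

In this toy series the largest total variation falls on the disturbed time step, which is the kind of turning point the detection method looks for.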
This sequence is sorted in descending order and the top-$k$ values of $t$ are selected as the outlier time points, denoted as $\{t_1, t_2, \ldots, t_k\}$. For each $t_i$, MoodLens extracts the tweets posted at that time and performs information extraction for the event. Here MoodLens takes the simplest approach: the 5 most frequent bi-gram terms are extracted to describe the event. We apply this method to the 2011 data set and find that it detects almost all the abnormal events that happened during the year. As shown in Figure 4, we mark the top-10 detected events from A to J. Among these events, A, D and E correspond to the bullet train crash. C and B correspond to the magnitude 9.0 earthquake that hit Japan, while J corresponds to the news that people in China rushed to buy salt because of rumors. It should be noted that for J the fraction of angry tweets is larger than for C and B. F corresponds to New Year’s Eve of 2011. G and I correspond to the Spring Festival of 2011. H corresponds to the death of Steve Jobs. It is also interesting that, unlike C, B, J, A, D and E, for H the fraction of sad tweets is high but the fraction of angry tweets is low. This indicates that detailed negative sentiments are useful for analyzing the nature of an abnormal event.

Real-time Sentiment Monitoring The NB classifier with incremental learning is fast enough for real-time sentiment analysis of Weibo tweets. Through the API provided by Weibo, MoodLens obtains the 400 most recent public tweets every minute, and these tweets can be analyzed in less than one second. To guarantee statistical significance, we set the collection cycle to 30 minutes, which means MoodLens downloads nearly 12,000 tweets in one monitoring cycle. These tweets are then categorized into the sentiment classes in less than 1 minute.
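The monitoring cycle described above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the MoodLens code: `fetch_recent` and `classify` are hypothetical stand-ins for the Weibo API call and the NB classifier, the one-minute wait between polls is omitted, and tweets that appear in more than one poll are not deduplicated.

```python
from collections import Counter
from typing import Callable, Dict, List


def run_cycle(
    fetch_recent: Callable[[], List[str]],
    classify: Callable[[str], str],
    polls: int = 30,
) -> Dict[str, float]:
    """Aggregate one monitoring cycle into per-class tweet fractions."""
    counts: Counter = Counter()
    for _ in range(polls):
        # A real monitor would wait one minute between polls; omitted here.
        for tweet in fetch_recent():
            counts[classify(tweet)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical stand-in for the API call: 400 tweets per poll.
def fake_fetch() -> List[str]:
    return ["good"] * 300 + ["bad"] * 100


# Hypothetical stand-in for the classifier.
def fake_classify(tweet: str) -> str:
    return "joyful" if tweet == "good" else "sad"


# 30 polls x 400 tweets = 12,000 tweets in one cycle.
fractions = run_cycle(fake_fetch, fake_classify)
```

With 30 polls of 400 tweets each, one cycle aggregates 12,000 classified tweets, matching the cycle size given in the text.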
As shown in Figure 5, we present a sample cycle from the real-time monitoring, which starts from 19:30