the results were not satisfactory, which implies that this assumption is unreasonable. (Kouloumpis, Wilson, and Moore 2011) tries to use some hashtags like "#jobs" as indicators for objective tweets. However, this assumption is not general enough, because the number of tweets containing specific hashtags is limited and the sentiment of these tweets may be biased toward certain topics like "jobs".

Here we present a novel assumption for objective tweets: a tweet containing an objective url link is assumed to be objective. Based on our observation, we find that urls linking to picture sites (e.g., twitpic.com) or video sites (e.g., youtube.com) are often subjective, while other urls, such as those linking to news articles, are usually objective. Hence, if a url link does not point to pictures or videos, we call it an objective url link. Based on the above assumption, we build the query "wi filter:links"6 to get the statistics about the objective class.

6 filter:links means returning tweets containing urls.
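To make the heuristic concrete, the sketch below checks a url's domain against known picture and video sites and treats everything else as objective. This is a minimal illustration under our own assumptions: the paper states the rule only informally, and the function name and the shortener domain youtu.be are not from the paper.

    from urllib.parse import urlparse

    # The paper names only twitpic.com and youtube.com as examples;
    # youtu.be (YouTube's shortener) is added here as an assumption.
    PICTURE_SITES = {"twitpic.com"}
    VIDEO_SITES = {"youtube.com", "youtu.be"}

    def is_objective_url(url: str) -> bool:
        """True if the url does not point to a known picture or video site."""
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]  # treat www.youtube.com like youtube.com
        return domain not in PICTURE_SITES | VIDEO_SITES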
ESLAM

After we have estimated Pa(wi|c) from the manually labeled data and Pu(wi|c) from the noisy emoticon data, we can integrate them into the same probabilistic framework Pco(wi|c). Before combining Pa(wi|c) and Pu(wi|c), there is another important step: smoothing Pu(wi|c). Because Pu(wi|c) is estimated from noisy emoticon data, it can be biased. We adopt Dirichlet smoothing (Zhai and Lafferty 2004) to smooth Pu(wi|c).

By following the JM smoothing principle (Zhai and Lafferty 2004), our ESLAM model Pco(wi|c) can be computed as follows:

    Pco(wi|c) = α Pa(wi|c) + (1 − α) Pu(wi|c),    (1)

where α ∈ [0, 1] is the combination parameter controlling the contribution of each component.
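A minimal sketch of this two-step estimation, assuming word-count dictionaries as input, is given below. The function names, the default α = 0.5, and the Dirichlet prior μ = 2000 are illustrative assumptions; the paper does not fix these values here.

    def dirichlet_smooth(counts, collection_prob, mu=2000.0):
        """Dirichlet-smoothed estimate of Pu(w|c): (c(w) + mu*p(w)) / (N + mu).
        `counts` holds word counts for class c in the emoticon data and
        `collection_prob` is the background word distribution p(w)."""
        total = sum(counts.values())
        return {w: (counts.get(w, 0) + mu * collection_prob[w]) / (total + mu)
                for w in collection_prob}

    def eslam_prob(p_a, p_u, alpha=0.5):
        """Eq. (1): Pco(w|c) = alpha * Pa(w|c) + (1 - alpha) * Pu(w|c)."""
        return {w: alpha * p_a.get(w, 0.0) + (1 - alpha) * p_u.get(w, 0.0)
                for w in set(p_a) | set(p_u)}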
Experiments

Data Set

The publicly available Sanders Corpus7 is used for evaluation. It consists of 5513 manually labeled tweets. These tweets were collected with respect to one of four different topics (Apple, Google, Microsoft, and Twitter). After removing the non-English and spam tweets, we have 3727 tweets left. The detailed information of the corpus is shown in Table 1. As for the noisy emoticon data, we can theoretically use all the data existing in Twitter by sampling with its API.

7 http://www.sananalytics.com/lab/twitter-sentiment/

Table 1: Corpus Statistics

    Corpus    # Positive    # Negative    # Neutral    # Total
    Sanders   570           654           2503         3727

We adopt the following strategies to preprocess the data (a code sketch of the per-tweet steps follows the list):

• Username. Twitter usernames, which start with @, are replaced with "twitterusername".
• Digits. All digits in tweets are replaced with "twitterdigit".
• Links. All urls in tweets are replaced with "twitterurl".
• Stopwords. Stopwords like "the" and "to" are removed.
• Lower case and Stemming. All words are converted to lower case and stemmed.
• Retweets and Duplicates. Retweets and duplicate tweets are removed to avoid giving extra weight to these tweets in the training data.
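As referenced above, the sketch below chains the per-tweet steps, assuming NLTK's Porter stemmer and a toy stopword list; retweet and duplicate removal operates on the whole corpus rather than on single tweets, so it is omitted here.

    import re
    from nltk.stem import PorterStemmer  # any standard stemmer would do

    STOPWORDS = {"the", "to"}  # only the paper's two examples; a real list is larger
    stemmer = PorterStemmer()

    def preprocess(tweet: str) -> list[str]:
        """Apply the per-tweet normalizations listed above."""
        t = tweet.lower()                             # lower case
        t = re.sub(r"https?://\S+", "twitterurl", t)  # Links
        t = re.sub(r"@\w+", "twitterusername", t)     # Usernames
        t = re.sub(r"\d+", "twitterdigit", t)         # Digits
        tokens = [w for w in re.findall(r"[a-z]+", t)
                  if w not in STOPWORDS]              # tokenize + Stopwords
        return [stemmer.stem(w) for w in tokens]      # Stemming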
Evaluation Scheme and Metrics

After removing the retweets and duplicates and balancing the classes, we randomly choose 956 tweets for polarity classification, including 478 positive tweets and 478 negative ones. For subjectivity classification, we also balance the classes and randomly choose 1948 tweets for evaluation, including 974 subjective tweets and 974 objective (neutral) ones.

The evaluation schemes for polarity and subjectivity classification are similar. Assume the total number of manually labeled tweets, including both training and test data, is X. Each time we randomly sample the same number of tweets (say Y) from each class (e.g., positive and negative) for training, and use the remaining X − 2Y tweets for test. This random selection and testing is carried out independently for 10 rounds for each training set size, and the average performance is reported. We perform experiments with different training set sizes, i.e., Y is set to different values, such as 32, 64, and 128.

As in (Go, Bhayani, and Huang 2009) and (Kouloumpis, Wilson, and Moore 2011), we adopt accuracy and F-score as our evaluation metrics. Accuracy measures the percentage of test tweets that are correctly predicted, and F-score combines precision and recall.
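The loop below sketches one cell of this protocol, assuming tweets are stored as (text, label) pairs; train_and_predict is a hypothetical hook standing in for the LM or ESLAM classifier, and F-score would be computed from the same predictions.

    import random
    from statistics import mean

    def run_round(pos, neg, y, train_and_predict):
        """One round: sample Y tweets per class for training, test on the
        remaining X - 2Y, and return the accuracy on the test split."""
        train = random.sample(pos, y) + random.sample(neg, y)
        train_set = set(train)
        test = [t for t in pos + neg if t not in train_set]
        predictions = train_and_predict(train, [text for text, _ in test])
        correct = sum(p == label for p, (_, label) in zip(predictions, test))
        return correct / len(test)

    def run_evaluation(pos, neg, y, train_and_predict, rounds=10):
        """Average accuracy over 10 independent random splits."""
        return mean(run_round(pos, neg, y, train_and_predict)
                    for _ in range(rounds))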
Effect of Emoticons

We compare our ESLAM method with the fully supervised language model (LM) to verify whether the smoothing with emoticons is useful or not. Please note that the fully supervised LM uses only the manually labeled data for training, while ESLAM integrates both the manually labeled data and the emoticon data. Figure 1 and Figure 2 respectively illustrate the accuracy and F-score of the two methods with different numbers of manually labeled training tweets, i.e., 2Y = 32, 64, 128, 256, 512, 768.

From Figure 1 and Figure 2, we can see that as the number of manually labeled tweets increases, the performance of both methods also increases, which is reasonable because the manually labeled data contain strong discriminative information. Under all the evaluation settings, ESLAM consistently outperforms the fully supervised LM, in particular for the settings with a small number of manually labeled tweets. This implies that the noisy emoticon data do contain useful information and that our ESLAM can effectively exploit it to achieve good performance.

Figure 3 and Figure 4 demonstrate the accuracy and F-score of the two methods on subjectivity classification with different numbers of manually labeled training tweets.