4.1 Evaluation measures

Two popular measures used for evaluating text classification are the F value and the breakeven point. The F value is defined as F = 2pr/(p+r), where p is the precision and r is the recall. The F value measures the performance of a system on a particular class (in our case, the positive class). The breakeven point is the value at which recall and precision are equal. However, the breakeven point is not suitable for our task, as it only evaluates the sorting order of class probabilities of documents; it gives no indication of classification performance. The F value, on the other hand, reflects an average effect of both precision and recall. This is suitable for our purpose, as we want to identify positive documents: it is undesirable to have either too small a precision or too small a recall.

In our experimental results, we also report accuracies. It should be noted, however, that accuracy does not fully reflect the performance of our system, as our datasets have a large proportion of negative documents. We believe this reflects realistic situations. In such cases, the accuracy can be high, but few positive documents may be identified.

4.2 Experimental results

We now present the experimental results. We use Roc-SVM and Roc-Clu-SVM to denote the classification techniques that employ Rocchio and Rocchio with clustering, respectively, to extract the reliable negative set (both methods use SVM for classifier building).
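To make the contrast between accuracy and the F value concrete, here is a minimal sketch with invented confusion counts (none of these numbers come from our experiments): on a negative-heavy test set, a classifier can reach high accuracy while finding few of the positive documents.

```python
# Hypothetical confusion counts on an imbalanced test set (1000 docs,
# only 50 positive), illustrating why accuracy alone is misleading.

def f_value(p: float, r: float) -> float:
    """F = 2pr / (p + r); taken as 0 when both p and r are 0."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

tp, fp, fn, tn = 10, 5, 40, 945  # invented for illustration

precision = tp / (tp + fp)                   # 10/15 ~ 0.667
recall    = tp / (tp + fn)                   # 10/50 = 0.2
accuracy  = (tp + tn) / (tp + fp + fn + tn)  # 955/1000 = 0.955

print(f"accuracy = {accuracy:.3f}")                    # looks high
print(f"F value  = {f_value(precision, recall):.3f}")  # low: most positives missed
```

The accuracy of 0.955 hides the fact that only 10 of the 50 positive documents were found, which the low F value exposes.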
We observe from our experiments that using Rocchio or Rocchio with clustering alone for classification does not provide good classification results; SVM improves the results significantly.

For comparison, we include the classification results of NB, S-EM and PEBL. Here, NB treats all the documents in the unlabeled set as negative. SVM for the noisy situation (U as negative) performs poorly because SVM does not tolerate noise well. Thus, its results are not listed (in many cases, its F values are close to 0). In both Roc-SVM and Roc-Clu-SVM, we used the linear SVM, as [Yang & Liu, 1999] reports that linear SVM gives slightly better results than non-linear models on the Reuters dataset. For our experiments, we implemented PEBL as it is not publicly available. For SVM, we used the SVMlight system [Joachims, 1999]; PEBL also used SVMlight. S-EM is our earlier system. It is publicly available at http://www.cs.uic.edu/~liub/S-EM/S-EM-download.html.

Table 2 shows the classification results of the various techniques in terms of F value and accuracy (Acc) for a = 15% (the positive set is small). The final row of the table gives the average result of each column. We used 10 clusters (i.e., k = 10) for k-means clustering in Roc-Clu-SVM (later we will see that the number of clusters does not matter much).

We observe that Roc-Clu-SVM produces better results than Roc-SVM. Both Roc-SVM and Roc-Clu-SVM outperform NB, S-EM and PEBL. PEBL is extremely poor in this case. In fact, PEBL performs poorly whenever the number of positive documents is small. When the number of positive documents is large, it usually performs well (see Table 3 with a = 45%). Both Roc-SVM and Roc-Clu-SVM perform well consistently. We summarize the average F value results of all a settings in Figure 5. Due to space limitations, we are unable to show the accuracy results (as noted earlier, accuracy does not fully reflect the performance of our system).

Table 2: Experiment results for a = 15%.
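The reliable-negative extraction that Roc-SVM builds on can be sketched roughly as follows, assuming a standard Rocchio setup: form a positive prototype from P and a negative prototype from U, and keep the unlabeled documents that are closer (by cosine similarity) to the negative prototype. The toy 2-D vectors, the random seed, and the alpha = 16, beta = 4 weights below are illustrative assumptions, not necessarily our exact configuration, which operates on tf-idf document vectors.

```python
import numpy as np

def rocchio_reliable_negatives(P, U, alpha=16.0, beta=4.0):
    """Return a boolean mask over U marking the reliable negative set RN.

    Rocchio prototypes: each class centroid is boosted by its own class
    mean (weight alpha) and pushed away from the other (weight beta).
    """
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)
    c_pos = alpha * Pn.mean(axis=0) - beta * Un.mean(axis=0)
    c_neg = alpha * Un.mean(axis=0) - beta * Pn.mean(axis=0)
    c_pos /= np.linalg.norm(c_pos)
    c_neg /= np.linalg.norm(c_neg)
    # A document enters RN when it is more similar to the negative
    # prototype than to the positive one.
    return Un @ c_neg > Un @ c_pos

# Toy data: positives cluster around one direction, negatives another.
rng = np.random.default_rng(0)
P = rng.normal([5, 1], 0.5, size=(20, 2))                # labeled positives
U = np.vstack([rng.normal([5, 1], 0.5, size=(10, 2)),    # hidden positives
               rng.normal([1, 5], 0.5, size=(30, 2))])   # true negatives
mask = rocchio_reliable_negatives(P, U)
print(int(mask.sum()), "reliable negatives out of", len(U))
```

Roc-Clu-SVM refines this step by clustering the extracted negatives (here, k-means with k = 10) so that each cluster supplies a purer negative prototype; an SVM is then trained on P versus RN in both variants.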
              NB             S-EM           PEBL           Roc-SVM        Roc-Clu-SVM
              F      Acc     F      Acc     F      Acc     F      Acc     F      Acc
acq           0.744  0.920   0.876  0.954   0.001  0.817   0.846  0.948   0.867  0.952
corn          0.452  0.983   0.452  0.983   0.000  0.982   0.804  0.993   0.822  0.995
crude         0.731  0.979   0.820  0.984   0.000  0.955   0.782  0.983   0.801  0.985
earn          0.910  0.949   0.947  0.968   0.000  0.693   0.858  0.924   0.891  0.959
grain         0.728  0.977   0.807  0.982   0.020  0.955   0.845  0.987   0.869  0.986
interest      0.609  0.972   0.648  0.970   0.000  0.963   0.704  0.976   0.724  0.983
money         0.754  0.974   0.793  0.975   0.000  0.945   0.768  0.973   0.785  0.973
ship          0.701  0.989   0.742  0.990   0.008  0.978   0.578  0.986   0.596  0.989
trade         0.627  0.977   0.698  0.979   0.000  0.962   0.759  0.983   0.778  0.985
wheat         0.579  0.982   0.611  0.979   0.000  0.978   0.834  0.994   0.854  0.997
Avg           0.684  0.970   0.739  0.976   0.003  0.923   0.778  0.975   0.799  0.980

Table 3: Experiment results for a = 45%.

              NB             S-EM           PEBL           Roc-SVM        Roc-Clu-SVM
              F      Acc     F      Acc     F      Acc     F      Acc     F      Acc
acq           0.802  0.934   0.891  0.959   0.891  0.963   0.905  0.964   0.909  0.965
corn          0.502  0.972   0.517  0.970   0.663  0.991   0.635  0.990   0.645  0.992
crude         0.833  0.985   0.850  0.986   0.798  0.984   0.811  0.985   0.811  0.985
earn          0.924  0.956   0.950  0.970   0.956  0.974   0.886  0.937   0.923  0.956
grain         0.768  0.975   0.772  0.975   0.900  0.992   0.903  0.992   0.903  0.992
interest      0.617  0.959   0.614  0.956   0.770  0.983   0.614  0.957   0.616  0.957
money         0.751  0.969   0.760  0.968   0.714  0.973   0.764  0.969   0.764  0.970
ship          0.791  0.991   0.806  0.991   0.672  0.989   0.829  0.993   0.843  0.994
trade         0.678  0.973   0.693  0.972   0.728  0.982   0.728  0.982   0.728  0.982
wheat         0.595  0.973   0.595  0.972   0.783  0.992   0.779  0.992   0.792  0.994
Avg           0.726  0.969   0.745  0.972   0.788  0.982   0.785  0.976   0.793  0.979

Figure 5: Average F value results for all a settings (5% to 65%) for NB, S-EM, PEBL, Roc-SVM and Roc-Clu-SVM. [figure omitted]

From Figure 5, we can draw the following conclusions:

1. S-EM's results are quite consistent under different settings. However, its results are worse than those of Roc-SVM and Roc-Clu-SVM. The reason is that the negative documents extracted from U by its spy technique are not that reliable, and S-EM uses a weaker classifier, NB.

2. PEBL's results are extremely poor when the number of positive documents is small. We believe that this is because its strategy of extracting the initial set of strong negative documents can easily go wrong without sufficient positive data. Even when the number of positive documents is large, it may also go wrong. For example, for a = 55%, one F value (for the dataset trade) is only 0.073. This shows that PEBL is not robust.

3. Both Roc-SVM and Roc-Clu-SVM are robust with different numbers of positive documents. This is important.
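As a small sanity check, the Avg row of each table is the per-column mean over the ten categories; recomputing it for two F columns of Table 2 (values copied from the table above):

```python
# Recompute the Avg row of Table 2 for the NB and Roc-Clu-SVM F columns
# (F values copied from the ten category rows of the table).
nb_f     = [0.744, 0.452, 0.731, 0.910, 0.728,
            0.609, 0.754, 0.701, 0.627, 0.579]
rocclu_f = [0.867, 0.822, 0.801, 0.891, 0.869,
            0.724, 0.785, 0.596, 0.778, 0.854]

nb_avg     = sum(nb_f) / len(nb_f)          # ~ 0.684, as reported
rocclu_avg = sum(rocclu_f) / len(rocclu_f)  # ~ 0.799, as reported
print(round(nb_avg, 3), round(rocclu_avg, 3))
```

The gap between the two averages (about 0.115 in F value) quantifies the improvement of Roc-Clu-SVM over NB at a = 15%.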