3.2.1 Association Rules

Conf     Prec    Rec     FM      Avg TFIDF   Median TFIDF
0.90     0.648   0.077   0.137   0.060       0.029
0.70     0.514   0.167   0.252   0.051       0.021
0.50     0.435   0.244   0.312   0.048       0.018
0.30     0.357   0.319   0.337   0.045       0.016
0.10     0.265   0.408   0.321   0.044       0.015

Table 3: Results for tag recommendation using association rules with different minimum confidences and 5 known bookmarks.

#BM      Prec    Rec     FM      Avg TFIDF   Median TFIDF
1        0.741   0.041   0.077   0.054       0.030
2        0.691   0.056   0.104   0.057       0.030
3        0.682   0.066   0.120   0.059       0.029
4        0.663   0.072   0.130   0.060       0.029
5        0.648   0.077   0.137   0.060       0.029

Table 4: Results for tag recommendation using association rules with minimum confidence 0.9 for 1-5 known bookmarks.

For mining association rules, we have used RapidMiner [24]. For the 9000 resources in the training set we get almost 550K association rules with a minimum support of 0.05 and a minimum confidence of 0.1, many of which are of course partially redundant. Table 3 gives the results for 5 bookmarks at different confidence levels (Conf). Precision (Prec), recall (Rec), and f-measure (FM) are measured at the macro level, i.e., they are averaged over the individual measures for each resource. The maximum precision of 0.648 for confidence ≥ 0.9 is lower than the 0.873 reported in [18], which operated on a bigger dataset (about 60K resources, split into 50K for training and 10K for testing). The maximum f-measure is reached at the fairly low minimum confidence of 0.3. The last two columns give the average and median TFIDF of the correctly recommended tags. Both values lie in the same range as the corresponding values for the actual tags in the test set (0.054 and 0.018), which indicates that association rules tend to recommend rather general tags. In an attempt to recommend more specific tags, we have also experimented with a smaller support of 0.01. This, however, only increases recall at the cost of precision; the average and median specificity of the recommended tags remains in the same range. For a smaller number of available bookmarks, precision goes up and recall goes down, and the f-measure slightly decreases. Average and median TFIDF remain essentially constant (see Table 4).
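The rule application step itself is straightforward. The following is a minimal Java sketch; the Rule class, its fields, and the method names are illustrative assumptions rather than the actual RapidMiner export format. A tag is recommended whenever all antecedent tags of a sufficiently confident rule occur among the tags of the known bookmarks.

import java.util.*;

// Minimal sketch of rule-based tag recommendation. The Rule class and
// its fields are assumptions about the exported rule format, not the
// actual RapidMiner output.
class Rule {
    final Set<String> antecedent;  // tags that must already be present
    final String consequent;       // tag to recommend
    final double confidence;
    Rule(Set<String> antecedent, String consequent, double confidence) {
        this.antecedent = antecedent;
        this.consequent = consequent;
        this.confidence = confidence;
    }
}

class RuleRecommender {
    private final List<Rule> rules;
    RuleRecommender(List<Rule> rules) { this.rules = rules; }

    // Recommend the consequent of every rule whose confidence reaches the
    // minimum (e.g. 0.9 for the high-precision setting of Table 4) and
    // whose antecedent tags are all covered by the known bookmarks' tags.
    Set<String> recommend(Set<String> knownTags, double minConfidence) {
        Set<String> recommended = new TreeSet<>();
        for (Rule r : rules) {
            if (r.confidence >= minConfidence
                    && knownTags.containsAll(r.antecedent)
                    && !knownTags.contains(r.consequent)) {
                recommended.add(r.consequent);
            }
        }
        return recommended;
    }
}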
3.2.2 Latent Dirichlet Allocation

The tag recommendation algorithm is implemented in Java. We used LingPipe [2] to perform Latent Dirichlet Allocation with Gibbs sampling. The LDA algorithm takes three input parameters: the number of terms to represent a latent topic, the number of latent topics to represent a document, and the overall number of latent topics to be identified in the given corpus. After some experiments with varying the first two parameters, we fixed them at a value of 100.

Thresh     Prec    Rec     FM      Avg TFIDF   Median TFIDF
0.01       0.717   0.174   0.281   0.169       0.079
0.005      0.609   0.245   0.349   0.140       0.057
0.001      0.370   0.439   0.401   0.096       0.031
0.0005     0.291   0.527   0.375   0.085       0.026
0.00001    0.168   0.669   0.269   0.071       0.022

Table 5: Results for tag recommendation using LDA with 100 topics and different thresholds to recommend a tag, for 5 known bookmarks.

#BM      Prec    Rec     FM      Avg TFIDF   Median TFIDF
1        0.680   0.069   0.126   0.233       0.128
2        0.717   0.112   0.193   0.199       0.097
3        0.712   0.139   0.233   0.186       0.089
4        0.711   0.160   0.261   0.174       0.084
5        0.717   0.174   0.281   0.169       0.079

Table 6: Results for tag recommendation using LDA with 100 topics and threshold 0.01 for 1-5 known bookmarks.

As described in Section 2.3, we can set a threshold on the tag probabilities above which tags are recommended. Table 5 shows precision, recall, and f-measure (FM), as well as the average and median TFIDF of the "correctly" recommended tags. Not surprisingly, precision decreases when lowering the threshold whereas recall increases. We get a maximum f-measure of 0.401 at a threshold of 0.001.

Table 6 gives detailed results for different numbers of known bookmarks, using a threshold of 0.01 to recommend with high precision. Knowing more bookmarks in advance for a resource does not increase precision (2 bookmarks → 0.717; 5 bookmarks → 0.717) but increases recall significantly. The average TFIDF gives the expected value of the specificity of a tag, whereas the median gives the typical specificity. Because the TFIDF values follow a power-law distribution, the average is of course larger than the median. Both values are significantly higher for tags recommended by LDA than by association rules, but also higher than the average and median TFIDF of the actual tags present in our tag set. As can be seen in Tables 4 and 6, the TFIDF values are two to four times higher. Recommending resource-specific tags with high TFIDF is particularly useful for search: as pointed out in [9], fairly infrequent tags are usually used for topical and type annotations.
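The thresholded recommendation step can be sketched as follows. This is a hedged illustration rather than the exact procedure of Section 2.3: we assume a tag is recommended when its estimated probability for the resource, obtained by mixing the per-topic tag distributions (phi, from the Gibbs sampler) with the resource's topic distribution (theta), reaches the threshold. All class and parameter names are hypothetical.

import java.util.*;

// Hedged sketch of thresholded LDA tag recommendation; names and the
// mixing formula are our assumptions, not the paper's exact method.
class LdaRecommender {
    // topicTag.get(k) maps each tag to its probability phi_k(tag) under topic k.
    private final List<Map<String, Double>> topicTag;

    LdaRecommender(List<Map<String, Double>> topicTag) {
        this.topicTag = topicTag;
    }

    // Recommend every tag whose marginal probability for the resource,
    // sum_k theta_k * phi_k(tag), reaches the threshold (0.01 for high
    // precision, 0.001 for the best f-measure in Table 5).
    Set<String> recommend(double[] docTopic, double threshold) {
        Map<String, Double> marginal = new HashMap<>();
        for (int k = 0; k < docTopic.length; k++) {
            for (Map.Entry<String, Double> e : topicTag.get(k).entrySet()) {
                marginal.merge(e.getKey(), docTopic[k] * e.getValue(), Double::sum);
            }
        }
        Set<String> recommended = new TreeSet<>();
        for (Map.Entry<String, Double> e : marginal.entrySet()) {
            if (e.getValue() >= threshold) {
                recommended.add(e.getKey());
            }
        }
        return recommended;
    }
}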
The results for varying the number of latent topics are shown in Table 7, where the f-measure is given for 50, 100, 250, and 500 latent topics. The number of bookmarks (#BM) indicates the number of users that have annotated a resource in the test set. The threshold for our recommendation was set to 0.001. As can be seen in the table, performance decreases with the LDA topic count in the 1-bookmark case. This effect is reversed when adding more bookmarks. A small number of topics typically leads to fairly general topics that are mixtures of more specific subtopics. Such general topics have a higher chance of being evoked by one of the few tags in a single bookmark, leading to a higher recall. With more bookmarks there are more tags, and it becomes more beneficial to separate the general topics into more specific ones. 100 LDA topics give the best average results.
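All figures in Tables 3-7 are macro-averaged over resources, as noted in Section 3.2.1. The following minimal sketch shows that evaluation; the class and method names are illustrative, not from our actual code: precision, recall, and f-measure are computed per resource against the held-out tags and then averaged.

import java.util.*;

// Sketch of the macro-averaged evaluation behind Tables 3-7: precision,
// recall, and f-measure are computed per resource and then averaged.
class MacroEvaluator {
    // Returns {macro precision, macro recall, macro f-measure}.
    static double[] evaluate(List<Set<String>> recommended, List<Set<String>> actual) {
        double pSum = 0, rSum = 0, fSum = 0;
        int n = recommended.size();
        for (int i = 0; i < n; i++) {
            Set<String> rec = recommended.get(i);
            Set<String> act = actual.get(i);
            Set<String> correct = new HashSet<>(rec);
            correct.retainAll(act);  // the "correctly" recommended tags
            double p = rec.isEmpty() ? 0 : (double) correct.size() / rec.size();
            double r = act.isEmpty() ? 0 : (double) correct.size() / act.size();
            double f = (p + r == 0) ? 0 : 2 * p * r / (p + r);
            pSum += p; rSum += r; fSum += f;
        }
        return new double[] { pSum / n, rSum / n, fSum / n };
    }
}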