WWW 2008 / Refereed Track: Rich Media          April 21-25, 2008. Beijing, China

                 MRR     S@1     S@5     P@5
  Baseline strategies
  sum           .7628   .6550   .9200   .4930
  vote          .6755   .4550   .8750   .4730
  Promotion strategies
  sum+          .7718   .6600   .9450   .5080
  vote+         .7883   .6750   .9400   .5420
  Improvement of promotion
  vote+ vs sum   3.3%    3.1%    2.2%    9.9%

Table 4: Evaluation results for our four tag recommendation strategies using the test collection. The improvement of promotion is calculated using our better-performing baseline run (sum) and better-performing promotion run (vote+).

recommendation strategies. In the next section we use the same parameter settings when we evaluate the system using the test collection.

6. EVALUATION RESULTS

The presentation of the evaluation results is organised in four sections. First we report the results for the two aggregation strategies, and in Section 6.2 we examine the performance of the promotion function. Section 6.3 discusses the results for the different tag classes. Finally, in Section 6.4, we analyse the types of tags that are recommended and accepted, in comparison to the user-defined tags, based on the WordNet classification.

6.1 Aggregation Strategies

In this section we evaluate the performance of the aggregation strategies sum and vote. The top section of Table 4 shows the results for the two aggregation methods on the test collection.

First, we inspect the absolute performance of the two strategies. Based on the metric success at rank 1 (S@1), we observe that for more than 65% of the cases our best-performing aggregation strategy, i.e., sum, returns a good descriptive tag at rank 1. For the success at rank 5 (S@5), we see that this percentage goes up to 92%.
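The evaluation metrics used throughout this section (MRR, S@k, P@k) can be computed directly from a ranked recommendation list and the set of tags judged relevant for each photo. A minimal sketch, assuming binary relevance judgments (the function and variable names are ours, for illustration):

```python
def mrr(rankings):
    """Mean reciprocal rank: average over photos of 1/rank of the
    first relevant tag (a photo with no relevant tag contributes 0)."""
    total = 0.0
    for ranked, relevant in rankings:
        for i, tag in enumerate(ranked, start=1):
            if tag in relevant:
                total += 1.0 / i
                break
    return total / len(rankings)

def success_at(rankings, k):
    """S@k: fraction of photos with at least one relevant tag in the top k."""
    hits = sum(1 for ranked, relevant in rankings
               if any(t in relevant for t in ranked[:k]))
    return hits / len(rankings)

def precision_at(rankings, k):
    """P@k: mean fraction of the top-k recommended tags that are relevant."""
    return sum(len([t for t in ranked[:k] if t in relevant]) / k
               for ranked, relevant in rankings) / len(rankings)
```

Under these definitions, a P@5 of 0.54 corresponds to 5 x 0.54 = 2.7 accepted tags among the top 5 on average, as reported for the vote+ strategy below.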
For the precision at rank 5 (P@5), we measure a precision of 0.49 for the sum aggregation strategy, which indicates that on average for this strategy about 50% of the recommended tags are considered useful. We can thus safely argue that the sum aggregation strategy performs very well and would be a useful asset for users who want support when annotating their photos.

When looking at the relative difference in performance between the two aggregation strategies, vote and sum, we observe that for all metrics the sum strategy outperforms the voting strategy. This is particularly evident for the very early precision (MRR and S@1), where the voting strategy is clearly inferior. The intuition behind this behaviour is that the voting strategy does not distinguish between tags that occur at different positions in the ranking of the candidate lists; i.e., it considers the top co-occurring tag just as good a candidate as the tenth. In contrast, the sum strategy takes the co-occurrence values into account and thus treats the first co-occurring tag as a better candidate than the tenth co-occurring tag.

6.2 Promotion

We now turn our attention to the performance of our promotion function. The mid-section of Table 4 shows the results of the promotion function in combination with the sum and vote aggregation strategies.

First, we inspect the absolute performance of our promotion method. In terms of success at rank 1 (S@1), we see that for more than 67% of the photos the vote+ strategy returns a relevant tag at rank 1. Expanding to the top 5 recommended tags (S@5), we see the performance goes up to 94%. In terms of precision at rank 5 (P@5), we also observe that the vote+ strategy achieves a precision of 0.54, which means that on average 2.7 of the top 5 recommended tags were accepted as good descriptors for the photo.
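The contrast between the vote and sum aggregation strategies drawn in Section 6.1 can be sketched as follows. This is a simplified reading of the two strategies, assuming each user-defined tag contributes a candidate list of co-occurring tags with co-occurrence scores; the sample tags and scores are purely illustrative:

```python
from collections import defaultdict

def aggregate(candidate_lists, strategy="sum"):
    """Merge per-tag candidate lists into one ranking.

    candidate_lists: one dict per user-defined tag, mapping each
    candidate tag to its co-occurrence score.
    'vote' adds 1 for every list a candidate appears in, ignoring
    the scores; 'sum' adds the scores themselves, so a strongly
    co-occurring candidate outranks a marginal one.
    """
    scores = defaultdict(float)
    for candidates in candidate_lists:
        for tag, score in candidates.items():
            scores[tag] += 1.0 if strategy == "vote" else score
    return sorted(scores, key=scores.get, reverse=True)
```

On the illustrative input below, vote ranks a tag that appears weakly in two lists above a tag that appears strongly in one, while sum does the opposite, which mirrors the early-precision gap observed in Table 4.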
If we compare the relative performance of the two promotion strategies, sum+ and vote+, we observe that they behave rather similarly, except in terms of precision at 5, where the vote+ strategy outperforms the sum+ method. This indicates that there is an interaction effect between the vote strategy and the promotion function, showing that the promotion function has a significant positive effect on the effectiveness of the recommendation. In fact, statistical significance tests, based on MANOVA repeated measurements with a general linear model, show that the sum, sum+, and vote+ strategies all perform significantly better than the vote strategy (p < 0.05). Likewise, the vote+ strategy performs significantly better than sum and sum+.

In addition, we compare the relative improvement, as shown in the bottom section of Table 4, of the best promotion strategy (vote+) over the sum strategy. We find that for all metrics there is an improvement. The improvement is marginal for MRR, S@1, and S@5; as reported before, for the precision at 5 (P@5) the improvement is substantial (9.9%). We can thus argue that our promotion strategy is good at retrieving useful recommendations in the top 5 of the ranking without negatively affecting the performance very early in the ranking. This effect continues if we look beyond rank 5: for P@10 we measure that the vote+ strategy continues to improve, showing a 10.1% improvement compared to the sum strategy, although the absolute precision goes down to 0.46.

6.3 Tag Classes

In this subsection we look at the performance of our system over different classes of photos, where we classify the photos based on the criteria defined in Section 3.2 (Table 1); i.e., we look at classes of photos with 1 tag, photos with 2-3 tags, 4-6 tags, and more than 6 tags, respectively. Table 5 shows the evaluation results of the sum strategy in comparison to the vote+ strategy.
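The relative improvements in the bottom row of Table 4 follow directly from the reported baseline (sum) and promotion (vote+) scores. A quick check, reusing the values from the table (the P@10 figures are not listed in the table, so they are omitted here):

```python
def rel_improvement(baseline, promoted):
    """Percentage improvement of the promoted run over the baseline,
    rounded to one decimal as in Table 4."""
    return round(100.0 * (promoted - baseline) / baseline, 1)

# Scores from Table 4: better baseline run (sum) vs better promotion run (vote+)
sum_run   = {"MRR": 0.7628, "S@1": 0.6550, "S@5": 0.9200, "P@5": 0.4930}
vote_plus = {"MRR": 0.7883, "S@1": 0.6750, "S@5": 0.9400, "P@5": 0.5420}

improvements = {m: rel_improvement(sum_run[m], vote_plus[m]) for m in sum_run}
```

This reproduces the 3.3%, 3.1%, 2.2%, and 9.9% figures reported in the bottom section of the table.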
On the sum strategy, the performance is not evenly distributed over the different classes. The performance is better when the photo annotation is sparse (classes I and II) than for the photos with a richer annotation (classes III and IV). For the vote+ strategy, we find that the performance is more evenly distributed over the different classes. This is reflected in the bottom section of the table, where the relative comparison of the sum and vote+ strategies shows a larger improvement for classes III and IV. We observe that promotion has a marginal effect on the photos with only a few user-defined tags. However, for the photos with richer annotations the improvement is significant. Hence we conclude that the pro-