breakdown of the quality of the returned lists for any given number of input tags, thereby analyzing the whole tagging process, and not just a snapshot. This should be contrasted with the approach used in [9], which used all the given tags as input and then used a user study to evaluate the quality of the suggestions.

For our experiments, we bucketed the pictures into three categories: one for pictures with 4–7 tags, one for pictures with 8–15 tags and one for pictures with 16–∞ tags. As we only use given tags as relevant tags, there are more possible tags to be found for pictures with more tags, and so the performance for these pictures is higher (see Table 3).

7.3 Choice of Input Tags

For our basic setup we chose half the given tags (rounded up) as input to whichever system we evaluated. These tags were obtained in an alternating manner, meaning that the first, third, fifth and so on tags were used as input, whereas the second, fourth and so on tags were used as hold-out data to evaluate the performance.

For the Input Cost setting (Section 5.2), we started with no tag given as input. Here, the system simply ranked tags according to their past usage frequencies. Then, we followed the “flow” of either typed or clicked tags. This way, input tags were added one after the other, and the exact sequence depended on the previous output of the system.
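The alternating split of the basic setup can be sketched as follows. This is only an illustration: the function names and the example tags are hypothetical, and P@1 and the reciprocal rank are computed with their standard definitions (the reciprocal rank averaged over pictures gives MRR), which this section does not restate.

```python
from typing import List, Sequence, Tuple


def split_alternating(tags: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split a picture's tags into input tags (1st, 3rd, 5th, ...) and
    hold-out tags (2nd, 4th, ...). With an odd number of tags the input
    side receives the extra tag, i.e. half the tags rounded up."""
    return list(tags[0::2]), list(tags[1::2])


def precision_at_1(ranked: Sequence[str], relevant: Sequence[str]) -> float:
    """1.0 if the top-ranked suggestion is one of the hold-out tags."""
    return 1.0 if ranked and ranked[0] in set(relevant) else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Sequence[str]) -> float:
    """1 / position of the first hold-out tag in the ranked list (0.0 if
    none appears). Averaging this value over all pictures yields MRR."""
    relevant_set = set(relevant)
    for position, tag in enumerate(ranked, start=1):
        if tag in relevant_set:
            return 1.0 / position
    return 0.0


# Hypothetical picture with five given tags.
given = ["beach", "sunset", "sea", "holiday", "spain"]
inputs, holdout = split_alternating(given)

# Hypothetical ranked suggestions returned by some system for `inputs`.
suggestions = ["sand", "holiday", "sunset"]
p_at_1 = precision_at_1(suggestions, holdout)
rr = reciprocal_rank(suggestions, holdout)
```

In this example the input tags are "beach", "sea" and "spain", the hold-out tags are "sunset" and "holiday", and the first hold-out tag appears at rank 2 of the suggestion list.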
[Figure 2: line plot of P@1 (y-axis) against Maximum Coverage (x-axis), with one curve each for small users: local, small users: hybrid, medium users: local, medium users: hybrid, big users: local and big users: hybrid.]

Figure 2: A detailed analysis of when our Hybrid scheme (solid red line) has the biggest performance advantage compared to the simple local scheme (dotted blue line). The P@1 (y-axis) results shown are for pictures with 8–15 tags coming from different kinds of users (square = “small” users, circle = “medium” users, triangle = “big” users). To restrict the evaluation to a certain maximum coverage (x-axis), all other pictures in the user’s profile with a higher coverage than the threshold were artificially ignored by the system. For low coverage, which corresponds to the situation of both new users and a new tag topic for old users, the improvement in P@1 between the local and the hybrid scheme is between 100% and 200%. This can be seen by comparing, for a fixed symbol, the two different colors (or line types).

Users    Number     P@1, MRR
         of tags    local      global     hybrid     old        www
Small    4–7        .33, .39   .20, .27   .37, .44   .26, .33   .10, .13
         8–15       .49, .55   .38, .48   .56, .64   .41, .49   .21, .26
         16–∞       .59, .65   .58, .69   .71, .79   .53, .61   .32, .39
Medium   4–7        .51, .58   .25, .34   .50, .59   .37, .44   .09, .13
         8–15       .62, .70   .38, .49   .62, .71   .55, .64   .16, .20
         16–∞       .73, .79   .58, .69   .78, .84   .70, .77   .28, .34
Big      4–7        .50, .60   .28, .37   .50, .60   .37, .46   .09, .12
         8–15       .68, .76   .45, .56   .70, .77   .59, .67   .20, .25
         16–∞       .75, .82   .62, .72   .76, .84   .69, .77   .31, .38

Table 1: Comparison of several methods for 1,000 pictures in each set. Since no limit was set for the coverage of tags, the hybrid method only slightly improves over the purely local method for medium and big users. See Table 3 for a breakdown of the first three methods for different levels of coverage.

8. SUMMARY OF RESULTS

Our “Hybrid” scheme outperforms both our own previous system (“Old”) and another recently presented scheme (“WWW”). See Table 1.
For cases of low coverage, “Hybrid” improves dramatically over the simple “Local” scheme. See Table 3 and Figure 2.

For a simple model of measuring the cost of inputting tags, our scheme improves the average cost of tagging by at least 16% in a conservative setting and by at least 32% in a more realistic setting. See Table 2.

9. FUTURE WORK

In this work, we implicitly defined a tag as “good” if it would be selected by a user when recommended to her. In the future, we plan to investigate other definitions of “goodness” related to either (i) the usefulness to other users (e.g., using query logs for image searches) or (ii) the navigability of the user’s collection (e.g., using entropy measures to keep the user from tagging all her pictures in a very similar fashion).

10. REFERENCES

[1] M. Ames and M. Naaman. Why we tag: motivations for annotation in mobile and online media. In The conference on Human factors in computing systems (CHI’07), pages 971–980, 2007.
[2] P. A. Chirita, S. Costache, W. Nejdl, and S. Handschuh. P-tag: large scale automatic generation of personalized annotation tags for the web. In The 16th international conference on World Wide Web (WWW’07), pages 845–854, 2007.
[3] N. Garg and I. Weber. Personalized tag suggestion for flickr. In The 17th international conference on World Wide Web (WWW’08), pages 1063–1064, 2008.
[4] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, 2001.
[5] J. Hipp, U. Güntzer, and G. Nakhaeizadeh. Algorithms for association rule mining. SIGKDD Explorations Newsletter, 2(1):58–64, 2000.
[6] G. Koutrika, F. A. Effendi, Z. Gyöngyi, P. Heymann, and H. Garcia-Molina. Combating spam in tagging systems. In The 3rd international workshop on