WWW 2008 / Refereed Track: Rich Media April 21-25, 2008. Beijing, China
Figure 1: Distribution of the Tag Frequency in Flickr. (Log-log plot; axes labelled tag and tag frequency.)

by a power law [19, 1], and the probability of a tag having tag frequency $x$ is proportional to $x^{-1.15}$. With respect to the tag recommendation task, the head of the power law contains tags that would be too generic to be useful as a tag suggestion.
For example, the five most frequently occurring tags are: 2006, 2005, wedding, party, and 2004. The very tail of the power law contains infrequent tags that can typically be categorised as incidentally occurring words, such as misspellings and complex phrases. Examples are ambrose tompkins, ambient vector, and more than 15.7 million other tags that occur only once in this Flickr snapshot. Due to their infrequent nature, we expect that these highly specific tags will be useful recommendations only in exceptional cases.

Figure 2: Distribution of the number of tags per photo in Flickr. (Log-log plot; x-axis: photo, y-axis: number of tags.)

Figure 2 shows that the number of tags per photo also follows a power law distribution. The x-axis represents the 52 million photos, ordered by the number of tags per photo (descending). The y-axis refers to the number of tags assigned to the corresponding photo. The probability of having $x$ tags per photo is proportional to $x^{-0.33}$. Again, in the context of the tag recommendation task, the head of the power law contains photos that are already exceptionally exhaustively annotated, as there are photos that have more than 50 tags defined. Obviously, it will be hard to provide useful recommendations in such a case. The tail of the power law consists of more than 15 million photos annotated with only a single tag and 17 million photos with only 2 or 3 tags. Together, these already account for 64% of the photos. Typically, these are the cases where we expect tag recommendation to be useful for extending the annotation of the photo.

Figure 3: Most frequent WordNet categories for Flickr tags. (Categories shown: Unclassified (48%), Location, Artefact or Object, Person or Group, Action or Event, Time, Other.)

To analyse the behaviour of the tag recommendation systems for photos with different levels of exhaustiveness of the original annotation, we have defined four classes, as shown in Table 1.
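The four classes partition photos purely by how many tags they carry. As a minimal sketch, assuming the class boundaries of Table 1 (1, 2–3, 4–6, and more than 6 tags), the assignment can be written as a small Python function. The names annotation_class and class_distribution and the toy tag counts are hypothetical, and rejecting photos with zero tags is an assumption of this sketch, since Table 1 starts at one tag.

```python
from collections import Counter
from typing import Iterable


def annotation_class(num_tags: int) -> str:
    """Return the Table 1 class (I-IV) for a photo with num_tags tags."""
    # Assumption of this sketch: untagged photos fall outside Table 1.
    if num_tags < 1:
        raise ValueError("Table 1 only covers photos with at least one tag")
    if num_tags == 1:
        return "I"      # sparsely annotated: a single tag
    if num_tags <= 3:
        return "II"     # 2-3 tags
    if num_tags <= 6:
        return "III"    # 4-6 tags
    return "IV"         # more than 6 tags: exhaustively annotated


def class_distribution(tag_counts: Iterable[int]) -> Counter:
    """Tally how many photos fall into each annotation class."""
    return Counter(annotation_class(n) for n in tag_counts)


if __name__ == "__main__":
    # Hypothetical toy data: the number of tags on each of eight photos.
    toy_counts = [1, 1, 2, 3, 4, 6, 7, 52]
    for cls, photos in sorted(class_distribution(toy_counts).items()):
        print(f"Class {cls}: {photos} photo(s)")
```

A plain chain of thresholds is used rather than a lookup table because the classes are contiguous ranges and the last one is open-ended.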
The classes range from sparsely annotated to exhaustively annotated photos, and take the distribution of the number of tags per photo into account, as shown in the last column of the table. In Section 6, we will use this categorisation to analyse the performance for the different annotation classes.

Class | Tags per photo | Photos
Class I | 1 | ≈ 15,500,000
Class II | 2–3 | ≈ 17,500,000
Class III | 4–6 | ≈ 12,000,000
Class IV | > 6 | ≈ 7,000,000

Table 1: The definition of photo-tag classes and the number of photos in each class.

3.3 Tag Categorisation

To answer the question “What are users tagging?”, we have mapped Flickr tags onto the WordNet broad categories [10]. In a number of cases, multiple WordNet category entries are defined for a term. In that case, the tag is bound to the category with the highest ranking. Consider, for example, the tag London. According to WordNet, London belongs to two categories: noun.location, which refers to the city London, and noun.person, referring to the novelist Jack London. In this case, the location category is ranked higher than the person category. Hence, we consider the tag London to refer to the location.

Figure 3 shows the distribution of Flickr tags over the most common WordNet categories. Following this approach, we can classify 52% of the tags in the collection, leaving 48%