WWW 2008 / Refereed Track: Rich Media April 21-25, 2008. Beijing, China
of the tags unclassified, as depicted in the inset of Figure 3. When focussing on the set of classified tags, we find that locations are tagged most frequently (28%), followed by artifacts or objects (16%), people or groups (13%), actions or events (9%), and time (7%). The category other (27%) contains the tags that are classified by the WordNet broad categories but do not belong to any of the aforementioned categories. From this information, we can conclude that users do not only tag the visual contents of the photo, but to a large extent provide a broader context in which the photo was taken, such as location, time, and actions.

4. TAG RECOMMENDATION STRATEGIES

In this section we provide a detailed description of the tag recommendation system. We start with a general overview of the system architecture, followed by an introduction of the tag co-occurrence metrics used. Finally, we explain the tag aggregation and promotion strategies that are used by the system and evaluated in the experiment.

4.1 Tag Recommendation System

Figure 4 provides an overview of the tag recommendation process. Given a photo with user-defined tags, an ordered list of $m$ candidate tags is derived for each of the user-defined tags, based on tag co-occurrence.
The lists of candidate tags are then used as input for tag aggregation and ranking, which ultimately produces the ranked list of $n$ recommended tags. Consider the example given in Figure 4: the user has defined two tags, Sagrada Familia and Barcelona. For both tags, a list of 6 co-occurring tags is derived. The two lists have some tags in common, such as Spain, Gaudi, and Catalunya, while the other candidate tags appear in only one of them. After aggregation and ranking, 5 tags are recommended: Gaudi, Spain, Catalunya, architecture, and church. In practice, the number of recommended tags should depend on the relevance of the tags and will vary between applications.

4.2 Tag Co-occurrence

Tag co-occurrence is the key to our tag recommendation approach, and it only works reliably when a large quantity of supporting data is available. The large amount of user-generated content created by Flickr users satisfies this demand and provides the collective knowledge base that is needed to make tag recommendation systems work in practice. In this sub-section we look at various methods to calculate co-occurrence coefficients between two tags. We define the co-occurrence between two tags to be the number of photos in our collection where both tags are used in the same annotation. Using the raw co-occurrence count to measure the quality of the relationship between two tags is not very meaningful, as it does not take the frequency of the individual tags into account. Therefore, it is common to normalise the co-occurrence count with the overall frequency of the tags. There are essentially two different normalisation methods: symmetric and asymmetric.

Symmetric measures. Using the Jaccard coefficient, we can normalise the co-occurrence of two tags $t_i$ and $t_j$ by calculating:

$$J(t_i, t_j) := \frac{|t_i \cap t_j|}{|t_i \cup t_j|} \tag{1}$$

The coefficient divides the number of photos annotated with both tags (the intersection) by the number of photos annotated with at least one of the two tags (the union).
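The raw co-occurrence count and the Jaccard normalisation of Equation (1) can be sketched in Python as follows. This is a minimal sketch, assuming each photo is represented by the set of tags in its annotation and that $|t|$ denotes the number of photos annotated with tag $t$. The toy collection and the helper names (`tag_frequency`, `co_occurrence`, `jaccard`) are hypothetical, and the union is computed by inclusion-exclusion rather than by materialising sets of photos.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy collection (illustrative only, not the paper's Flickr data):
# each photo is represented by the set of tags in its annotation.
PHOTOS = [
    {"eiffel tower", "paris", "france"},
    {"eiffel tower", "tour eiffel", "paris"},
    {"eiffel tower", "tour eiffel"},
    {"paris", "france", "louvre"},
    {"paris", "france"},
]


def tag_frequency(photos):
    """|t|: the number of photos whose annotation contains tag t."""
    freq = Counter()
    for tags in photos:
        freq.update(tags)
    return freq


def co_occurrence(photos):
    """|t_i ∩ t_j|: the number of photos annotated with both tags.

    Each unordered pair is stored once, keyed by the alphabetically sorted pair.
    """
    co = Counter()
    for tags in photos:
        for pair in combinations(sorted(tags), 2):
            co[pair] += 1
    return co


def jaccard(ti, tj, freq, co):
    """Equation (1), using |t_i ∪ t_j| = |t_i| + |t_j| - |t_i ∩ t_j|."""
    both = co[tuple(sorted((ti, tj)))]
    union = freq[ti] + freq[tj] - both
    return both / union if union else 0.0


freq = tag_frequency(PHOTOS)
co = co_occurrence(PHOTOS)
print(round(jaccard("eiffel tower", "tour eiffel", freq, co), 3))  # 0.667
print(jaccard("eiffel tower", "paris", freq, co))                  # 0.4
```

With these toy counts, tour eiffel scores higher than paris against eiffel tower: paris also appears on photos without eiffel tower, which enlarges the union and lowers the coefficient. Because the measure is symmetric, swapping the two arguments gives the same value.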
The Jaccard coefficient is known to be useful for measuring the similarity between two objects or sets. In general, we can use symmetric measures, like Jaccard, to infer whether two tags have a similar meaning.

Asymmetric measures. Alternatively, tag co-occurrence can be normalised using the frequency of one of the tags. For instance, using the equation:

$$P(t_j \mid t_i) := \frac{|t_i \cap t_j|}{|t_i|} \tag{2}$$

It captures how often tag $t_i$ co-occurs with tag $t_j$, normalised by the total frequency of tag $t_i$. We can interpret this as the probability of a photo being annotated with tag $t_j$, given that it was annotated with tag $t_i$. Several variations of asymmetric co-occurrence measures have been proposed in the literature to build tag (or term) hierarchies [20, 17, 21].

To illustrate the difference between symmetric and asymmetric co-occurrence measures, consider the tag Eiffel Tower. For the symmetric measure, the most co-occurring tags are (in order): Tour Eiffel, Eiffel, Seine, La Tour Eiffel, and Paris. When using the asymmetric measure, the most co-occurring tags are (in order): Paris, France, Tour Eiffel, Eiffel, and Europe. This shows that the symmetric Jaccard coefficient is good at identifying equivalent tags, such as Tour Eiffel, Eiffel, and La Tour Eiffel, or at picking up a nearby landmark such as the Seine, whereas the asymmetric measure also surfaces broader context such as Paris, France, and Europe. Based on this observation, asymmetric tag co-occurrence is more likely than its symmetric counterpart to provide a suitable diversity of candidate tags.

4.3 Tag Aggregation and Promotion

When the lists of candidate tags for each of the user-defined tags are known, a tag aggregation step is needed to merge the lists into a single ranking. In this section, we define two aggregation methods, based on voting and summing, that serve this purpose. Furthermore, we implemented a re-ranking procedure that promotes candidate tags having certain properties.
In this section we refer to three different types of tags:

• User-defined tags $U$ refers to the set of tags that the user assigned to a photo.
• Candidate tags $C_u$ is the ranked list with the top $m$ most co-occurring tags for a user-defined tag $u \in U$. We denote by $C$ the union of all candidate tags for each user-defined tag $u \in U$.
• Recommended tags $R$ is the ranked list of $n$ most relevant tags produced by the tag recommendation system.

For a given set of candidate tags ($C$), a tag aggregation step is needed to produce the final list of recommended tags ($R$) whenever there is more than one user-defined tag. In this section, we define two aggregation strategies. One strategy is based on voting and does not take the co-occurrence values of the candidate tags into account, while the summing strategy uses the co-occurrence values to produce the final ranking. In both cases, we apply the strategy to the top $m$ co-occurring tags in the list.
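The candidate lists and the two aggregation strategies described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: each $C_u$ is ranked by the asymmetric measure $P(c \mid u)$ of Equation (2); voting is assumed to give a tag one vote for every candidate list it appears in; summing is assumed to add up the $P(c \mid u)$ values; and user-defined tags are assumed to be excluded from the candidates. The toy collection (loosely modelled on the Figure 4 example), the function names `conditional_cooccurrence`, `candidate_tags`, `rank` and `aggregate`, and the alphabetical tie-breaking are all hypothetical. The promotion (re-ranking) step is not modelled.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy collection (illustrative only, not the paper's data):
# each photo is represented by the set of tags in its annotation.
PHOTOS = [
    {"sagrada familia", "barcelona", "gaudi", "spain"},
    {"sagrada familia", "gaudi", "church", "spain"},
    {"sagrada familia", "barcelona", "catalunya", "architecture"},
    {"barcelona", "spain", "catalunya", "beach"},
    {"barcelona", "gaudi", "spain"},
]


def conditional_cooccurrence(photos):
    """Equation (2): P(t_j | t_i) = |t_i ∩ t_j| / |t_i| for every ordered tag pair."""
    freq = Counter()  # |t_i|: photos annotated with t_i
    co = Counter()    # |t_i ∩ t_j|: photos annotated with both tags
    for tags in photos:
        freq.update(tags)
        for ti, tj in permutations(tags, 2):
            co[(ti, tj)] += 1
    return {(ti, tj): count / freq[ti] for (ti, tj), count in co.items()}


def candidate_tags(u, prob, m, exclude=()):
    """C_u: the m tags c with the highest P(c | u), skipping tags in `exclude`."""
    scored = [(tj, p) for (ti, tj), p in prob.items() if ti == u and tj not in exclude]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # assumption: alphabetical tie-break
    return scored[:m]


def rank(scores, n):
    """Return the n highest-scoring tags (ties broken alphabetically)."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ordered[:n]]


def aggregate(user_tags, prob, m, n, strategy):
    """Merge the candidate lists C_u into one ranked list R of n tags.

    Assumed definitions (hypothetical, for illustration):
      "vote": each list C_u gives one vote to every tag it contains.
      "sum":  each list C_u adds P(c | u) to the score of tag c.
    """
    if strategy not in ("vote", "sum"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    scores = Counter()
    for u in user_tags:
        # Assumption: the user's own tags are never recommended back.
        for c, p in candidate_tags(u, prob, m, exclude=set(user_tags)):
            scores[c] += 1 if strategy == "vote" else p
    return rank(scores, n)


prob = conditional_cooccurrence(PHOTOS)
user = ["sagrada familia", "barcelona"]
print(aggregate(user, prob, m=3, n=3, strategy="vote"))  # ['gaudi', 'spain', 'architecture']
print(aggregate(user, prob, m=3, n=3, strategy="sum"))   # ['spain', 'gaudi', 'catalunya']
```

On this toy data the two strategies disagree. Voting gives gaudi and spain the same score (each appears in both candidate lists) and falls back on the alphabetical tie-break, whereas summing ranks spain first because $P(\text{spain} \mid \text{barcelona}) = 0.75$ exceeds $P(\text{gaudi} \mid \text{barcelona}) = 0.5$. The toy data also shows the asymmetry of Equation (2): $P(\text{barcelona} \mid \text{sagrada familia}) = 2/3$, but $P(\text{sagrada familia} \mid \text{barcelona}) = 0.5$.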