Improving Tag-Based Recommendation by Topic Diversification

Christian Wartena and Martin Wibbels

Novay, Brouwerijstraat 1, 7523 XC Enschede, The Netherlands
{Christian.Wartena,Martin.Wibbels}@novay.nl

Abstract. Collaborative tagging has emerged as a mechanism to describe items in large on-line collections. Tags are assigned by users to describe and find back items, but it is also tempting to describe the users in terms of the tags they assign or in terms of the tags of the items they are interested in. The tag-based profile thus obtained can be used to recommend new items. If we recommend new items by computing their similarity to the user profile or to all items seen by the user, we run the risk of recommending only neutral items that are a bit relevant for each topic a user is interested in. In order to increase user satisfaction many recommender systems not only optimize for accuracy but also for diversity. Often it is assumed that there exists a trade-off between accuracy and diversity. In this paper we introduce topic aware recommendation algorithms. Topic aware algorithms first detect different interests in the user profile and then generate recommendations for each of these interests. We study topic aware variants of three tag based recommendation algorithms and show that each of them gives better recommendations than their base variants, both in terms of precision and recall and in terms of diversity.

1 Introduction

Collaborative tagging has emerged in the past decade as a mechanism to describe items in large collections available on-line. Tags are assigned by users to describe and find back previously viewed items. Thus a ternary relation between users, tags and items is established. In this paper we will investigate some possibilities to construct a user profile based on tags, to identify distinct interests in this profile, and to recommend items relevant to those interests.
When we use a tag based user profile an item is recommended if it is relevant to all tags in the user profile. Similarly, if we use a collaborative filtering approach, we require the item to be similar to all items in the user profile. However, for a user who has some distinct interests, an item that fits the average of all his interests might be less accurate than an item that fits exactly one of his interests. Thus we expect, at least in some cases, that recommendations will improve if we identify different interests in the user profile and take these into account for the recommendation of new items. If a recommendation strategy does so, we will call this strategy topic aware. In the following we will make a number

P. Clough et al. (Eds.): ECIR 2011, LNCS 6611, pp. 43–54, 2011.
© Springer-Verlag Berlin Heidelberg 2011
of different algorithms for top-n recommendation topic aware by clustering the tags or items in the profile and generating separate recommendation lists for each topic cluster. The final list of recommendations is obtained by merging the topic specific lists. We do not consider rating prediction.

Lists of recommended items are more interesting to a user if they are more diverse. Thus diversity is regarded as a desirable property of recommendations. In most studies on topic diversification, the diversity is increased by reordering the elements in an initial list of recommended items. If a list is constructed aiming at optimal results for precision and recall, the reordering usually causes a decrease of performance on these evaluation measures. Thus a trade-off between accuracy and diversity emerges. In our approach, however, adding diversity improves precision and recall. We do not re-rank results that are already optimal for precision and recall; rather, the final diversity of the recommendation is a core property of the recommendation strategy. Thus our method is fundamentally different from the re-ranking approaches: we first distinguish different topics and then generate a list of relevant items.

The remainder of this paper is organized as follows. In the next section we discuss related work. In section 3 we introduce three different tag based recommendation algorithms. In section 4 we discuss topic detection for tagging systems and define topic aware versions of the tag based algorithms. In section 5 we report on an evaluation of the proposed algorithms with data from LibraryThing, a bookmarking service for books.

2 Related Work

Most work on recommendation and tagging is about recommending tags. Using tags for item prediction has received less attention. Basically, we find two approaches for tag-based item recommendation.
The first approach uses tags to compute item-item or user-user similarities that are then used in classical user or item based nearest neighbor recommendation algorithms. The second approach characterizes both users and items by tag vectors, making it possible to compute the similarity between users and items. The items most similar to a user are then recommended.

One of the first papers that integrates tag-based similarities in a nearest neighbors recommender is [1], who extend the user-item matrix with user-tag similarities in order to compute user-user similarities, and extend it with tag-item relations in order to compute item-item similarities. Both similarities are used to compute recommendations. This approach was refined by [2], taking also into account whether users used the tags for the same or for different items. Said et al. [3] use probabilistic latent semantic analysis to model both the user-item matrix and the item-tag matrix. By sharing the latent variables between the models they are able to use the tagging information in the user-item model. Bogers and Van den Bosch [4] use the tag based similarities instead of the classical similarities based on the user-item matrix. They show improvements for item-based nearest neighbor recommendation on various datasets, but did not compare their method to approaches combining both types of similarity.
The second approach, recommendation based on the distance between an item and the tag based user profile, is e.g. followed by Firan et al. [5]. The focus of their work is to determine the optimal set of tags to represent a user. The obvious idea is to represent a user by the tags that he has assigned. However, the resulting tag vector is usually too sparse to compute useful user-item similarities. Some users employ only a very small set of tags, and even for more actively tagging users it might well be the case that a relevant item is tagged only with synonyms of a tag employed by the user. Thus [5] investigate various methods to condense the user profile. The most effective method is to use the tags of all items the user has bookmarked. The same observation was also made by [6].

The problem of the sparse user profiles was also identified by [7]. They solve this problem not by condensing the user profile, but by taking co-occurring tags into account in the computation of similarities. For each user and each tag t a user specific tag weight is computed. The weight for t is determined by the weight of the most similar tag t' in the user profile and the similarity between t and t', where the inter tag similarity is determined by a variant of the Jaccard-coefficient. The relevance of an item finally is the sum of the weights of its tags.

In [8] the two approaches sketched above are combined: a nearest neighbor algorithm is used to find an initial set of items, and subsequently user-item similarities are computed for the preselected items to obtain a final recommendation. The more interesting aspect of their approach is that they replace each tag t in the user profile with the tags co-occurring with t on items tagged by the user, and restrict the set of tags to the (globally) most popular tags.
This results in roughly the same profiles as would be obtained by using the (most popular) tags of all items bookmarked (or tagged) by the user, as we have seen in other approaches.

The problem that many recommender systems tend to recommend a set of very similar items in order to optimize accuracy was noted by [9] and coined the portfolio effect. Reordering of recommended elements to alleviate this problem was proposed by [10]. As discussed above, the reordering increases diversity, but at the cost of accuracy. An approach that is very similar to ours is proposed by Zhang and Hurley [11]. They cluster the set of items of a user and apply an item based nearest neighbor recommender to each of the clusters. Finally they merge the results of the sub-recommendations to obtain a list of recommended items for the user. The focus of their work is to avoid that a small set of very popular items is recommended very often at the cost of novel or less common items. The main difference with our recommendation strategy is that all item similarities are based on the user-item matrix, whereas we base similarities on descriptive metadata, especially tags. Zhang and Hurley can improve the diversity of the recommendation lists in some cases, while the influence of the partitioning on the precision is not very large.

Gemmell et al. [12] propose to cluster tags in a user profile to improve personalized search. They show that clustering improves the search results. Gemmell et al. did not consider recommendation, but our approach for recommendation is similar to their work both in spirit and in results. An interesting solution targeted
at recommendation is proposed by [6], who use non-negative matrix factorization to obtain a weak clustering of tags into topics. Now recommendations are computed by using the probability that a user is interested in a topic and the probability that an item is relevant to that topic. This idea comes very close to the approach for topic diversification that we propose below. However, [6] use global tag clusters, whereas we apply clustering to the tags of each user profile. Moreover, we do not sum up the probabilities for an item over all clusters.

3 Tag Based Recommendation

In the following we will use two basic strategies for tag based recommendation. We will show that both strategies can be modified easily to become topic aware and that this leads to improvement of recommendation results in all cases. The first type of algorithm we will consider is item based collaborative filtering. The second type of algorithms uses the similarity between an item and a user profile directly. Therefore we call these algorithms profile based. In all cases the recommendations are based on the tags assigned to the items. To be more precise, we will represent each item by a probability distribution over tags.

Consider collections of items C = {i_1, ..., i_k}, tags T = {t_1, ..., t_l} and users U = {u_1, ..., u_m}. Let n(i, t, u) be the number of times a user u assigned tag t to item i, usually 0 or 1. To consider the tags assigned to an item and the tags assigned by a user, respectively, we let

n(i, t) = \sum_{u \in U} n(i, t, u) and    (1)

n(u, t) = \sum_{i \in C} n(i, t, u).    (2)

Furthermore, let

n_T(t) = \sum_{i \in C} n(i, t),    (3)

n_C(i) = \sum_{t \in T} n(i, t) and    (4)

n_U(u) = \sum_{t \in T} n(u, t).    (5)

Now we define probability distributions p_T(t|i) and p_C(i|t) on respectively the set of tags T and the corpus C, describing how the tag occurrences of a given item i are distributed over different tags, and how the occurrences of a given tag t are distributed over different items:

p_T(t|i) = n(i, t)/n_C(i),    (6)

p_C(i|t) = n(i, t)/n_T(t).    (7)

Finally, for each u ∈ U let C_u be the set of items seen/bookmarked by u. Note that in many cases a user did not tag all the items he has bookmarked.
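As a minimal sketch of the definitions above, the counts (1)-(5) and the distributions (6)-(7) can be accumulated from a list of (user, item, tag) assignments. The toy data and all names here are illustrative, not from the paper's dataset:

```python
from collections import defaultdict

# Hypothetical toy tagging data: (user, item, tag) assignments n(i, t, u).
assignments = [
    ("u1", "i1", "scifi"), ("u1", "i2", "scifi"), ("u2", "i1", "space"),
    ("u2", "i3", "history"), ("u3", "i3", "history"), ("u3", "i3", "rome"),
]

n_it = defaultdict(int)   # n(i, t): how often tag t was assigned to item i
n_ut = defaultdict(int)   # n(u, t): how often user u assigned tag t
for u, i, t in assignments:
    n_it[(i, t)] += 1
    n_ut[(u, t)] += 1

n_C = defaultdict(int)    # n_C(i): total tag assignments on item i
n_T = defaultdict(int)    # n_T(t): total assignments of tag t
for (i, t), n in n_it.items():
    n_C[i] += n
    n_T[t] += n

def p_T(t, i):
    """Eq. (6): p_T(t|i) = n(i, t) / n_C(i), the tag distribution of item i."""
    return n_it[(i, t)] / n_C[i]

def p_C(i, t):
    """Eq. (7): p_C(i|t) = n(i, t) / n_T(t), the item distribution of tag t."""
    return n_it[(i, t)] / n_T[t]
```

For example, item i3 carries three tag assignments, two of which are "history", so p_T(history|i3) = 2/3.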
3.1 Item Based Collaborative Filtering

The first strategy for recommendation we use is a nearest neighbor approach ([13]). Given a distance measure between items, we express the relevance of an item for a given user as the average distance between the item and all items bookmarked (or seen) by the user. Formally, we define the distance of an item i ∈ C to a user u ∈ U as

d(u, i) = \frac{1}{|C_u|} \sum_{j \in C_u} d(j, i)    (8)

where d(i, j) is the distance between items i and j. The items with the smallest distance are recommended to the user. Given our perspective of tag distributions it is natural to use divergences for the distance between items. We will base our distance on the Jensen-Shannon divergence, which can be considered as a symmetrical variant of the Kullback-Leibler divergence or relative entropy. Since the square root of the Jensen-Shannon divergence is a proper metric satisfying the usual axioms of non-negativity, identity of indiscernibles and triangle inequality ([14]), we will use

d(i, j) = \sqrt{JSD(p_T(t|i), p_T(t|j))}    (9)

where JSD(p, q) is the Jensen-Shannon divergence or information radius between probability distributions p and q.

3.2 Profile Based Recommendation

In the nearest neighbor approach we compute the average distance of an item to the items seen by the user. Alternatively, we can compute the distance of an item to the user, who can also be represented by a distribution over tags. For each user we compute the characteristic tag distribution p(t|u). The obvious way to define this distribution is:

p_T(t|u) = n(u, t)/n_U(u).    (10)

This allows us to define a distance between an item i and a user u by setting d(u, i) = JSD(p_T(t|u), p_T(t|i)) and to recommend items with a small distance to the user. However, it was already shown by [5] that this strategy does not perform very well.
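The distances (8) and (9) might be sketched as follows, with tag distributions represented as plain {tag: probability} dicts (this representation, and the base-2 logarithm, are assumptions of the sketch, not prescribed by the paper):

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence (base-2 logs) of two tag distributions
    given as {tag: probability} dicts; 0 for identical distributions,
    1 for distributions with disjoint support."""
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in set(p) | set(q)}
    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def item_distance(p_i, p_j):
    """Eq. (9): d(i, j) = sqrt(JSD(p_T(.|i), p_T(.|j))), a proper metric."""
    return math.sqrt(jsd(p_i, p_j))

def user_item_distance(profile_items, p_i):
    """Eq. (8): average distance of item i to all items in C_u,
    given the tag distributions of the user's items."""
    return sum(item_distance(p_j, p_i) for p_j in profile_items) / len(profile_items)
```

Items are then ranked by ascending user_item_distance and the top n recommended.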
In our experiments the results were even worse than those obtained by the simple non-personalized baseline of recommending the most popular items. The main reason for this bad performance is that the distribution is too sparse and thus many relevant items will be tagged with synonyms of the tags in the user profile, but not with exactly those tags. In the following we will present two possibilities to alleviate this problem.

Profiles based on item tags. Firan et al. ([5]) propose several methods to construct better user profiles. One of the most successful ones is to use the tags of the items considered by the user. We define the item based user profile as

\hat{p}_T(t|u) = \frac{1}{|C_u|} \sum_{i \in C_u} p_T(t|i).    (11)
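Equation (11), averaging the tag distributions of the items a user has bookmarked, can be sketched in the same dict-based representation (names are illustrative):

```python
def item_based_profile(item_dists):
    """Eq. (11): average the tag distributions p_T(.|i) of the items in C_u.
    item_dists: list of {tag: prob} dicts, one per item the user bookmarked."""
    profile = {}
    for p in item_dists:
        for t, v in p.items():
            profile[t] = profile.get(t, 0.0) + v / len(item_dists)
    return profile
```

Because every bookmarked item contributes its full tag distribution, the resulting profile is much denser than the tags the user actively assigned.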
Profiles based on co-occurring tags. In [15] we have proposed to condense the user profile by adding co-occurring tags. This is achieved by propagating tag probabilities in a Markov chain on T ∪ C having transitions C → T with transition probabilities p_T(t|i) and transitions T → C with transition probabilities p_C(i|t). The characteristic tag distribution for a user now is defined as:

\bar{p}_T(t|u) = \sum_{i,t'} p_T(t|i) p_C(i|t') p_T(t'|u).    (12)

4 Topic Aware Recommendation

In the three basic algorithms discussed above the relevance of an item for a user is predicted by its similarity to all items considered by the user or by its similarity to all tags in the user profile. As discussed above this might result in uninteresting lists of similar and unspecific items. In order to recommend items more specific for one of the interests a user might have, we propose to cluster the items or the tags in his profile. Now we can generate lists of recommended items for each of the clusters and merge them to obtain a final recommendation. Thus we can construct topic aware variants of all three algorithms discussed above.

4.1 Topic Detection by Clustering Items or Tags

In order to cluster tags or items we need a distance measure between tags or items, respectively. For clustering items we use the item distance defined in (9). For the distance between tags we use the co-occurrence based similarity proposed in [16]. The co-occurrence distribution of a tag z is defined as

\bar{p}_T(t|z) = \sum_{i \in C} p_T(t|i) p_C(i|z).    (13)

Now the distance between tags is defined straightforwardly as the square root of the Jensen-Shannon divergence of their co-occurrence distributions. For clustering we use a variant of the complete link algorithm in which in each step we merge the two clusters whose merger has a minimal average distance between all elements. This criterion guarantees that at each step the option is chosen that yields the best Calinski-Harabasz index [17].
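The propagation in (12) and (13) amounts to one tag → item → tag step of the Markov chain. A sketch, assuming p_T is stored as item → {tag: prob} and p_C as tag → {item: prob} (this layout and all names are assumptions of the sketch):

```python
def cooccurrence_dist(z, p_T, p_C):
    """Eq. (13): co-occurrence distribution of tag z,
    bar-p_T(t|z) = sum_i p_T(t|i) * p_C(i|z)."""
    result = {}
    for i, w in p_C.get(z, {}).items():      # items carrying tag z
        for t, v in p_T.get(i, {}).items():  # tags of those items
            result[t] = result.get(t, 0.0) + v * w
    return result

def condensed_profile(profile, p_T, p_C):
    """Eq. (12): propagate the user's tag weights p_T(t'|u) through the chain,
    bar-p_T(t|u) = sum_{i,t'} p_T(t|i) * p_C(i|t') * p_T(t'|u)."""
    result = {}
    for t_prime, w in profile.items():
        for t, v in cooccurrence_dist(t_prime, p_T, p_C).items():
            result[t] = result.get(t, 0.0) + v * w
    return result
```

Since each step of the chain is stochastic, the condensed profile is again a probability distribution over tags, now with mass on tags the user never assigned directly.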
As a stopping criterion, we require the number of clusters to be equal to the square root of the number of tags. This criterion is rather arbitrary but works quite well.

4.2 Using Topic Clusters for Recommendation

The topic aware variant of the nearest neighbor algorithm described in Section 3.1 can be defined as follows: we cluster the items in C_u and apply the algorithm to each of the clusters. In order to merge the recommendation lists, the best elements from each cluster are selected. The number of items selected from each recommended list is
proportional to the size of the cluster. Merging starts with the items from the largest cluster. If an item has already been added from a previous list, the next item is taken.

For the profile based algorithms it is not that obvious how to make them topic aware. Simply clustering the tags in the profile distribution will result in strange distributions that give bad recommendation results. Some common tags, like non fiction in our data set, are relevant for several topics and should not be assigned exclusively to one cluster. For the profiles formed by adding co-occurring tags to the actively used tags this can be achieved by first clustering the tags and then condensing the profile. Thus, for some user u ∈ U let T_u = {t ∈ T | n(u, t) > 0}. This set of tags is now clustered into clusters T_{u,1}, ..., T_{u,k}. For each cluster T_{u,c} we compute characteristic tag distributions

p_T(t|u,c) = \frac{\sum_i T_{u,c}(t)\, n(i,t,u)}{\sum_{i,t'} T_{u,c}(t')\, n(i,t',u)} and (14)

\bar{p}_T(t|u,c) = \sum_{i,t'} p_T(t|i)\, p_C(i|t')\, p_T(t'|u,c), (15)

where we use T_{u,c} as the indicator function of the set T_{u,c}. Note that these formulas are very similar to (10) and (12). Now we can use (15) in the algorithm of Section 3.2 to generate recommendations for each topic cluster. For the profiles based on the tags of all viewed items we again start by clustering the tags in the profile and then compute tag distributions for each cluster by adding co-occurring tags. However, in order to end up with distributions over the restricted set of tags used in (11), we compute tag co-occurrence using only the set of items considered by the user. Technically this can be obtained by restricting the item distribution of a tag z (see (7)) to C_u for each user u:

p_C(i|z,u) = \begin{cases} \frac{n(i,z)}{\sum_{i' \in C_u} n(i',z)} & \text{if } i \in C_u \\ 0 & \text{otherwise.} \end{cases} (16)

To obtain the tag distributions for each cluster we substitute p_C(i|t', u) for p_C(i|t') in (15).
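The merging step described above can be sketched as follows. This is a minimal illustration: the function name is ours, and the exact rounding of the per-cluster quotas (and the choice not to back-fill a shortfall) is our own assumption, since the text does not specify it.

```python
def merge_recommendations(ranked_lists, cluster_sizes, n):
    """Merge per-cluster recommendation lists into a single top-n list.

    The number of items taken from each list is proportional to the size
    of its cluster; merging starts with the largest cluster, and an item
    that was already added from a previous list is skipped in favour of
    the next one.
    """
    total = sum(cluster_sizes)
    # Process the clusters from largest to smallest.
    order = sorted(range(len(ranked_lists)),
                   key=lambda c: cluster_sizes[c], reverse=True)
    merged, seen = [], set()
    for c in order:
        quota = round(n * cluster_sizes[c] / total)
        taken = 0
        for item in ranked_lists[c]:
            if taken >= quota or len(merged) >= n:
                break
            if item not in seen:
                merged.append(item)
                seen.add(item)
                taken += 1
    return merged[:n]
```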
Now it is also natural to use this personalized item distribution of each tag in the computation of the co-occurrence distribution in (13). Thus each tag gets a personalized co-occurrence distribution, and consequently the distances between tags also become personalized. This reflects the fact that different users might use the same tag with a different meaning in different contexts. E.g., for one person the tag Italy is related to food, while for another user it is more closely related to the renaissance.

5 Evaluation

5.1 Dataset

For the evaluation we used a selection of data from LibraryThing taken from [18], which was collected such that each user has supplied tags and ratings for at least 20 books
and each book has received at least 5 tags. The dataset consists of 37,232 books tagged by 7,279 users with in total 10,559 different tags. The total number of tag assignments is 2,056,487. In order to validate our recommendation techniques, we split the dataset into a training and a test set, such that the training set contains for each user 80% of his tag assignments. Since no time stamps were available, the split was made randomly.

Table 1. Comparison of MAP, precision and diversity (average squared distance) for top-10 recommendation lists

Algorithm               MAP    prec@10  div@10
Most Viewed             0.024  0.050    0.56
BPR-MF                  0.044  0.090    0.69
Co-occur. tags          0.033  0.064    0.50
Item tags               0.052  0.099    0.50
Nearest Neighbor        0.048  0.075    0.48
Co-occur. tags (TA)     0.037  0.078    0.71
Item tags (TA)          0.084  0.15     0.72
Nearest Neighbor (TA)   0.072  0.12     0.84

5.2 Baseline Algorithms

We compare the 6 different tag based recommendation algorithms with 2 baselines that do not use the tags but only the information whether an item was bookmarked by a user or not. As a first baseline we use the strategy that always recommends the most popular items. This recommendation is not personalized, and every personalized algorithm should at least be better than this one. As a second baseline we use one of the strongest recommendation techniques for top-n recommendation, Bayesian Personalized Ranking (BPR), proposed in [19]. We use BPR to learn a matrix factorization model with 16 dimensions. Optimized hyperparameters for this model were determined by grid search on the training data¹. Tag based recommendation will of course only be useful if its results are better than those obtained by BPR with matrix factorization (BPR-MF).

5.3 Results

For all algorithms we computed top 1 to 100 recommendations.
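The per-user split can be sketched as follows, assuming the tag assignments are given as (user, item, tag) triples; the function name and the fixed random seed are our own choices, since the paper only states that 80% of each user's assignments go to the training set and that the split is random.

```python
import random

def split_per_user(assignments, train_fraction=0.8, seed=42):
    """Split (user, item, tag) assignments so that the training set holds
    `train_fraction` of each user's assignments, chosen at random."""
    # Group the assignments by user.
    by_user = {}
    for a in assignments:
        by_user.setdefault(a[0], []).append(a)
    rng = random.Random(seed)  # seeded only for reproducibility
    train, test = [], []
    for user_assignments in by_user.values():
        rng.shuffle(user_assignments)
        cut = int(round(train_fraction * len(user_assignments)))
        train.extend(user_assignments[:cut])
        test.extend(user_assignments[cut:])
    return train, test
```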
Since we did not compute a complete ranking of all 37,232 items for all users and all algorithms, we will use the mean average precision (MAP), computed over the first 100 recommended items, as a measure to compare the algorithms.

¹ Regularization parameters for the matrix factorization are λ_W = 0.25, λ_{H+} = 0.0001 and λ_{H−} = 0.01. The learning rate is set to 0.02 and 160 iterations are used to compute the model.

Furthermore, for all
algorithms we also compare the precision and diversity of a top 10 recommendation. As a measure of diversity we take the average squared distance between the recommended items. The diversity for a set of items I is thus defined as

div(I) = \frac{1}{|I|^2} \sum_{i,j \in I} \mathrm{JSD}(q(i), q(j)), (17)

where JSD is the Jensen-Shannon divergence, which is the square of a proper distance measure. The results for MAP, precision at 10 and diversity at 10 are given in Table 1.

Fig. 1. Precision and recall for the 3 proposed topic aware algorithms compared to their basic (non-topic aware) variants, using a sample of LibraryThing data. Data points are plotted for top-n recommendation with n = 1, 5, 10, 20, ..., 100.

In Figure 1 precision and recall for the 6 tag based algorithms are shown. Comparison of the algorithms clearly shows that each of the 3 discussed basic algorithms benefits from being made topic aware in the proposed way. Only for a top 1 and a top 2 recommendation do the topic aware variants of the tag based collaborative filtering and of the algorithm using a profile based on co-occurring tags have a lower precision than their non-topic aware variants. In these cases the number of clusters is larger than the number of items that we have to predict, so most of the available information is not used. Remarkably, the third topic aware algorithm does not show a similar behavior for small recommendation lists. If we compare the algorithms to the baselines (Figure 2), we see that all algorithms are clearly better than the non-personalized most viewed recommendation. T-tests show that the differences between each topic aware algorithm and its base variant, both for MAP and for prec@10, are significant at the p < 0.001 level. Also, all differences to the baseline algorithms are significant at the same level.
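The diversity measure (17) can be sketched as follows, assuming each recommended item i is represented by a tag distribution q(i); the function and variable names are ours.

```python
import numpy as np

def diversity(distributions):
    """Diversity of a recommendation list (Eq. 17): the Jensen-Shannon
    divergence summed over all ordered pairs of recommended items and
    divided by |I|^2. Each item i is represented by one row of
    `distributions`, its tag distribution q(i)."""
    def jsd(p1, p2):
        # Jensen-Shannon divergence with log base 2 (bounded by 1);
        # it is the square of a proper distance measure.
        m = 0.5 * (p1 + p2)
        def kl(a, b):
            mask = a > 0
            return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
        return 0.5 * kl(p1, m) + 0.5 * kl(p2, m)

    q = np.asarray(distributions, dtype=float)
    k = len(q)
    return sum(jsd(q[i], q[j]) for i in range(k) for j in range(k)) / k**2
```

A list of identical items has diversity 0; a list of two items with disjoint tag distributions reaches 0.5, since the i = j pairs in the double sum always contribute 0.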
Fig. 2. Precision and recall for the 3 proposed topic aware algorithms compared to the 2 baselines, using a sample of LibraryThing data. Data points are plotted for top-n recommendation with n = 1, 5, 10, 20, ..., 100.

The results clearly show that tags can be very useful for recommendation. Finally, we see that the diversity for the topic aware algorithms is clearly higher than for the non-topic aware content based algorithms. Interestingly, the diversity of BPR-MF is also relatively high.

6 Conclusion and Future Work

We have shown that clustering user tags can significantly improve tag based item recommendation. This corresponds to the intuition that people have different interests and that it is better to recommend items for each separate topic than to try to find items that match more or less all interests. Though this is very intuitive, it is nevertheless a surprising result: given results from previous research, one would expect that an improvement in diversity has to be paid for by a loss of precision and recall. Our clustering approach, on the contrary, improves both diversity and precision and recall.

Another nice aspect of the proposed algorithms is that it is easy to explain to users why items are recommended: the detected topics can be displayed, e.g. by a cloud of the most frequent or most central tags. The recommendation can then be motivated by the relevance of the items for one of the detected topics.

The two algorithms that turned out to be the best ones rely only marginally, if at all, on the fact that the user is an active tagger. Thus these methods can be applied to content based recommendation in general. An interesting question for future research is to what types of item sets and meta-data the results carry over.