48 C. Wartena and M. Wibbels

Profiles based on co-occurring tags. In [15] we have proposed to condense the user profile by adding co-occurring tags. This is achieved by propagating tag probabilities in a Markov chain on T ∪ C having transitions C → T with transition probabilities p_T(t|i) and transitions T → C with transition probabilities p_C(i|t). The characteristic tag distribution for a user is now defined as

    \bar{p}_T(t|u) = \sum_{i,t'} p_T(t|i) \, p_C(i|t') \, p_T(t'|u).    (12)

4 Topic Aware Recommendation

In the three basic algorithms discussed above, the relevance of an item for a user is predicted by its similarity to all items considered by the user, or by its similarity to all tags in the user profile. As discussed above, this might result in uninteresting lists of similar and unspecific items. In order to recommend items more specific to one of the interests a user might have, we propose to cluster the items or the tags in his profile. We can then generate a list of recommended items for each of the clusters and merge these lists to obtain a final recommendation. Thus we can construct topic aware variants of all three algorithms discussed above.

4.1 Topic Detection by Clustering Items or Tags

In order to cluster tags or items we need a distance measure between tags or items, respectively. For clustering items we use the item distance defined in (9). For the distance between tags we use the co-occurrence based similarity proposed in [16].
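The propagation in Eq. (12) amounts to two stochastic matrix products applied to the user's raw tag profile. A minimal sketch in Python/NumPy, where the toy matrices, the uniform item prior used to derive p_C from p_T, and all variable names are illustrative assumptions rather than details from the paper:

```python
import numpy as np

# Toy setting (assumed for illustration): 3 items, 4 tags.
# p_T[t, i] = p_T(t|i): tag distribution of item i (columns sum to 1).
p_T = np.array([[0.5, 0.0, 0.2],
                [0.5, 0.3, 0.0],
                [0.0, 0.7, 0.3],
                [0.0, 0.0, 0.5]])

# p_C[i, t] = p_C(i|t): item distribution of tag t, obtained here by
# Bayes' rule from p_T under a uniform prior over items (an assumption).
p_C = p_T.T / p_T.T.sum(axis=0, keepdims=True)

# p_T(t|u): the user's raw tag profile, a distribution over the 4 tags.
p_u = np.array([0.4, 0.4, 0.2, 0.0])

# Eq. (12): one T -> C -> T step of the Markov chain,
#   p_bar(t|u) = sum_{i,t'} p_T(t|i) p_C(i|t') p_T(t'|u).
p_bar = p_T @ (p_C @ p_u)

# The result is again a probability distribution over tags, with mass
# shifted toward tags that co-occur with the user's own tags.
print(p_bar)
```

Since p_T and p_C are column-stochastic, p_bar sums to 1 by construction; note that the condensed profile assigns nonzero weight to tag 3, which the raw profile never used but which co-occurs with the user's tags on item 2.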
The co-occurrence distribution of a tag z is defined as

    \bar{p}_T(t|z) = \sum_{i \in C} p_T(t|i) \, p_C(i|z).    (13)

Now the distance between tags is defined straightforwardly as the square root of the Jensen-Shannon divergence of their co-occurrence distributions.

For clustering we use a variant of the complete link algorithm in which in each step we merge the two clusters whose merger has a minimal average distance between all elements. This criterion guarantees that at each step the option is chosen that yields the best Calinski-Harabasz index [17]. As a stopping criterion, we require the number of clusters to be equal to the square root of the number of tags. This criterion is rather arbitrary but works quite well.

4.2 Using Topic Clusters for Recommendation

The topic aware variant of the nearest neighbor algorithm described in Section 3.1 can be defined as follows: we cluster the items in C_u and apply the algorithm to each of the clusters. In order to merge the recommendation lists the best elements from each cluster are selected. The number of items selected from each recommended list is