Improving Tag-Based Recommendation by Topic Diversification

proportional to the size of the cluster. Merging starts with the items from the largest cluster. If an item is already added from a previous list, the next item is taken.

For the profile based algorithms it is not that obvious how to make them topic aware. Simply clustering the tags in the profile distribution will result in strange distributions that give bad recommendation results. Some common tags, like non fiction in our data set, are relevant for several topics and should not be assigned exclusively to one cluster.

For the profiles formed by adding co-occurring tags to the actively used tags this can be achieved by first clustering the tags and then condensing the profile. Thus, for some user $u \in U$ let $T_u = \{t \in T \mid n(u,t) > 0\}$. This set of tags now is clustered into clusters $T_{u,1}, \dots, T_{u,k}$. For each cluster $T_{u,c}$ we compute characteristic tag distributions

\[
p_T(t \mid u,c) = \frac{\sum_{i} T_{u,c}(t)\, n(i,t,u)}{\sum_{i,t'} T_{u,c}(t')\, n(i,t',u)} \tag{14}
\]

and

\[
\bar{p}_T(t \mid u,c) = \sum_{i,t'} p_T(t \mid i)\, p_C(i \mid t')\, p_T(t' \mid u,c), \tag{15}
\]

where we use $T_{u,c}$ as the indicator function of the set $T_{u,c}$. Note that these formulas are very similar to (10) and (12). Now we can use (15) in the algorithm of section 3.2 to generate recommendations for each topic cluster.
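The two cluster-level distributions (14) and (15) can be illustrated with a minimal Python sketch. The dictionary layout, the function names `cluster_tag_distribution` and `smoothed_cluster_distribution`, and the toy book and tag data are assumptions made for illustration only; the paper does not prescribe an implementation.

```python
from collections import defaultdict


def cluster_tag_distribution(user_counts, cluster):
    """Eq. (14): tag counts of one user, restricted to one cluster, normalized.

    user_counts maps (item, tag) -> n(i, t, u) for a single user u (assumed layout).
    cluster is the set of tags T_{u,c}.
    """
    totals = defaultdict(float)
    for (item, tag), count in user_counts.items():
        if tag in cluster:  # plays the role of the indicator function T_{u,c}(t)
            totals[tag] += count
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {tag: value / norm for tag, value in totals.items()}


def smoothed_cluster_distribution(p_cluster, p_item_given_tag, p_tag_given_item):
    """Eq. (15): sum over i and t' of p_T(t|i) * p_C(i|t') * p_T(t'|u,c)."""
    result = defaultdict(float)
    for t_prime, weight in p_cluster.items():
        for item, p_item in p_item_given_tag.get(t_prime, {}).items():
            for tag, p_tag in p_tag_given_item.get(item, {}).items():
                result[tag] += p_tag * p_item * weight
    return dict(result)


# Hypothetical toy data: three books tagged by one user.
user_counts = {
    ("b1", "italy"): 1, ("b1", "cooking"): 1,
    ("b2", "italy"): 1, ("b2", "renaissance"): 1,
    ("b3", "cooking"): 2,
}
# p_C(i | t'): item distribution of each tag (assumed to be given).
p_item_given_tag = {
    "italy": {"b1": 0.5, "b2": 0.5},
    "cooking": {"b1": 0.5, "b3": 0.5},
}
# p_T(t | i): tag distribution of each item (assumed to be given).
p_tag_given_item = {
    "b1": {"italy": 0.5, "cooking": 0.5},
    "b2": {"italy": 0.5, "renaissance": 0.5},
    "b3": {"cooking": 1.0},
}

p_c = cluster_tag_distribution(user_counts, {"italy", "cooking"})
p_bar = smoothed_cluster_distribution(p_c, p_item_given_tag, p_tag_given_item)
```

In this toy example the tag renaissance lies outside the cluster, yet it receives some probability mass in `p_bar` because it co-occurs with italy on one book. This is the intended effect of adding co-occurring tags to the cluster profile.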
For the profiles based on the tags of all viewed items we again start clustering the tags in the profile and then compute tag distributions for each cluster by adding co-occurring tags. However, in order to end up with distributions over the restricted set of tags used in (11) we compute tag co-occurrence only using the set of items considered by the user. Technically this can be obtained by restricting the item distribution of a tag $z$ (see (7)) to $C_u$ for each user $u$:

\[
p_C(i \mid z,u) =
\begin{cases}
\dfrac{n(i,z)}{\sum_{i' \in C_u} n(i',z)} & \text{if } i \in C_u \\
0 & \text{otherwise}
\end{cases} \tag{16}
\]

To obtain the tag distributions for each cluster we substitute $p_C(i \mid t', u)$ for $p_C(i \mid t')$ in (15). Now it is also natural to use this personalized item distribution for the computation of each tag's co-occurrence distribution in (13). Thus each tag gets a personalized co-occurrence distribution and consequently also the distances between tags become personalized. This reflects the fact that different users might use the same tag with a different meaning in different contexts. E.g. for one person the tag Italy is related to food while for some other user it is more closely related to renaissance.

5 Evaluation

5.1 Dataset

For evaluation we used a selection of data from LibraryThing from [18], that was collected such that each user has supplied tags and ratings to at least 20 books