Table II
SELECTED TOPICS LEARNED FROM THE UTT MODEL WITH T=200. FOR EACH TOPIC, THE FIVE MOST PROBABLE WORDS AND TAGS ARE LISTED.

Topic 18
Word       Prob.    Tag         Prob.
network    0.51     network     0.36
connect    0.040    networks    0.266
complex    0.035    graph       0.009
structur   0.023    complexity  0.009
topolog    0.020    complex     0.007

Topic 84
Word       Prob.    Tag                    Prob.
search     0.171    ir                     0.21
retriev    0.125    search                 0.059
inform     0.077    information-retrieval  0.054
relev      0.069    retrieval              0.042
feedback   0.029    evaluation             0.041

Figure 3. Boxplot over 1000 random samplings for each group (x-axis: groups 1-27; y-axis: J-divergence). The stars indicate the true group divergence.

3) Assessing User Similarity with the UTT Model: In order to test whether the UTT model is able to identify similar users, we identified all users in our data set which are members of groups in the CiteULike system. CiteULike groups typically share similar research interests and often belong to one research lab, like for instance the Carnegie Mellon University Human-Computer Interaction Institute with a group of 26 users. In our data set there are 488 users out of 1393 which belong to a total of 524 groups (as of November 18, 2008). We excluded all groups with fewer than five members. This resulted in a total of 27 groups with 160 users. 31 users belong to more than one group, and the maximum number of groups for one user is five. We derive the similarity between users based on the learned user-topic distributions Θu. Since each user is represented as a multinomial over the topics T, Jeffreys' J-divergence, a symmetric version of the Kullback-Leibler (KL) divergence, is used. Jeffreys' J-divergence originates from information theory and is a method to compute the similarity between two probability distributions. Our assumption is that users who share the same group membership should be significantly more similar to each other than users who are randomly chosen and considered as an artificial group. Therefore, we repeated the following procedure for each group: We randomly sampled n users

Table IV
NDCG EVALUATION FOR DIFFERENT NUMBERS OF RECOMMENDATIONS.
NDCG        @5    @10   @15   all
Baseline 1  0.04  0.05  0.06  0.19
Baseline 2  0.14  0.20  0.23  0.37
Baseline 3  0.25  0.31  0.34  0.40
TT, T=200   0.29  0.37  0.39  0.47
TT, T=300   0.29  0.37  0.39  0.47
TT, T=400   0.30  0.37  0.40  0.49
UTT, T=200  0.31  0.37  0.40  0.50
UTT, T=300  0.32  0.38  0.41  0.51
UTT, T=400  0.34  0.40  0.43  0.52

(with n the size of the group) and computed the mean divergence of this artificial group. This step was repeated 1000 times. Afterwards these results were compared to the true group divergence. Figure 3 shows the corresponding boxplot over the 1000 samplings for each group. On each box, the central red line is the median, and the edges of the box are the 25th and 75th percentiles. The whiskers were chosen such that all data points within ±2.7σ are not considered outliers. The stars in the plot indicate the true divergence for each group. All true group divergences fall clearly below the just-mentioned percentiles. Furthermore, 20 out of 27 groups are not within ±2.7σ. When using the document or tag distribution of a user as a baseline to compute user similarities, none of the 27 true group divergences fall outside of ±2.7σ (results not shown for the sake of brevity, but available online4).

4) Personalized Tag Recommendation: We perform evaluation on a post basis, i.e., given a user u ∈ U and a resource r ∈ R, we want to predict a recommendation or ranking of tags t ∈ Tu.

Baselines: We follow the baseline methods of previous work on tag prediction [1], but in addition provide personalized versions. The TT and the UTT model are benchmarked against three standard tag recommendation methods.

• Most popular tags: Tags for a resource r are predicted based on their relative frequency in the collaborative tagging system. (Baseline 1)

• Most popular tags with user restriction: Tags for a resource r are first ranked according to their relative frequency and then reduced to the set of tags Tu
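The Jeffreys' J-divergence used above to compare user-topic distributions can be sketched as follows. This is a minimal illustration, not the authors' code; the epsilon smoothing and the example distributions are assumptions added to keep the computation well-defined for sparse multinomials.

```python
import numpy as np

def kl(p, q):
    # Kullback-Leibler divergence KL(p || q) for two multinomials.
    # A small epsilon guards against zero entries before renormalizing.
    eps = 1e-12
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def j_divergence(p, q):
    # Jeffreys' J-divergence: the symmetrized KL divergence.
    return kl(p, q) + kl(q, p)

# Two hypothetical user-topic distributions Theta_u over T = 4 topics.
theta_u = [0.7, 0.1, 0.1, 0.1]
theta_v = [0.6, 0.2, 0.1, 0.1]
print(j_divergence(theta_u, theta_v))
```

A lower J-divergence means more similar users; the measure is symmetric by construction, which is why it is preferred here over plain KL divergence.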
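The random-sampling procedure behind Figure 3 (draw an artificial group of the same size n, compute its mean pairwise divergence, repeat 1000 times) can be sketched as below. The function names and the `div` callback are hypothetical; in the paper, `div` would be the J-divergence between the users' learned topic distributions Θu.

```python
import random
from itertools import combinations

def mean_pairwise_divergence(users, div):
    # Mean divergence over all unordered user pairs in one group.
    pairs = list(combinations(users, 2))
    return sum(div(u, v) for u, v in pairs) / len(pairs)

def sampled_divergences(all_users, n, div, repeats=1000, seed=0):
    # Repeatedly draw artificial groups of size n and record their
    # mean pairwise divergence, yielding one boxplot per true group.
    rng = random.Random(seed)
    return [mean_pairwise_divergence(rng.sample(all_users, n), div)
            for _ in range(repeats)]
```

The true group's mean divergence is then compared against the distribution of these sampled values; a true divergence below the ±2.7σ whisker range indicates that group members are significantly more similar than random users.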
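The NDCG@k scores reported in Table IV can be computed as sketched below, assuming binary relevance (a recommended tag is relevant if it appears in the user's true post); the paper does not spell out its gain function, so this is one standard formulation.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: relevance discounted by log2 of rank.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranked_rels, k):
    # NDCG@k: DCG of the top-k ranking, normalized by the ideal DCG
    # (the same relevances sorted into the best possible order).
    ideal_dcg = dcg(sorted(ranked_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical ranking of five recommended tags: hits at ranks 1 and 3.
print(ndcg_at_k([1, 0, 1, 0, 0], 5))
```

NDCG rewards placing relevant tags near the top of the recommendation list, which is why the table reports it at several cutoffs (@5, @10, @15, all).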