480 P. Yin, M. Zhang, and X. Li

3 Experimental Evaluation

The dataset is adapted from citeulike³ and includes ten thousands of literatures with tags. We use 4-fold cross-validation and the "All but one" scheme [5]. One literature is removed at random from the tagged literatures of each user in the test dataset, and the modified test dataset is then merged into the training dataset.
Then a top-10 recommendation is run on the whole dataset.

The hit percentage [5] expresses the expectation that the removed literature is recommended:

\[ \text{hit-percentage} = \mathit{hitcount} \,/\, |\mathit{testset}| \qquad (5) \]

Here hitcount denotes the number of successful recommendations and |testset| denotes the size of the test set, that is, the number of recommendations made.

Fig. 1. Comparison of different algorithms for top-10 recommendation (vertical axis: hit-percentage, 0% to 50%; horizontal axis: Tag-01, TTD-T, TTD-TK, TTD-TKK, TTC-TKK)

The algorithms used in the experiment are listed below, and Fig. 1 gives the results for all of them.

Tag-01: The baseline experiment, which uses the user-user collaborative filtering algorithm. The rating is 0 or 1 according to whether the user has tagged the item.

Tag-text-dotproduct-T (TTD-T): The user model and the literature model are both represented as tag frequency vectors. Dot-product-based similarity is used to compute the user's degree of interest.

Tag-text-dotproduct-TK (TTD-TK): The same as TTD-T, except that the user and literature models are extended with the literature's keywords.

Tag-text-dotproduct-TKK (TTD-TKK): The same as TTD-T, except that the user and literature models are extended with the keywords of the literature and of the literature's citations.

3 http://www.citeulike.org
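As an illustration of the TTD-T scoring and the hit-percentage measure of Eq. (5), the following is a minimal Python sketch and not the authors' implementation. It assumes that tag frequency vectors are stored as sparse `Counter` objects, that items a user has already tagged are excluded from that user's list, and that items with a zero score are never recommended. The function names (`dot`, `recommend_top_n`, `hit_percentage`, `evaluate`) and the toy data are hypothetical.

```python
from collections import Counter


def dot(u, v):
    """Dot product of two sparse tag frequency vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(weight * v[tag] for tag, weight in u.items() if tag in v)


def recommend_top_n(user_vec, item_vecs, exclude, n=10):
    """Rank unseen items by dot-product score and keep the top n."""
    scored = [(dot(user_vec, vec), item)
              for item, vec in item_vecs.items() if item not in exclude]
    # Highest score first; ties broken by item id so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Assumption: an item sharing no tag with the user is not recommended.
    return [item for score, item in scored[:n] if score > 0]


def hit_percentage(recommendations, held_out):
    """Eq. (5): hitcount / |testset|, with one removed item per test user."""
    hitcount = sum(1 for user, item in held_out.items()
                   if item in recommendations.get(user, []))
    return hitcount / len(held_out)


def evaluate(user_vecs, item_vecs, seen, held_out, n=10):
    """Run a top-n recommendation for every test user and score it."""
    recs = {user: recommend_top_n(user_vecs[user], item_vecs, seen[user], n)
            for user in held_out}
    return hit_percentage(recs, held_out)


# Toy data (hypothetical): four literatures and two test users.
item_vecs = {
    "a": Counter({"ml": 2, "svm": 1}),
    "b": Counter({"ml": 1, "tagging": 3}),
    "c": Counter({"networks": 2}),
    "d": Counter({"tagging": 1, "folksonomy": 2}),
}
user_vecs = {
    "u1": Counter({"ml": 3}),
    "u2": Counter({"tagging": 2, "networks": 1}),
}
seen = {"u1": {"b"}, "u2": {"c"}}    # literatures still in each profile
held_out = {"u1": "a", "u2": "d"}    # the literature removed per user

print(evaluate(user_vecs, item_vecs, seen, held_out, n=10))  # 1.0
print(evaluate(user_vecs, item_vecs, seen, held_out, n=1))   # 0.5
```

With n=1, user u2 is recommended only literature "b", so the removed literature "d" is missed and the hit percentage drops from 1.0 to 0.5. The TTD-TK and TTD-TKK variants would, under the same assumptions, only change how the vectors are built (adding keyword counts to the tag counts), not the scoring or evaluation code.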