is based on the observation that recommending items that are extremely popular in the datasets is very likely to be correct, which makes algorithm comparisons less sensitive. After all these preprocessing steps, two datasets, named Small and Large, were obtained. Table 1 provides key statistics of the two datasets.

Table 1. Dataset characteristics

                                     Small     Large
Number of users                        485      1097
Number of URLs                         999      1872
Number of tags                       11307      9608
Number of transactions               24612     44599
Density level (%)                     5.08      2.17
Average number of URLs per user      50.75     40.66
Average number of users per URL      24.64     23.82

We randomly divided each of the datasets into a training set and a test set. The split was based on a training-testing ratio between 20% and 80% and was done at the user level. In the prediction phase, all methods recommended 5 items for each user, and these were then compared with the items in the user's test set. The evaluation metrics adopted in our experiments were the commonly used precision, recall, F-measure, and rank score for ranked-list prediction.
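To make the evaluation protocol concrete, the sketch below shows how these four metrics could be computed for a single user's top-5 list. The exact rank-score formula is not given in this excerpt, so the half-life style definition used here, along with the function and variable names, is an assumption for illustration only.

```python
def evaluate_top_k(recommended, test_items, k=5, half_life=5):
    """Precision, recall, F-measure, and an assumed half-life rank score
    for one user's top-k recommendation list."""
    top_k = recommended[:k]
    hits = [item for item in top_k if item in test_items]

    precision = len(hits) / k
    recall = len(hits) / len(test_items) if test_items else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) > 0 else 0.0)

    # Assumed rank score: a hit at 0-indexed position j contributes
    # 1 / 2^(j / (half_life - 1)), normalised by the best achievable score.
    gain = sum(1.0 / 2 ** (j / (half_life - 1))
               for j, item in enumerate(top_k) if item in test_items)
    best = sum(1.0 / 2 ** (j / (half_life - 1))
               for j in range(min(k, len(test_items))))
    rank_score = gain / best if best > 0 else 0.0

    return precision, recall, f_measure, rank_score


# Example: 2 of the 5 recommended URLs appear in the user's test set.
print(evaluate_top_k(["u1", "u2", "u3", "u4", "u5"], {"u2", "u9", "u4"}))
```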
Twelve algorithms were evaluated in our experiments. The RAND algorithm generates random recommendations for every user, while the POP algorithm recommends the most popular items to each user. The user-based (UB) and item-based (IB) algorithms, as well as their variants, tagging user-based (TUB) and tagging item-based (TIB) (Peng et al. 2009), were implemented as the baselines. In addition to the topic-based (TB) method, the SVD dimensionality-reduction method (SVD), which finds hidden structure using singular value decomposition, and the fusion algorithm (FUS), which extends user/item profiles with tags, were also included for comparison. Furthermore, to investigate the benefits of tag generalization, we integrated it into the TB and SB algorithms and denote the resulting variants GTB and GSB, respectively. The experiment was repeated 10 times and the final results were averaged over all runs. Table 2 summarizes the results on both datasets; a minimal sketch of the two non-personalized baselines is given after the table.

Table 2. Experimental results on the Small and Large datasets

Dataset   Algorithm   Precision (%)   Recall (%)   F-measure (%)   Rank Score
Small     RAND             4.27           0.43           0.79            4.27
Small     POP              9.81           1.00           1.81            9.84
Small     UB              22.00           2.23           4.05           22.18
Small     IB              13.52           1.37           2.49           13.52
Small     SVD             19.10           1.93           3.51           19.25
Small     TUB             21.02           2.14           3.88           21.20
Small     TIB             22.31           2.27           4.12           22.27
Small     FUS             24.68           2.51           4.55           24.76
Small     TB              22.99           2.34           4.24           23.15
Small     SB              23.87           2.42           4.40           23.98
Small     GTB             28.34           2.88           5.22           28.50
Small     GSB             30.00           3.05           5.53           30.28
Large     RAND             1.74           0.22           0.39            1.73
Large     POP              3.82           0.49           0.86            3.82
Large     UB               4.81           0.61           1.08            4.85
Large     IB               2.75           0.35           0.62            2.76
Large     SVD              5.15           0.65           1.16            5.17
Large     TUB              7.38           0.94           1.66            7.45
Large     TIB              7.06           0.89           1.59            7.06
Large     FUS              7.71           0.98           1.73            7.75
Large     TB               7.62           0.96           1.71            7.67
Large     SB               7.80           0.99           1.75            7.87
Large     GTB              7.89           1.00           1.77            7.99
Large     GSB              8.33           1.05           1.87            8.39
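For concreteness, here is a minimal sketch of the two non-personalized baselines, RAND and POP, operating on a flat list of (user, URL) transactions. The data layout, function names, and the choice to exclude already-seen URLs are our own assumptions for illustration, not details taken from the paper.

```python
import random
from collections import Counter


def rand_recommend(all_items, seen, k=5, rng=random):
    """RAND baseline: k items drawn uniformly at random from the
    URLs the user has not yet bookmarked."""
    candidates = [item for item in all_items if item not in seen]
    return rng.sample(candidates, min(k, len(candidates)))


def pop_recommend(transactions, seen, k=5):
    """POP baseline: the k most popular unseen URLs, where popularity
    is counted as the number of transactions containing the URL."""
    popularity = Counter(url for _, url in transactions)
    ranked = [url for url, _ in popularity.most_common() if url not in seen]
    return ranked[:k]


# Toy transaction log of (user, URL) pairs.
log = [("a", "u1"), ("a", "u2"), ("b", "u2"), ("b", "u3"), ("c", "u2")]
print(pop_recommend(log, seen={"u2"}))                  # -> ['u1', 'u3']
print(rand_recommend({"u1", "u2", "u3"}, seen={"u2"}))  # 2 random unseen URLs
```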