corresponding topic is just 0.102. It illustrates that users hold similar preferences for the movies with similar probability under each topic. So we posit that consulting the movies with similar probability under each topic can help improve personalized rating prediction.

Metrics

We use the Root Mean Square Error (RMSE) metric to measure the prediction quality of our proposed approach in comparison with other collaborative methods. RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N}\sum_{j=1}^{M}\left(r_{ij} - \hat{r}_{ij}\right)^{2} I^{R}_{ij}}{\sum_{i=1}^{N}\sum_{j=1}^{M} I^{R}_{ij}}}, \quad (5)$$

where $r_{ij}$ and $\hat{r}_{ij}$ are the actual rating and the predicted rating from $N$ users for $M$ movies, and $I^{R}_{ij}$ is the indicator over observed ratings. In addition, we use the rounded value as our predicted rating. Rounding makes the prediction errors more pronounced, but whether the rating is rounded or unrounded, the comparison result between the different approaches does not change much.

Comparison

We compare our approach with three collaborative filtering algorithms: the Non-negative Matrix Factorization (NMF) method, the PMF method, and the improved regularized SVD method.
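The metric above can be sketched as follows. This is a minimal illustration, assuming the conventional reading of $I^{R}_{ij}$ as a 0/1 mask over observed ratings; the function name and data are illustrative, not from the paper:

```python
import numpy as np

def rmse(R, R_hat, mask, rounded=False):
    """RMSE over observed entries only (mask plays the role of I^R).

    If `rounded` is True, predictions are rounded to the nearest
    integer rating before computing the error, as in the text.
    """
    pred = np.rint(R_hat) if rounded else R_hat
    sq_err = (R - pred) ** 2 * mask          # errors on observed cells only
    return float(np.sqrt(sq_err.sum() / mask.sum()))
```

For example, with a single off-by-0.5 prediction among three observed ratings, the unrounded RMSE is sqrt(0.25/3), while rounding can drive the error to zero or amplify it, which is why the text notes rounded errors are "more obvious".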
In fact, one of the most difficult problems in our work is to find comparable algorithms for the comparison. Because our intention is to provide a generalized item recommendation model that combines the use of ratings and tags, most of the related work is inapplicable to the data resource in this situation. We therefore choose three of the most popular algorithms out of expediency. The parameters of these methods also need to be tuned. According to related work and our experiments, the best parameters for the PMF approach on the MovieLens dataset are: λu = 0.001, λv = 0.0001. Concerning the improved regularized SVD method, lrate = 0.001, λ = 0.02, λ2 = 0.05. We set the number of feature dimensions to 80. We consider this assignment reasonable because the commonly used feature dimension for these matrix factorizations is between 30 and 100.

We have six versions of the improved collaborative filtering method. Nghbru represents the neighborhood recommendation method based only on the user-tag analysis. Nghbri corresponds to the variant based only on the item-tag analysis. Nghbra integrates the use of the user-tag analysis and the item-tag analysis. For each of these three methods, there are two different weighting strategies: one uses uniform weights, labeled "Avg"; the other uses different weights, labeled "Wgt".

Table 1. RMSE comparison with other approaches (a smaller RMSE value means a better performance)

            Percentage   20%      50%      80%      99%
  NMF                    1.4854   1.3027   1.1275   1.0762
  irSVD                  1.3176   1.2591   1.1928   1.1087
  PMF                    1.1692   1.1187   1.05656  1.0173
  Nghbru    Avg          0.8811   0.8799   0.8807   0.8803
            Wgt          0.8802   0.8792   0.8796   0.8788
  Nghbri    Avg          0.8802   0.8798   0.8791   0.8789
            Wgt          0.8802   0.8796   0.8790   0.8788
  Nghbra    Avg          0.8669   0.8668   0.8665   0.8662
            Wgt          0.8661   0.8658   0.8657   0.8655

The results in Table 1 show that our neighborhood recommendation method outperforms the improved regularized SVD method by more than 41%, NMF by 36%, and PMF by 23%.
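The "Avg" and "Wgt" strategies can be sketched as two ways of aggregating neighbor ratings. The neighbor ratings and the tag-derived weights below are hypothetical, since the section does not spell out the exact weighting formula:

```python
import numpy as np

def predict_avg(neighbor_ratings):
    """'Avg' variant: every neighbor contributes with uniform weight."""
    return float(np.mean(neighbor_ratings))

def predict_wgt(neighbor_ratings, weights):
    """'Wgt' variant: neighbors contribute with different weights
    (e.g. tag-based similarities), normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(neighbor_ratings, w / w.sum()))
```

Point 5) in the analysis below corresponds to the case where the weights are nearly uniform across users, so `predict_wgt` collapses toward `predict_avg` and the two variants score almost identically.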
We would like to analyze the results more specifically: 1) For all the algorithms in Table 1, the prediction accuracy increases as the training set's percentage ascends. This is reasonable: with a higher training data percentage, our algorithms find more neighbors to consult, and the more neighbors we find, the more accurate we become. 2) Among our own methods, the version Nghbra-Wgt presents the best performance. This illustrates that utilizing all the tag information, and assigning different weights to this tag information, is meaningful. 3) We also observe that the item-tag analysis is a little more effective than the user-tag analysis. Although the difference is subtle, it is consistent with the fact that item-based collaborative filtering approaches were more popular than user-based ones in early works. 4) Besides, the performance gain of Nghbra over Nghbru and Nghbri is obvious. This illustrates that the fusion of the user-tag analysis and the item-tag analysis is lossless. 5) Nevertheless, the performance of the weighted versions of Nghbra, Nghbru, and Nghbri is not much better than that of their averaged counterparts. This can be explained by the homogeneity of users: there are no authorities whose ratings carry overwhelming importance. The phenomenon reflects the democracy of the online social network.

Parameter Analysis

For topic finding, we set the Dirichlet priors α and β to 50/K and 0.1, respectively (K is the number of topics). These two hyper-parameters are the empirical values for LDA. The threshold value for processing the probabilistic matrices ΘU and ΘI is set to 0.03, below which a probability is treated as statistically impossible. The other two parameters, the iteration number and the topic number, are not fixed; we explore the optimal settings for them. Because the parameters of topic finding differ with the objects to be analyzed, we separate the process of tag analysis into user-tag analysis and item-tag analysis.
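The parameter setup above can be sketched as follows. The helper name and the example matrix are illustrative; the priors α = 50/K, β = 0.1 and the 0.03 threshold are the values stated in the text:

```python
import numpy as np

K = 20                       # number of topics (illustrative value)
alpha, beta = 50.0 / K, 0.1  # empirical Dirichlet priors for LDA
EPSILON = 0.03               # probabilities below this are discarded

def threshold_topics(theta, eps=EPSILON):
    """Zero out entries of a probabilistic matrix (Theta_U or Theta_I)
    below eps, then renormalize each row back to a distribution."""
    t = np.where(theta < eps, 0.0, theta)
    return t / t.sum(axis=1, keepdims=True)
```

Renormalizing after thresholding keeps each row a valid topic distribution, so downstream similarity computations are unaffected by the pruned mass.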
We observe a sharp RMSE increase with large oscillations after 340 iterations for the user-tag analysis. This can be seen as a signal of overfitting. Regarding the item-tag analysis
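Detecting such an overfitting signal from the RMSE trajectory can be sketched as follows; the function name and the patience window are illustrative choices, not from the paper:

```python
def first_sustained_rise(rmse_per_iter, patience=3):
    """Return the index of the last low point before RMSE rises for
    `patience` consecutive iterations (a simple overfitting signal),
    or None if no sustained rise occurs."""
    run = 0
    for i in range(1, len(rmse_per_iter)):
        run = run + 1 if rmse_per_iter[i] > rmse_per_iter[i - 1] else 0
        if run >= patience:
            return i - patience   # iteration where the rise began
    return None
```

On a trajectory like the one described in the text, this would flag the iteration count (around 340 for the user-tag analysis) after which further Gibbs-sampling iterations stop helping.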