4.2 Evaluation Metric

For consistency with experiments reported in the literature, we use the Mean Absolute Error (MAE) as the evaluation metric. MAE gives the average absolute deviation of the predictions from the ground truth:

$$\mathrm{MAE} = \frac{\sum_i \sum_j Y_{ij}\,|R_{ij} - \hat{R}_{ij}|}{\sum_i \sum_j Y_{ij}},$$

where $R_{ij}$ and $\hat{R}_{ij}$ are the true and predicted values of the rating given by user $i$ to item $j$, respectively, and $Y_{ij}$ is an indicator that equals 1 if the rating is observed and 0 otherwise. A smaller MAE value indicates better performance.
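As a minimal illustration (not code from the paper; the matrix names are hypothetical), MAE over the observed entries can be computed as follows:

```python
import numpy as np

def mae(R, R_hat, Y):
    """Mean Absolute Error over observed ratings.

    R     -- true rating matrix
    R_hat -- predicted rating matrix
    Y     -- indicator matrix (1 where a rating is observed, else 0)
    """
    return np.sum(Y * np.abs(R - R_hat)) / np.sum(Y)
```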
In our experiments, we randomly split the rating records into two parts, each containing 50% of the observations in the rating matrix. One part is used as the test set, which is kept the same for all experiments. The other part is used as a pool from which training sets are generated. For example, a training set size of 20% means that 20% of the records are randomly selected from the pool to form a training set. For each training set size, we randomly generate 10 different training sets, perform 10 experiments based on them, and report the average result.
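The sketch below illustrates one way to implement this protocol. It is an assumption-laden reading of the text (in particular, the training-set percentage is interpreted here as a fraction of the pool), and all names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_splits(records, train_fraction, n_repeats=10):
    """50/50 split into a fixed test set and a pool, then draw
    n_repeats random training sets from the pool.

    records -- list of (user, item, rating) tuples
    """
    records = list(records)
    rng.shuffle(records)
    half = len(records) // 2
    test_set, pool = records[:half], records[half:]

    k = int(train_fraction * len(pool))  # assumed: fraction of the pool
    train_sets = []
    for _ in range(n_repeats):
        idx = rng.choice(len(pool), size=k, replace=False)
        train_sets.append([pool[i] for i in idx])
    return test_set, train_sets
```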
4.3 Performance

In this section, we compare our method with PMF, which has been demonstrated to be one of the state-of-the-art CF methods [17]. For fairness, we perform parameter tuning in advance for each method and then use the best settings found in all the experiments. For both methods, we initialize the latent features to random numbers in [0, 1] and set the step size for gradient descent to 0.001. The parameters specific to our method are set to α = 1 and β = 50. We find that the performance becomes stable after about 1000 rounds of gradient descent (see Figure 3); hence, we set W = 1000 for all the following results. Furthermore, we adopt the Pearson similarity for all the experiments; the performance of the other measures is discussed in Section 4.4.1.

The results reported in Table 2 are the average MAE values of PMF and TagiCoFi together with their standard deviations. The better results are shown in bold. It is clear that TagiCoFi achieves better performance than PMF.

To evaluate how significantly TagiCoFi outperforms PMF, we have conducted paired t-tests [26] on the results of PMF and TagiCoFi. Given two approaches, say A and B, and a set of n experiments, the MAE values obtained for the two approaches are denoted by $a_i$ and $b_i$ for $i = 1, 2, \ldots, n$. Let $d_i = a_i - b_i$ denote the difference between $a_i$ and $b_i$, and let $\bar{d}$ be the average of the $d_i$ values. The null hypothesis is $\bar{d} = 0$, whereas the alternative hypothesis is $\bar{d} > 0$. The p-value is computed using the t-statistic

$$T = \frac{\bar{d}}{s / \sqrt{n}},$$

where $s$ is the standard deviation of the $d_i$ values. A small p-value (≤ 0.01) indicates statistically significant evidence against the null hypothesis.
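For reference, this one-sided paired t-test can be computed directly from the definition above. The sketch below is illustrative only, and the sample MAE values are made-up placeholders, not results from the paper:

```python
import numpy as np
from scipy import stats

def paired_ttest_pvalue(a, b):
    """One-sided paired t-test: H0: mean(a - b) = 0 vs. H1: mean(a - b) > 0."""
    d = np.asarray(a) - np.asarray(b)
    n = len(d)
    T = d.mean() / (d.std(ddof=1) / np.sqrt(n))  # t-statistic
    return stats.t.sf(T, df=n - 1)               # upper-tail p-value

# Hypothetical MAE values over 10 runs (a: baseline, b: proposed method)
a = [0.802, 0.795, 0.810, 0.788, 0.801, 0.797, 0.805, 0.793, 0.799, 0.806]
b = [0.761, 0.758, 0.770, 0.752, 0.765, 0.759, 0.763, 0.755, 0.760, 0.768]
print(paired_ttest_pvalue(a, b))
```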
Table 3 shows the p-values obtained in our experiments. It is easily observed that TagiCoFi significantly outperforms PMF. Because the main difference between TagiCoFi and PMF lies in the extra tagging information used by TagiCoFi, we can conclude that the tagging information is very useful and that TagiCoFi utilizes it effectively.

Table 3: p-values for the significance tests

              D = 5           D = 10          D = 20
20% Training  3.91 × 10^−15   8.27 × 10^−17   1.04 × 10^−16
40% Training  4.11 × 10^−13   2.10 × 10^−16   4.52 × 10^−16
60% Training  1.35 × 10^−11   4.20 × 10^−12   4.15 × 10^−12
80% Training  1.24 × 10^−8    2.85 × 10^−12   5.99 × 10^−12

To compare TagiCoFi with PMF more thoroughly, we also compare their performance on users with different numbers of observed ratings. The results are shown in Figure 1, from which we can see that TagiCoFi outperforms PMF for all user groups and that the improvement is more pronounced for users with only a few observed ratings. This is a very promising property of TagiCoFi because users with a small number of ratings are typically new customers who have just started to use the system. If we can provide good recommendations to them, we have a better chance of keeping them as long-term customers; otherwise, we will likely lose them.

[Figure 1: Performance improvement of TagiCoFi over PMF on different user rating scales (no users in the 20% training set have more than 320 observed ratings). The x-axis groups users by number of observed ratings (1–10, 11–20, 21–40, 41–80, 81–160, 161–320, >320); the y-axis is the MAE improvement, plotted for the 20% and 80% training settings.]

4.4 Sensitivity to Parameters

4.4.1 User Similarity Measures

In this section, we conduct a set of experiments to compare the effectiveness of the aforementioned user similarity measures: cosine similarity, Pearson similarity, and Euclidean-based similarity. Due to the page limit, we only report results with parameters α = 1, β = 50, and D = 10 in Figure 2; the same trend is observed under other parameter settings. From Figure 2, we see that the Pearson similarity always gives the best performance and the Euclidean-based similarity is always the worst. Although the difference between the measures is obvious, Figure 2 shows that it decreases as the training set size increases. One may ask whether tuning the σ parameter in the Euclidean-based similarity measure would help. We have tried different values for this parameter but could not make the measure outperform the other two. Based on this analysis, we adopt the Pearson similarity.
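For concreteness, the three similarity measures could be implemented as below. This is a sketch under stated assumptions: the paper does not spell out the exact form of its Euclidean-based similarity, so the Gaussian form exp(−‖u − v‖² / σ) used here is an assumed instantiation, and the inputs are taken to be users' tag-based feature vectors:

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity between two user feature vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pearson_sim(u, v):
    """Pearson correlation, i.e., cosine similarity of mean-centered vectors."""
    u, v = u - u.mean(), v - v.mean()
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def euclidean_sim(u, v, sigma=1.0):
    """Euclidean-based similarity (assumed Gaussian form with width sigma)."""
    return np.exp(-np.sum((u - v) ** 2) / sigma)
```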