J. Bobadilla et al. / Expert Systems with Applications 38 (2011) 14609–14623 14613
2.5. Obtaining the mean absolute error-accuracy

2.5.1. Formalization

In order to measure the accuracy of the results of an RS, it is usual to calculate some of the most common error metrics, amongst which the mean absolute error (MAE) and its related metrics (mean squared error, root mean squared error and normalized mean absolute error) stand out.

Let O_u be the set of items for which user u has both a prediction and a rating (• denotes the absence of a value):

$$O_u = \{\, i \in I \mid p_{u,i} \neq \bullet \wedge r_{u,i} \neq \bullet \,\} \tag{25}$$

We define the MAE of a user u as:

$$m_u = \frac{1}{\#O_u} \sum_{i \in O_u} \lvert p_{u,i} - r_{u,i} \rvert \iff O_u \neq \emptyset \tag{26}$$

$$m_u = \bullet \iff O_u = \emptyset \tag{27}$$

The MAE of the RS can be obtained as the average of the users' MAE. Let O be the set of users whose MAE is defined:

$$O = \{\, u \in U \mid m_u \neq \bullet \,\} \tag{28}$$

We define the system's MAE as:

$$m = \frac{1}{\#O} \sum_{u \in O} m_u \iff O \neq \emptyset \tag{29}$$

$$m = \bullet \iff O = \emptyset \tag{30}$$

Accuracy is conceptually the inverse of the error (1/m), but more specifically it can be established as:

$$\text{accuracy} = 1 - \frac{m}{\max - \min}, \qquad \text{accuracy} \in [0, 1]$$

where max and min are the highest and lowest values of the rating scale.

2.5.2. Running example

Table 8 shows the mean absolute errors of each user (m_u) and of the system (m) using K = 3.

2.5.3. Case of study

Often, the system's MAE is implemented in such a way that, when there are no neighbors capable of making a prediction on an item, the average rating for that item over all the training users (except the active user) is used as the prediction. This behavior is reflected in Eqs. (19)–(23), as opposed to Eqs. (14)–(18), which are used when there is at least one neighbor capable of making a prediction on the item considered. Fig. 1 shows the results obtained by applying both approaches to Pearson Correlation with the average aggregation approaches (Eqs. (15) and (20)). Database: MovieLens 1M.

In graphs 1a (computed using Eq. (15)) and 1c (computed using Eq. (20)), a horizontal line at 0.797 indicates the MAE obtained using K = all the training users. Fig. 1c shows values that tend towards this limit when low values of K are selected: the lower the value of K, the fewer the neighbors available to rate the items that the active user has voted for, and therefore the greater the probability of having to use the votes of all the training users of the RS to make a prediction. In this case, when the MAE increases, the prediction capacity (coverage) decreases drastically (graph 1b).

Fig. 2 shows the MAE results obtained on MovieLens 1M using various similarity measures and two aggregation approaches commonly used in CF (Eqs. (16) and (17)). The calculations were made in the range K = 2 to K = 1500 and their results averaged. As we can see, the lowest error values are obtained using Pearson Correlation (PC), particularly when Deviation From Mean (DFM) is used as the aggregation approach. These results lead us to use PC-DFM as the reference combination against which future metrics proposed by the scientific community can be tested, although it still needs to be tested with standardization methods, analysis of its coverage, quality of recommendations, etc.
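As a minimal sketch of the definitions in Section 2.5.1, the Python below computes Eqs. (25)–(30) and the normalized accuracy. It assumes that None stands for the symbol • (no value), that the caller already has the (p_u,i, r_u,i) pairs for each user, and that the rating scale runs from 1 to 5 (the bounds are parameters). The function names are hypothetical, not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical helpers; None stands for the paper's "no value" symbol (•).
Value = Optional[float]


def user_mae(pairs: Sequence[Tuple[Value, Value]]) -> Optional[float]:
    """MAE of a single user u, Eqs. (25)-(27).

    `pairs` holds one (p_ui, r_ui) tuple per item: prediction and real vote.
    Only items where both exist form O_u; returns None if O_u is empty.
    """
    errors = [abs(p - r) for p, r in pairs if p is not None and r is not None]
    if not errors:
        return None  # Eq. (27): m_u = •
    return sum(errors) / len(errors)  # Eq. (26)


def system_mae(user_maes: Sequence[Optional[float]]) -> Optional[float]:
    """System MAE, Eqs. (28)-(30): mean of the users' MAE values that exist."""
    defined: List[float] = [m for m in user_maes if m is not None]  # the set O
    if not defined:
        return None  # Eq. (30): m = •
    return sum(defined) / len(defined)  # Eq. (29)


def accuracy(m: float, r_min: float = 1.0, r_max: float = 5.0) -> float:
    """Normalized accuracy 1 - m / (max - min), in [0, 1].

    Assumption: r_min and r_max are the bounds of the rating scale (1-5 here).
    """
    return 1.0 - m / (r_max - r_min)


if __name__ == "__main__":
    # Toy user: only the 1st and 4th items have both a prediction and a vote.
    print(user_mae([(4.5, 4), (None, 3), (2, None), (3.5, 5)]))  # 1.0
    # Per-user MAE values as printed in Table 8 (K = 3).
    m = system_mae([0.76, 2, 0.6, 0.58, 0.75])
    print(round(m, 3), round(accuracy(m), 2))  # 0.938 0.77
```

Averaging only the defined m_u values mirrors the set O of Eq. (28): users with no comparable items are skipped rather than counted as zero error.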
When selecting a similarity measure we must take into account that averaged results may give a false idea of the integrity of the real results. This can be seen in Fig. 3: although PC-DFM presents a lower global MAE, when we use fewer than 350 neighbors (which is quite common), CPC-WS offers better error measures. This situation must be considered in the accuracy analysis of the RS.

2.6. Standardization process

2.6.1. Introduction

When using CF, it may at times be a good idea to carry out a data standardization process. The z-scores, or normal scores, rescale a group of data to the scale of the standard normal distribution (mean 0, standard deviation 1):

$$z = \frac{x - \mu}{\sigma} \tag{31}$$

where x is a raw score to be standardized, μ is the mean of the population and σ is the standard deviation of the population. z is negative when the raw score is below the mean and positive when above.

Although the most obvious application is the standardization of the users' votes (the input values), it is also possible to apply this process to improve the predictions: the similarity values sim(u, n) obtained by applying the selected similarity measure are used to weight the importance of the votes of each of the K neighbors (Eqs. (16)–(18)). In some cases, most of the neighbors show very high similarity values, and therefore the weighting process loses effectiveness; in these cases it is effective to use z-scores to better differentiate the contribution that each neighbor will make to the prediction results.

2.6.2. Case of study

Fig. 4 shows the result of applying z-scores to the input data or to the similarity values sim(u, n). Except in the case of cosine, which is greatly improved by applying z-scores to the input data, no

Table 7
Predictions that each user can receive using 3-neighbors.
p_u,i for items I1–I14 (non-empty cells only, listed in item order)
U1: 4.5, 2, 3.5, 3, 3, 4.66, 4.33, 4.5
U2: 4.5, 3, 4, 2, 3, 4.5, 4.33, 2, 4.5
U3: 3.33, 2.66, 4, 2.5, 1, 5, 4, 24, 1
U4: 5, 2, 3.5, 4, 2, 3, 4.5, 4.33, 2, 4.33
U5: 3.33, 2, 3, 41, 35, 4, 4, 1

Table 8
Mean absolute errors of each user (m_u) and of the system (m) using K = 3.
      m_u
U1    (0.5 + 0.5 + 2 + 0.33 + 0.5)/5 = 0.76
U2    (3.5 + 1 + 3 + 0.5)/4 = 2
U3    (1.67 + 1.34 + 0 + 0 + 0)/5 = 0.6
U4    (1 + 0.5 + 0.5 + 0.33)/4 = 0.58
U5    (0 + 1 + 1 + 1)/4 = 0.75
m     (0.76 + 2 + 0.6 + 0.58 + 0.75)/5 = 0.938
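A minimal sketch of the standardization of Eq. (31) from Section 2.6.1, assuming the population is simply the list passed in: for example one user's votes, or the sim(u, n) values of the K neighbors. Which population is used for standardization is not fixed in this section, so standardizing each user's votes against that user's own mean is an assumption, and the function name is hypothetical.

```python
import statistics
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize `values` with Eq. (31): z = (x - mu) / sigma.

    mu and sigma are the population mean and population standard deviation
    of `values` itself. Raises statistics.StatisticsError on empty input.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # All values are equal: none is above or below the mean.
        return [0.0] * len(values)
    return [(x - mu) / sigma for x in values]


if __name__ == "__main__":
    # Hypothetical votes of one user, standardized against that user's own mean.
    print([round(z, 3) for z in z_scores([4, 5, 3, 4])])  # [0.0, 1.414, -1.414, 0.0]
    # Hypothetical sim(u, n) values of K = 3 neighbors that are all very high:
    # after standardization they are clearly separated around 0.
    print([round(z, 3) for z in z_scores([0.97, 0.98, 0.99])])
```

The sketch stops at the standardization step; it does not reproduce the weighted aggregations of Eqs. (16)–(18), where standardized similarities, which can be negative, would serve as weights.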