524 J. Bobadilla et al. / Knowledge-Based Systems 23 (2010) 520–528

commonly used due to its low capacity to produce new recommendations.

MSD offers both a great advantage and a great disadvantage at the same time. The advantage is that it generates very good general results: low average error, a high percentage of correct predictions and a low percentage of incorrect predictions. The disadvantage is that it has an intrinsic tendency to choose, as similar users to a given user, those users who have rated a very small number of items [35]. E.g. if we have 7 items that can be rated from 1 to 5 and three users u1, u2, u3 with the following ratings: u1: (•, •, 4, 5, •, •, •), u2: (3, 4, 5, 5, 1, 4, •), u3: (3, 5, 4, 5, •, 3, •) (• means not rated item), the MSD metric will indicate that (u1, u3) have a total similarity (0), (u1, u2) have a similarity of 0.5 and (u2, u3) have a lower similarity (0.6). These values are mean squared differences, so a smaller value means a greater similarity. This situation is not convincing, as intuitively we realize that u2 and u3 are very similar, whilst u1 is only similar to u2 and u3 in 2 ratings; it is therefore not logical to choose it as the most similar to them and, what is worse, if it is chosen it will not provide us with possibilities to recommend new items.

The strategy to follow to design the new metric is to considerably raise the capacity of MSD to generate predictions, without losing along the way its good behavior as regards accuracy and quality of the results.

The metric designed is based on two factors:

- The similarity between two users calculated as the mean of the squared differences (MSD): the smaller these differences, the greater the similarity between the 2 users. This part of the metric enables very good accuracy results to be obtained.
- The number of items which both users have rated, relative to the total number of items which have been rated by at least one of the two users. E.g. given users u1: (3, 2, 4, •, •, •) and u2: (•, 4, 4, 3, •, 1), a common rating has been made in two items as regards a joint rating of five items. This factor enables us to greatly improve the metric's capacity to make predictions.

An important design aspect is the decision not to use any parameter whose value would have to be given arbitrarily, i.e. the result provided by the metric should be obtained by only taking the values of the ratings provided by the users of the RS.

By working on the 2 factors with standardized values [0..1], the metric obtained is as follows: Given the lists of ratings of 2 generic users x and y: $(r_x, r_y) : (r_x^1, r_x^2, r_x^3, \ldots, r_x^I), (r_y^1, r_y^2, r_y^3, \ldots, r_y^I) \mid I$ is the number of items of our RS, where one of the possible values of each

Fig. 4. Measurements related to the Jaccard metric on MovieLens. (A) Number of pairs of users that display the Jaccard values represented on the x axis. (B) Averaged MAE obtained in the pairs of users with the Jaccard values represented on the x axis. (C) Averaged coverages obtained in the pairs of users with the Jaccard values represented on the x axis.

Fig. 5. MAE and coverage obtained with Pearson correlation and by combining Jaccard with Pearson correlation, cosine, constrained Pearson's correlation, Spearman rank correlation and mean squared differences. (A) MAE, (B) Coverage. MovieLens 1M, 20% of test users, 20% of test items, k ∈ [2..1500], step 25.
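The two worked examples in the text on this page can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the function names `co_rated`, `msd` and `jaccard`, the variable names `v1` and `v2`, and the use of `None` for the "not rated" mark are assumptions made for illustration. MSD is computed here on raw, unnormalized ratings, which matches the values 0, 0.5 and 0.6 quoted in the text. The page does not show how the two factors are combined into the final metric, so the sketch does not attempt that.

```python
from typing import List, Optional, Sequence, Tuple

Rating = Optional[int]  # None plays the role of the "not rated" mark


def co_rated(x: Sequence[Rating], y: Sequence[Rating]) -> List[Tuple[int, int]]:
    """Rating pairs for the items that both users have rated."""
    return [(a, b) for a, b in zip(x, y) if a is not None and b is not None]


def msd(x: Sequence[Rating], y: Sequence[Rating]) -> Optional[float]:
    """Mean squared difference over co-rated items (smaller = more similar)."""
    pairs = co_rated(x, y)
    if not pairs:
        return None  # undefined when the users share no rated item
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def jaccard(x: Sequence[Rating], y: Sequence[Rating]) -> float:
    """Items rated by both users divided by items rated by at least one."""
    both = len(co_rated(x, y))
    either = sum(1 for a, b in zip(x, y) if a is not None or b is not None)
    return both / either if either else 0.0


N = None

# First example: 7 items, ratings from 1 to 5.
u1 = [N, N, 4, 5, N, N, N]
u2 = [3, 4, 5, 5, 1, 4, N]
u3 = [3, 5, 4, 5, N, 3, N]

print(msd(u1, u3))  # 0.0
print(msd(u1, u2))  # 0.5
print(msd(u2, u3))  # 0.6

# Second example: two co-rated items out of five jointly rated items.
v1 = [3, 2, 4, N, N, N]
v2 = [N, 4, 4, 3, N, 1]

print(jaccard(v1, v2))  # 0.4
```

Applying `jaccard` to the first example illustrates why the second factor helps: `jaccard(u1, u3)` is 2/5, whereas `jaccard(u2, u3)` is 5/6, so the pair that MSD alone ranks lower is the one with far more ratings in common.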