[Fig. 8 image omitted]
Fig. 8. Correlation and new metric comparative results using FilmAffinity: (A) accuracy, (B) coverage, (C) percentage of perfect predictions, (D) precision/recall. 20% of test users, 20% of test items, k ∈ [2..2000] step 100, N ∈ [2..20], θ = 5.
precision and recall we are using the concept of relevant recommendations (determined by the threshold θ = 5 in our experiment). Based on an improvement in the MAE of 0.2 stars, we will be capable, on many occasions, of suitably classifying the items which obtained a prediction of 4.3–4.49 (considered irrelevant) with Pearson correlation and which, with the new metric, we will place, often correctly, above 4.5, and therefore, we will consider them relevant.

As we can see, in the recommendation quality measurements we are dealing with more restrictive numerical margins than those which apply to the prediction quality measurements, and therefore, it is advisable to consider all kinds of improvements positively.

In short, using the MovieLens database, the proposed metric improves the prediction quality measures and the coverage. The recommendation quality measures are slightly improved.

The MAE and perfect predictions results (graphs 7A and 7C) obtained with NetFlix are similar (although slightly lower) to those of MovieLens. Nevertheless, the coverage drops using NetFlix (graph 7B); this behavior is logical and is to be expected, as the proposed metric is capable of finding more similar neighbors (which improves the measure of accuracy); the more similar the neighbors are to the test user, in general, not only will they have more similar vote values, but also they will have a greater tendency to vote for the same subset of the total films rated (the same genres, in the same years, etc.); although this factor has been alleviated by the use of Jaccard, its impact has not been completely eliminated.

The NetFlix precision/recall quality measure (graph 7D) improves significantly when the number of recommendations is not high (between 2 and 5), remaining similar to Pearson correlation when the number of recommendations is high.
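To make the role of the relevance threshold concrete, the following is a minimal sketch of threshold-based precision/recall for a top-N recommendation list (our own illustration with hypothetical names, not the authors' code): an item counts as relevant when its actual rating reaches θ, and the N items with the highest predicted ratings are recommended.

```python
# Sketch only: "precision_recall" and the toy data are hypothetical.
# An item counts as relevant when its actual rating reaches the
# relevance threshold theta (theta = 5 in the paper's experiments).

def precision_recall(predicted, actual, n, theta=5.0):
    """predicted/actual: dicts mapping item id -> rating."""
    # Recommend the n items with the highest predicted rating.
    recommended = sorted(predicted, key=predicted.get, reverse=True)[:n]
    relevant = {item for item, r in actual.items() if r >= theta}
    hits = sum(1 for item in recommended if item in relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

predicted = {"a": 4.6, "b": 4.3, "c": 4.9, "d": 3.8}
actual    = {"a": 5,   "b": 5,   "c": 4,   "d": 3}
print(precision_recall(predicted, actual, n=2))  # -> (0.5, 0.5)
```

With θ = 5 on a 1–5 star scale, a prediction of 4.5 or above rounds to the relevant class, which is why the 0.2-star MAE gain described above can move items whose predictions were 4.3–4.49 across the 4.5 boundary.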
Fig. 8 shows the results obtained using the FilmAffinity database. In summary, the new metric offers results which cannot be considered better than those provided by Pearson correlation. These results do not negate the integrity of the proposed metric, but rather they confirm its design principles and restrict its field of application.

Part of the design of the proposed metric is based on the low level of variety and on the similarity of the numerical votes cast by the different users of RS which offer a small range of voting possibilities (in the study cases from 1 to 5 stars). FilmAffinity does not adapt to this standard, primarily because the users can vote in a range from 1 to 10 stars, and to a lesser extent because certain "itineraries" for voting on films, established by the managers of the RS, have become more popularly used, which makes Jaccard's contribution to the metric less effective.

Table 3 summarizes the most relevant aspects of the results obtained.

Table 3
Quality of the results obtained by applying the quality measurements on the selected databases.

                      MovieLens    NetFlix    FilmAffinity
MAE                   ++           ++         Not applicable
Perfect predictions   ++           ++
Coverage              +
Precision/recall      +            +

Experimentally, the proposed metric is executed in approximately a third of the time required by Pearson correlation. Indeed, Pearson correlation requires 2 subtraction operations and 3 multiplication-addition operations (assimilating the squaring operation with the multiplication) for every pair of items voted in common (inside the summations), whilst the MSD only requires one subtraction operation and one multiplication-addition operation. Outside the summation (and therefore less significant in the average times), Pearson correlation requires a square-root calculation and the proposed metric only requires 2 divisions and one multiplication.

Finally, the time required for the Jaccard calculation has proven to have very little relevance, as the values are obtained efficiently in the same process (loop) used in the summations.
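To see where those operation counts come from, here is a minimal sketch of both computations (our own illustration; the function names and the exact normalization of the MSD term are assumptions rather than the paper's definitions). The Jaccard counts are accumulated in the same loop as the squared differences, which is why their cost is negligible.

```python
import math

def pearson(u, v):
    """Pearson correlation over co-rated items. Inside the sums each
    co-rated pair costs 2 subtractions and 3 multiply-adds; a square
    root is needed once, outside the summations."""
    common = [i for i in u if i in v]
    if not common:
        return 0.0
    mu = sum(u[i] for i in common) / len(common)
    mv = sum(v[i] for i in common) / len(common)
    num = sxx = syy = 0.0
    for i in common:
        du = u[i] - mu            # subtraction 1
        dv = v[i] - mv            # subtraction 2
        num += du * dv            # multiply-add 1
        sxx += du * du            # multiply-add 2
        syy += dv * dv            # multiply-add 3
    denom = math.sqrt(sxx * syy)  # square root, outside the loop
    return num / denom if denom else 0.0

def msd_jaccard(u, v, max_diff=4.0):
    """Sketch of the proposed metric as we read it: an MSD-based
    similarity scaled by Jaccard. Each co-rated pair costs only
    1 subtraction and 1 multiply-add; the Jaccard counts come out
    of the same loop. max_diff: widest rating gap on a 1-5 scale."""
    n_common = 0
    sq = 0.0
    for i in u:
        if i in v:
            n_common += 1
            d = u[i] - v[i]       # 1 subtraction
            sq += d * d           # 1 multiply-add
    if n_common == 0:
        return 0.0
    # Everything below happens once, outside the summation.
    jaccard = n_common / (len(u) + len(v) - n_common)
    msd_sim = 1.0 - sq / (n_common * max_diff * max_diff)
    return jaccard * msd_sim

u = {"m1": 5, "m2": 3, "m3": 4}
v = {"m1": 4, "m2": 3, "m4": 2}
print(pearson(u, v), msd_jaccard(u, v))
```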
6. Conclusions

In recommender systems which base the votes on small ranges of values (e.g. from 1 to 5), users tend to cast their ratings as "positive" or "non-positive", transferring those conceptual levels towards numerical values, which is reflected in low uniformity in the range of votes cast and little variation in the ratings of each item by the set of users of the recommender systems.