publications and reviews also exist which include the most commonly accepted metrics, aggregation approaches and evaluation measures: mean absolute error, coverage, precision, recall and derivatives of these: mean squared error, normalized mean absolute error, ROC and fallout. Goldberg et al. [13] focus on the aspects not related to the evaluation, and Breese et al. [6] compare the predictive accuracy of various methods in a set of representative problem domains. Candillier et al. [7] and Schafer et al. [36] review the main collaborative filtering methods proposed in the literature.

The rest of the paper is structured as follows:

In Section 2 we provide the basis for the principles on which the design of the new metric will be based: we present graphs which show the way in which the users vote, we carry out experiments which support the decisions made, we establish the best way of selecting numerical and non-numerical information from the votes and, finally, we establish the hypothesis on which the paper and its proposed metric are based.

In Section 3 we establish the mathematical formulation of the metric.

In Sections 4 and 5, respectively, we list the experiments that will be carried out and we present and discuss the results obtained.

Section 6 presents the most relevant conclusions of the publication.

2. Approach and design of the new similarity metric

2.1. Introduction

Collaborative filtering methods work on a table of U users who can rate I items. The prediction of a non-rated item $i$ for a user $u$ is computed as an aggregate of the ratings of the K most similar users (k-neighborhoods) for the same item $i$, where $K_u$ denotes the set of k-neighborhoods of $u$ and $r_{n,i}$ denotes the value of the rating of user $n$ on item $i$ ($\bullet$ if there is no rating value).
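To make this prediction step concrete, here is a minimal sketch (not the authors' implementation). It assumes that ratings are stored in a nested dictionary where an absent key plays the role of the missing-rating symbol, that the neighbor set $K_u$ has already been obtained with some similarity metric, and that the aggregate is a plain mean, the simplest possible choice. The names `ratings`, `neighbors_of_u` and `predict_mean` are illustrative only.

```python
from typing import Dict, Iterable, Optional

# ratings[user][item] = vote; a missing key means "no rating".
Ratings = Dict[str, Dict[str, float]]


def predict_mean(ratings: Ratings, neighbors: Iterable[str], item: str) -> Optional[float]:
    """Average the votes that the given neighbors cast on `item`.

    Returns None when no neighbor has rated the item, i.e. when no
    prediction can be made from this neighborhood.
    """
    votes = [ratings[n][item] for n in neighbors if item in ratings.get(n, {})]
    if not votes:
        return None
    return sum(votes) / len(votes)


# Toy data (hypothetical): user "u" has not rated item "i3".
ratings = {
    "u": {"i1": 4, "i2": 3},
    "a": {"i1": 4, "i3": 5},
    "b": {"i2": 3, "i3": 3},
    "c": {"i1": 2},
}
neighbors_of_u = ["a", "b", "c"]  # assumed to be the K most similar users

# Only "a" and "b" rated "i3", so the prediction is the mean of 5 and 3.
prediction = predict_mean(ratings, neighbors_of_u, "i3")
```

Neighbor "c" is silently skipped because it has no vote on the target item; a neighborhood in which nobody rated the item yields no prediction at all.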
Once the set of K users (neighborhoods) similar to the active user $u$ has been calculated, one of the following aggregation approaches is often used to obtain the prediction of item $i$ for user $u$: the average (2), the weighted sum (3) and the adjusted weighted aggregation (deviation-from-mean) (4). We will use the auxiliary set $G_{u,i}$ in order to define Eqs. (2)–(5):

$$G_{u,i} = \{\, n \in K_u \mid \exists\, r_{n,i} \neq \bullet \,\} \tag{1}$$

$$p_{u,i} = \frac{1}{\# G_{u,i}} \sum_{n \in G_{u,i}} r_{n,i} \iff G_{u,i} \neq \emptyset \tag{2}$$

$$p_{u,i} = \mu_{u,i} \sum_{n \in G_{u,i}} \mathrm{sim}(u,n)\, r_{n,i} \iff G_{u,i} \neq \emptyset \tag{3}$$

$$p_{u,i} = \bar{r}_u + \mu_{u,i} \sum_{n \in G_{u,i}} \mathrm{sim}(u,n)\,\bigl(r_{n,i} - \bar{r}_n\bigr) \iff G_{u,i} \neq \emptyset \tag{4}$$

where $\mu$ serves as a normalizing factor, usually computed as:

$$\mu_{u,i} = \frac{1}{\sum_{n \in G_{u,i}} \mathrm{sim}(u,n)} \iff G_{u,i} \neq \emptyset \tag{5}$$

The most popular similarity metrics are Pearson correlation (6), cosine (7), constrained Pearson's correlation (8) and Spearman rank correlation (9):

$$\mathrm{sim}(x,y) = \frac{\sum_i \bigl(r_{x,i} - \bar{r}_x\bigr)\bigl(r_{y,i} - \bar{r}_y\bigr)}{\sqrt{\sum_i \bigl(r_{x,i} - \bar{r}_x\bigr)^2 \sum_i \bigl(r_{y,i} - \bar{r}_y\bigr)^2}} \tag{6}$$

$$\mathrm{sim}(x,y) = \frac{\sum_i r_{x,i}\, r_{y,i}}{\sqrt{\sum_i r_{x,i}^2}\,\sqrt{\sum_i r_{y,i}^2}} \tag{7}$$

$$\mathrm{sim}(x,y) = \frac{\sum_i \bigl(r_{x,i} - r_{med}\bigr)\bigl(r_{y,i} - r_{med}\bigr)}{\sqrt{\sum_i \bigl(r_{x,i} - r_{med}\bigr)^2}\,\sqrt{\sum_i \bigl(r_{y,i} - r_{med}\bigr)^2}}, \quad r_{med}: \text{median value in the rating scale} \tag{8}$$

$$\mathrm{sim}(x,y) = \frac{\sum_i \bigl(\mathrm{rank}_{x,i} - \overline{\mathrm{rank}}_x\bigr)\bigl(\mathrm{rank}_{y,i} - \overline{\mathrm{rank}}_y\bigr)}{\sqrt{\sum_i \bigl(\mathrm{rank}_{x,i} - \overline{\mathrm{rank}}_x\bigr)^2 \sum_i \bigl(\mathrm{rank}_{y,i} - \overline{\mathrm{rank}}_y\bigr)^2}} \tag{9}$$

Although Pearson correlation is the most commonly used metric in the process of memory-based CF (user to user), this choice is not always backed by the nature and distribution of the data in the RS. Formally, in order to be able to apply this metric with guarantees, the following assumptions must be met:

Linear relationship between x and y.
Continuous random variables.
Both variables must be normally distributed.

These conditions are not usually met in real RS, and Pearson correlation presents some significant cases of erroneous operation that should not be ignored in RS.

Despite the deficiencies of Pearson correlation, this similarity measure presents the best prediction and recommendation results in CF-based RS [15,16,31,7,35]; furthermore, it is the most commonly used, and therefore any alternative metric proposed must improve on its results.

Accepting that Pearson correlation is the metric whose results must be improved, but not necessarily the most appropriate one to take as a base, it is advisable to focus on the information that is obtained in the different research processes and which can sometimes be overlooked when pursuing objectives other than improving the accuracy of RS (cold-start problem, trust and novelty, sparsity, etc.).

The simplest way to get an idea of the nature of the RS is to find out how users usually vote: do they always tend to vote for the same values? Do they always tend to vote for the same items? Is there much difference between the votes of some users and others?

Fig. 1 shows the distribution of the votes cast in the MovieLens 1M and NetFlix RS (where votes lie in the interval [1..5]). We can see how, on average, the users focus their votes on the higher levels of the interval, while avoiding the extremes, particularly the lower extreme. The distribution of the votes is not balanced, and particularly negative or particularly positive votes are avoided.

Fig. 2 shows the arithmetic average and the standard deviation of the votes cast in the MovieLens 1M and NetFlix databases.
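
The per-item statistics summarized in Fig. 2 are simple to compute. The sketch below is illustrative only: it assumes the votes are available as (user, item, rating) triples, it uses the population standard deviation (the text does not say which variant was used), and it runs on made-up votes rather than on the MovieLens or NetFlix data. The name `item_stats` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple


def item_stats(votes: Iterable[Tuple[str, str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return {item: (arithmetic average, population standard deviation)}."""
    by_item = defaultdict(list)
    for _user, item, rating in votes:
        by_item[item].append(rating)
    return {item: (mean(rs), pstdev(rs)) for item, rs in by_item.items()}


# Hypothetical votes on a 1..5 scale.
votes = [
    ("u1", "i1", 4), ("u2", "i1", 4), ("u3", "i1", 3), ("u4", "i1", 5),
    ("u1", "i2", 3), ("u2", "i2", 3),
]
stats = item_stats(votes)

for item, (avg, sd) in sorted(stats.items()):
    print(f"{item}: average={avg:.2f} std={sd:.2f}")
```

Counting how many items fall into each bin of the first component gives a histogram of item averages, and doing the same for the second component gives a histogram of item standard deviations.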
Graphs (A) and (B) of Fig. 2 show the number of items that display the arithmetic average specified on the x axis; we can see that there are hardly any items rated, on average, below 2 or above 4, with most of the cases falling between the values 3 and 4. Graphs (C) and (D) of Fig. 2 show the number of items that display the standard deviation specified on the x axis; we can see that most of the items have been voted by the users, on average, with a maximum difference of 1.2 votes.

According to the figures analyzed, we find that traditional metrics must often achieve results by operating on a set of discrete ratings with very little variation (majority of votes between 3 and 4) and with the obligation of improving on simpler and quicker estimations, such as always predicting the arithmetic average of the votes of each item (for which we know the standard deviation is seldom higher than 1.2).

2.2. Basic experimentation

After ascertaining the low diversity of the votes cast by the users, it seems reasonable to consider that the votes mainly tend

J. Bobadilla et al. / Knowledge-Based Systems 23 (2010) 520–528