正在加载图片...
J. Bobadilla et al/ Knowledge-Based Systems 24(2011)1310-1316 131 initially carrying out a clustering on all of the users, in such a way absolute difference between the ratings of both users hat a group of classes of similar users is obtained, after this, the (- 1. to the number of items rated by both users That esired Cf techniques can be applied to each of the clusters, obtain- ing similar results but in much shorter calculation times; these is to say, viy=a /b where b is the number of items rated by both cases use common genetic clustering algorithms such as GA-based users, and a is the number of items rated by both users over which K-means [17. the absolute difference in the ratings of both users is i. he Rs hybrid user models commonly use a combination of CF In this way, the vector vi.2 for the previous example is the fol- with demographic filtering or CF with content based filtering, to wing gust observe that there are only 5 items rated by both users exploit merits of each one of these techniques. In these cases, the 1 and 2) chromosome structure can easily contain the demographic charac- teristics and or those related to content-based filtering. 12=(1/5.1/52/5,1/5.0) he method proposed in the article uses GA, but with the advantage that it does not require the additional information pre All components of the vector v1.2 are divided into 5 since there are 5 vided by the hybrid user models, and therefore, it can be used in items rated by both users. The component vk.y represents the ratio of the current RS, simply based on CF techniques. This is due to the of items which both users rated identically to the number of items fact that our method only uses the rating of users(which is the ted by both users. In the previous example, vl2=1/5. since there least possible information in any RS). is only one item( the item 1) which both users have rated identically he following section defines the proposed method( GA-meth- (that is to say. i)-r2l=O). In the same way. vAy-m)represents ssing required using GA, after this, we pres- the ratio of items rated in opposite way by both users, to the num- nt the sections which specify the design of experiments carried ber of items rated by both users. In the previous exampl out and the results obtained, finally we list the most relevant con- v M-m)=v i=0/5=0, since there are no items rated with values clusions from this study 5 and 1 (the highest difference in ratings) by the users 1 and 2. In the example above, vRy=2/5. since there are exactly two items, item 2 and item 9. such that The fundamental objective is to improve the results of CF by obtaining a metric that improves the accuracy [12,7, 11, 8 of CF based RS. For this purpose, after a series of experiments carried 2.2. Similarity out on different metrics, different levels of sparsity [25,6] and dif- ferent databases, we have obtained an original, simple and efficient We will consider a family of similarity functions. For each ve approach which improves the usual results obtained in RS tor w=(wo),., M-m)whose components lie in the range [-1. 1](that is to say, wE[-1, 11), we will consider the following 21. Values We will consider a RS with a set of U users, (1,., U), and a set simw(x,y)=I w(vi of I items ( 1,..., I). Users rate those items they know with a dis crete range of possible values sents that the user keeps completely unsatisfied about an item, whose component, w), represents the importance of the compo- fied with an item. RS normally use m with value 1 and M with value to this similarity function). In this way, the similarity function asso- 5 or 10. In this way, the range of possible ratings is usually ciated to the vector w=(1.0.5.0. -0.5, -1)represents a similarity 1,5}or{1, function such that The ratings made by a particular user x can be represented by a vector,r=(", r 2).1) with dimension I( the number of Since w(o)=1. this similarity function evaluates highly posi- items in the RS )in such way that rf represents the rating that tively the number of items rated identically the user x has made over the item i Obviously, a user may not rate Since m(4)-1, this similarity function evaluates highly nega- lI the items in the RS. We will use the symbol to represent that a tively the number of items which have been rated in opposite user has not rated an item. Consequently, we use the expression rf=. to state that the user x has not rated the item i yet. Since w2)=0. this similarity function takes no consideration of Next, we will consider an example. We will consider that m= 1 those items over which the difference between the ratings and M=5 (that is to say the possible set of ratings is (1,...,5 ) made by both users is 2 there are nine items, I=9, and there are two users whose ratings Since w1)=0.5. this similarity function evaluates with a posi- re represented by the following rI and rz vectors: tive intermediate value those items over which the difference between the ratings made by both users is 1 r1=(4,5.·,3,2,·1,1,4) Since w3)=-05. this similarity function evaluates with a neg- ative intermediate value those items over which the difference between the ratings made by both users is 3. As may be seen, while the user 1 has rated the item 5, r=2, the user 2 has not rated the item 5 yet: r,=. The question related to obtain the most suitable similarity func- In order to compare both vectors, rx, ry, we can consider another tion represented by a vector w for a recommender system depends vector Day=(vA., AMym)whose dimension is the number of on the nature of the data in this recommender system. we will try the possible ratings that a user can make over an item. Each com- larity function which provides a minimal Mean Absolute Error ponent viy of the vector vy, represents the ratio of items, j, rated (MAE) in the recommender system. In order to obtain this similar- by both users( that is to say, rf. and rl+)and over which the ity function, we will use a genetic algorithm.initially carrying out a clustering on all of the users, in such a way that a group of classes of similar users is obtained, after this, the desired CF techniques can be applied to each of the clusters, obtain￾ing similar results but in much shorter calculation times; these cases use common genetic clustering algorithms such as GA-based K-means [17]. The RS hybrid user models commonly use a combination of CF with demographic filtering or CF with content based filtering, to exploit merits of each one of these techniques. In these cases, the chromosome structure can easily contain the demographic charac￾teristics and/or those related to content-based filtering. The method proposed in the article uses GA, but with the advantage that it does not require the additional information pro￾vided by the hybrid user models, and therefore, it can be used in all of the current RS, simply based on CF techniques. This is due to the fact that our method only uses the rating of users (which is the least possible information in any RS). The following section defines the proposed method (GA-meth￾od) and the prior processing required using GA, after this, we pres￾ent the sections which specify the design of experiments carried out and the results obtained, finally we list the most relevant con￾clusions from this study. 2. Design of the new similarity method The fundamental objective is to improve the results of CF by obtaining a metric that improves the accuracy [12,7,11,8] of CF based RS. For this purpose, after a series of experiments carried out on different metrics, different levels of sparsity [25,6] and dif￾ferent databases, we have obtained an original, simple and efficient approach which improves the usual results obtained in RS. 2.1. Values We will consider a RS with a set of U users, {1, ... , U}, and a set of I items {1, ... , I}. Users rate those items they know with a dis￾crete range of possible values {m, ... , M} where m usually repre￾sents that the user keeps completely unsatisfied about an item, and a value M usually represents that the user is completely satis- fied with an item. RS normally use m with value 1 and M with value 5 or 10. In this way, the range of possible ratings is usually {1, ... , 5} or {1, ... , 10}. The ratings made by a particular user x can be represented by a vector, rx ¼ r ð1Þ x ;r ð2Þ x ... ;r ðIÞ x with dimension I (the number of items in the RS) in such way that ri x represents the rating that the user x has made over the item i. Obviously, a user may not rate all the items in the RS. We will use the symbol to represent that a user has not rated an item. Consequently, we use the expression r ðiÞ x ¼ to state that the user x has not rated the item i yet. Next, we will consider an example. We will consider that m = 1 and M = 5 (that is to say the possible set of ratings is {1, ... , 5}), there are nine items, I = 9, and there are two users whose ratings are represented by the following r1 and r2 vectors: r1 ¼ ð4; 5; ; 3; 2; ; 1; 1; 4Þ r2 ¼ ð4; 3; 1; 2; ; 3; 4; ; 2Þ As may be seen, while the user 1 has rated the item 5, r5 1 ¼ 2, the user 2 has not rated the item 5 yet: r5 2 ¼ . In order to compare both vectors, rx, ry, we can consider another vector v x;y ¼ vð0Þ x;y ; ... ; vðMmÞ x;y whose dimension is the number of the possible ratings that a user can make over an item. Each com￾ponent vðiÞ x;y of the vector vx,y, represents the ratio of items, j, rated by both users (that is to say, r ðjÞ x – and r ðjÞ y –) and over which the absolute difference between the ratings of both users is i rj x r j y       ¼ i , to the number of items rated by both users. That is to say, vðiÞ x;y ¼ a=b where b is the number of items rated by both users, and a is the number of items rated by both users over which the absolute difference in the ratings of both users is i. In this way, the vector v1,2 for the previous example is the fol￾lowing (just observe that there are only 5 items rated by both users 1 and 2): v1;2 ¼ ð1=5; 1=5; 2=5; 1=5; 0Þ All components of the vector v1,2 are divided into 5 since there are 5 items rated by both users. The component vð0Þ x;y represents the ratio of items which both users rated identically to the number of items rated by both users. In the previous example, vð0Þ 1;2 ¼ 1=5, since there is only one item (the item 1) which both users have rated identically (that is to say, r ð1Þ 1 r ð1Þ 2       ¼ 0). In the same way, vðMmÞ x;y represents the ratio of items rated in opposite way by both users, to the num￾ber of items rated by both users. In the previous example, vðMmÞ 1;2 ¼ v4 1;2 ¼ 0=5 ¼ 0, since there are no items rated with values 5 and 1 (the highest difference in ratings) by the users 1 and 2. In the example above, vð2Þ x;y ¼ 2=5, since there are exactly two items, item 2 and item 9, such that r ð2Þ 1 r ð2Þ 2       ¼ r ð9Þ 1 r ð9Þ 2       ¼ 2. 2.2. Similarity We will consider a family of similarity functions. For each vec￾tor w = (w(0), ... ,w(Mm) ) whose components lie in the range [1, 1] (that is to say, wðiÞ 2 ½1; 1Þ, we will consider the following similarity function: simwðx; yÞ ¼ 1 M m þ 1 M Xm i¼0 wðiÞ vðiÞ x;y ð1Þ Consequently, for each vector, w, there is a similarity function whose component, w(i) , represents the importance of the compo￾nent vðiÞ x;y for calculating the similarity between two users (according to this similarity function). In this way, the similarity function asso￾ciated to the vector w = (1, 0.5, 0, 0.5, 1) represents a similarity function such that: Since w(0) = 1, this similarity function evaluates highly posi￾tively the number of items rated identically. Since w(4) = 1, this similarity function evaluates highly nega￾tively the number of items which have been rated in opposite way (wð0Þ x ¼ 1). Since w(2) = 0, this similarity function takes no consideration of those items over which the difference between the ratings made by both users is 2. Since w(1) = 0.5, this similarity function evaluates with a posi￾tive intermediate value those items over which the difference between the ratings made by both users is 1. Since w(3) = 0.5, this similarity function evaluates with a neg￾ative intermediate value those items over which the difference between the ratings made by both users is 3. The question related to obtain the most suitable similarity func￾tion represented by a vector w for a recommender system depends on the nature of the data in this recommender system. We will try to find, among all the possible vectors w, one representing a simi￾larity function which provides a minimal Mean Absolute Error (MAE) in the recommender system. In order to obtain this similar￾ity function, we will use a genetic algorithm. J. Bobadilla et al. / Knowledge-Based Systems 24 (2011) 1310–1316 1311
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有