3. Genetic algorithm In order to find_中国高校课件下载中心

点击下载：《电子商务 E-business》阅读文献：Improving collaborative filtering recommender system results and performance using genetic algorithms

正在加载图片...

3. Genetic algorithm The 50% of the initial population is seeded in areas where opti In order to find an optimal similarity function, Vo measures I genetic algorithm to find the vector w associated to the optimal weights the number of items rated identically by two users. similarity function simw. Consequently we can reasonably expect that an optimal simi- e use a supervised learning task( Goldberg. 1989)whose fit- larity function will have a high positive value of wo). In this ness function is the Mean Absolute Error MAE of the RS. In this way, the value wo)of individuals will be chosen randomly to ay the population of our genetic algorithm is the set of different lie within a high positive range of values. In the same way. vectors of weights, w. Each individual, that is to say each vector of he value wM-m)w4) for Movielens and w9)for FilmAffinity ty measure(see(1)),for measures how the similarity function weights the number of which we will evaluate the mae of the RS using this similarity items rated in a totally opposite way by two users. Cons neasure. while running our genetic algorithm, the successive pop quently we can reasonably expect that an optimal similarity ulation generations tend to improve the mae in the rs. Our genetic function will have a high negative value of wfM-m) In this algorithm stops generating populations when MaE in the rs for a way, the value wfM-m) of individuals will be chosen randomly vector of weight is lower than a threshold, ,(which has been pre- to lie within a high negative range of values. viously established). Next, we will show the ranges used to seed the 50% of the initial usual, we use only a part of the whole recommender system (training users and training items)in order to obtain an With the Movielens and Netflix Recommender System(m=1 similarity function. Once obtained this similarity function. nd M=5) ry out our experiments to evaluate the similarity function by using only the test users and the test items(that is to say, those wo E[1,0.6]. W1 E[0.6, 0. 2), W2 E(0. 2, -0.21 users and items which have not been used in the GA algorithm). w3 E1-0.2, -0.6]. W4 E [-0.6, -1 3. 1. Genetic representation With the FilmAffinity Recommender System (m=1 and As usual, individuals of populations are represented in binary wo E[1,0.8). W1 E(0.8,0.6]w2 E(0.6.0.4) form as strings of Os and 1s. As we have previously stated, in our ge- w3 E(0.4,0.2), WA E(0.2, 0 ). ws E[0, -0.21 vector is a possible individual of the population. Each component W6 E[-0.2, -0.4. W7E1-0.4, -0.6] wo in the vector of w will be represented by 10 bits. Consequently, Ws E1-0.6, -0.8, Wg E[-08, -1 the vector w=(wo)., wM-m))will be represented by the follow- ing string of Os and 1s (with a length of 210(M-m*1)bits) b-m…b-mbmb3m1…b-m19-m=1,…bb 3.3. Fitness function The fitness function of a genetic algorithm is used to prescribe he optimality of a similarity function simw(for the individual where each component of the vector W E[-1. 1] can be obtained w ). In our genetic algorithm, the fitness function will be the mean through the following expression Absolute Error (MAE) of the Rs (indeed we only use the training users and the training items since we are trying to find an optimal similarity function) for a particular similarity function simw(de- scribed in Eq (1)). The MaE is obtained by comparing the real ratings with the pre 3. 2. Initial population dicted ratings made according to the similarity function. In order to calculate the Mae of the Rs for a particular similarity function, er to choose the population size we ha simw, we need to follow the next steps for each user x: criterion of using a number of individuals in the population which is the double of the number of bits used to represent each individ- 1. obtaining the set of K neighbors ofx, Kx(the K most similar users ual 3]. Consequently, when using Movielens and Netflix the initial population will be of 100 individuals, in view that in both recom- in Eq (1). mender systems M=5 and m=1, but when using FilmAffinity 2. Calculating the prediction of the user x and an item i, p. This is (where M= 10 and m= 1) the initial population would be of 200 obtained through the Deviation From Mean(DFM)as aggrega- individuals. Table 1 shows the precise information used in our tion approach experiments In order to generate the initial population we have considered l(x.,n)×(rn-Fn) the following criteria: The 50% of the initial population is obtained using random where Fx is the average of ratings made by the user x Once every possible prediction according to the similarity func Table 1 tion sim, is calculated, we obtain the mae of the rs as follows Descriptive information of the databases used in the experiments. fitness=MAE 21.128 When 1000209 100480.507 gef trai aingo users and the humber ef reinig Min and max values3. Genetic algorithm In order to find an optimal similarity function, simw, we use a genetic algorithm to find the vector w associated to the optimal similarity function simw. We use a supervised learning task (Goldberg, 1989) whose fitness function is the Mean Absolute Error MAE of the RS. In this way, the population of our genetic algorithm is the set of different vectors of weights, w. Each individual, that is to say each vector of weights, represents a possible similarity measure (see (1)), for which we will evaluate the MAE of the RS using this similarity measure. While running our genetic algorithm, the successive population generations tend to improve the MAE in the RS. Our genetic algorithm stops generating populations when MAE in the RS for a vector of weight is lower than a threshold, c (which has been previously established). As usual, we use only a part of the whole recommender system (training users and training items) in order to obtain an optimal similarity function. Once obtained this similarity function, we carry out our experiments to evaluate the similarity function obtained by using only the test users and the test items (that is to say, those users and items which have not been used in the GA algorithm). 3.1. Genetic representation As usual, individuals of populations are represented in binary form as strings of 0s and 1s. As we have previously stated, in our genetic algorithm, each vector of weights, w, representing a similarity vector is a possible individual of the population. Each component w(i) in the vector of w will be represented by 10 bits. Consequently, the vector w = (w(0), ... , w(Mm) ) will be represented by the following string of 0s and 1s (with a length of 210(Mm+1) bits): b9 Mm ... b1 Mmb0 Mm b9 Mm1 ... b1 Mm1b0 Mm1 ... b9 1 ... b1 1b0 1 b9 0 ... b1 0b1b0 1 where each component of the vector wi e [1, 1] can be obtained through the following expression: wi ¼ 2 P9 j¼102j bj i 210 1 1 ð2Þ 3.2. Initial population In order to choose the population size, we have considered the criterion of using a number of individuals in the population which is the double of the number of bits used to represent each individual [3]. Consequently, when using Movielens and Netflix the initial population will be of 100 individuals, in view that in both recommender systems M = 5 and m = 1, but when using FilmAffinity (where M = 10 and m = 1) the initial population would be of 200 individuals. Table 1 shows the precise information used in our experiments. In order to generate the initial population we have considered the following criteria: The 50% of the initial population is obtained using random values. The 50% of the initial population is seeded in areas where optimal vectors w may lie. In this way, the value w0 measures how the similarity function weights the number of items rated identically by two users. Consequently we can reasonably expect that an optimal similarity function will have a high positive value of w(0). In this way, the value w(0) of individuals will be chosen randomly to lie within a high positive range of values. In the same way, the value w(Mm) (w(4) for Movielens and w(9) for FilmAffinity) measures how the similarity function weights the number of items rated in a totally opposite way by two users. Consequently we can reasonably expect that an optimal similarity function will have a high negative value of w(Mm) . In this way, the value w(Mm) of individuals will be chosen randomly to lie within a high negative range of values. Next, we will show the ranges used to seed the 50% of the initial population. With the Movielens and Netflix Recommender System (m = 1 and M = 5): w0 2 ½1; 0:6; w1 2 ½0:6; 0:2; w2 2 ½0:2; 0:2; w3 2 ½0:2; 0:6; w4 2 ½0:6; 1 With the FilmAffinity Recommender System (m = 1 and M = 10): w0 2 ½1; 0:8; w1 2 ½0:8; 0:6 w2 2 ½0:6; 0:4; w3 2 ½0:4; 0:2; w4 2 ½0:2; 0; w5 2 ½0; 0:2; w6 2 ½0:2; 0:4; w7 2 ½0:4; 0:6; w8 2 ½0:6; 0:8; w9 2 ½0:8; 1 3.3. Fitness function The fitness function of a genetic algorithm is used to prescribe the optimality of a similarity function simw (for the individual w). In our genetic algorithm, the fitness function will be the Mean Absolute Error (MAE) of the RS (indeed we only use the training users and the training items since we are trying to find an optimal similarity function) for a particular similarity function simw (described in Eq. (1)). The MAE is obtained by comparing the real ratings with the predicted ratings made according to the similarity function. In order to calculate the MAE of the RS for a particular similarity function, simw, we need to follow the next steps for each user x: 1. Obtaining the set of K neighbors of x, Kx (the K most similar users to a given user x), through the similarity function simw defined in Eq. (1). 2. Calculating the prediction of the user x and an item i, pi x. This is obtained through the Deviation From Mean (DFM) as aggregation approach: Pi x ¼ rx þ P n2Kx ½simwðx; nÞðri n rnÞ P n2Kx simwðx; nÞ ð3Þ where rx is the average of ratings made by the user x. Once every possible prediction according to the similarity function simw is calculated, we obtain the MAE of the RS as follows: fitness ¼ MAE ¼ 1 #U X u2U P i2Iu pi u ri u #Iu ð4Þ When running the genetic algorithm, #U and #Iu represent respectively the number of training users and the number of training items rated by the user u. Table 1 Descriptive information of the databases used in the experiments. Movielens Film affinity Netflix #Users 6040 26,447 480,189 #Movies 3706 21,128 17,770 #Ratings 1,000,209 19,126,278 100,480,507 Min and max values 1–5 1–10 1–5 1312 J. Bobadilla et al. / Knowledge-Based Systems 24 (2011) 1310–1316

<<向上翻页向下翻页>>

点击下载：《电子商务 E-business》阅读文献：Improving collaborative filtering recommender system results and performance using genetic algorithms