crossover with probability 1.0. Mutation is applied to each locus in the genotype with probability 0.01. A simple unsigned binary genetic encoding is used in the implementation, using 8 bits for each of the 22 genes. The GA begins with random genotypes.

A genotype is mapped to a phenotype (a set of feature weights) by converting the alleles of the binary genes to decimal. The feature weights can then be calculated from these real values. First, the importance of the 18 genre frequencies is reduced by a given factor, the weight reduction size. This is done because the 18 genres can be considered different categories of a single larger feature, Genre. Reducing the effect of these weights is therefore intended to give the other, unrelated features (movie rating, age, gender, occupation) a more equal chance of being used.
Second, the total value of the phenotype is then calculated by summing the real values for all 22 features. Finally, the weighting value for each feature can be found by dividing the real value by the total value. The weights then sum to unity.

2.3.1. Fitness function

Calculating the fitness for this application is not trivial. Every set of weights in the GA population must be employed by the profile matching processes within the recommender system. So the recommender system must be re-run on the MovieLens dataset for each new set of weights, in order to calculate its fitness. But running a recommender system only produces recommendations (or predictions), not fitnesses. A poor set of weights might result in a poor neighbourhood set of profiles for the active user, and hence poor recommendations. A good set of weights should result in a good neighbourhood set, and good recommendations. So a method of calculating the quality of the recommendations is required, in order that a fitness score can be assigned to the corresponding weights.

It was decided to reformulate the problem as a supervised learning task. As described previously, given the active user A and a set of neighbouring profiles, recommendations for A can be made. In addition to these recommendations, it is possible to predict what A might think of them. For example, if a certain movie is suggested because similar users saw it, but those users only thought the movie was "average", then it is likely that the active user might also think the movie was "average". Hence, for the MovieLens dataset, it was possible for the system both to recommend new movies and to predict how the active user would rate each movie, should he go and see it.

The predicted vote computation used in this paper has been taken from [3] and modified such that the Euclidean distance function (section 3.2.2) now replaces the weight in the original equation.
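Before the predicted vote is defined, the genotype-to-phenotype mapping described above (8-bit genes, genre reduction, normalisation) can be made concrete. The following is a minimal sketch, not the authors' implementation. It assumes that the four non-genre features occupy the first four genes and the 18 genres the last 18, that "reduced by a given factor" means division, and that an all-zero genotype falls back to equal weights; the gene order, the default `reduction` value and the function name are hypothetical.

```python
from typing import List, Sequence

BITS_PER_GENE = 8
NUM_FEATURES = 22
NUM_GENRES = 18  # assumption: the genre genes are the last 18 of the 22


def decode_weights(genotype: Sequence[int], reduction: float = 4.0) -> List[float]:
    """Map a binary genotype (176 bits) to 22 feature weights that sum to 1.

    `reduction` stands in for the paper's "weight reduction size"; the
    default of 4.0 is purely illustrative.
    """
    if len(genotype) != BITS_PER_GENE * NUM_FEATURES:
        raise ValueError("genotype must contain 22 genes of 8 bits each")

    # Convert each 8-bit gene from unsigned binary to a real value (0..255).
    values = []
    for gene in range(NUM_FEATURES):
        bits = genotype[gene * BITS_PER_GENE:(gene + 1) * BITS_PER_GENE]
        value = 0
        for bit in bits:
            value = value * 2 + bit
        values.append(float(value))

    # First: reduce the importance of the 18 genre genes (assumed: divide).
    first_genre = NUM_FEATURES - NUM_GENRES
    for gene in range(first_genre, NUM_FEATURES):
        values[gene] /= reduction

    # Second: total value of the phenotype.
    total = sum(values)
    if total == 0:
        # Assumption: an all-zero genotype is treated as equal weights.
        return [1.0 / NUM_FEATURES] * NUM_FEATURES

    # Finally: divide each real value by the total.
    return [value / total for value in values]
```

With this reading, a genotype of all ones gives each non-genre feature `reduction` times the weight of each genre, which is the intended effect of the reduction step.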
The predicted vote, predict_vote(A,i), for A on item i, can be defined as:

$$\mathrm{predict\_vote}(A,i) = \mathrm{mean}_A + k \sum_{j=1}^{n} \mathrm{euclidean}(A,j)\,\bigl(\mathrm{vote}(j,i) - \mathrm{mean}_j\bigr)$$

where:
mean_j is the mean vote for user j
k is a normalising factor such that the sum of the euclidean distances is equal to 1
vote(j,i) is the actual vote of user j for item i
n is the size of the neighbourhood.

All the movie items that the active user has seen are randomly partitioned into two datasets: a training set (1/3) and a test set (2/3). To calculate a fitness measure for an evolved set of weights, the recommender system finds a set of neighbourhood profiles for the active user, as described in section 2.2. The ratings of the users in the neighbourhood set are then employed to compute the predicted rating for the active user on each movie item in the training set. Because the active user has already rated the movie items, it is possible to compare the actual rating with the predicted rating. So, the average of the differences between the actual and predicted votes of all items in the training set is used as the fitness score to guide future generations of weight evolution, see figure 4.

[Figure 4 diagram: Profile Selection and Matching; Best Profile Selection; neighbourhood set; euclidean(A,j); predict_vote(A,i) for all items i = 1..q in the training set; Average(fitness_i1, fitness_i2, ..., fitness_iq); Fitness Score]

Figure 4: finding the fitness score of an individual (the active user's feature weights).

3. EXPERIMENTS

Four sets of experiments were designed to observe the difference in performance between the evolutionary recommender system and a standard, non-adaptive recommender system based on the Pearson algorithm [3]. In each set of experiments, the predicted votes of all the movie items in the test set (the items that the active user has rated but that were not used in weights evolution) were computed using the final feature weights for that run. These votes were then compared against those produced from the simple Pearson algorithm.
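The fitness evaluation of section 2.3.1, which produces the feature weights used in these experiments, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the value of euclidean(A,j) is taken as already computed for each neighbour, k is read as the reciprocal of the sum of those values, neighbours who have not rated an item are skipped, and the "difference" between actual and predicted votes is taken as an absolute difference. The names `Neighbour`, `predict_vote` and `fitness` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Neighbour(NamedTuple):
    euclid: float            # value of euclidean(A, j), taken as given
    mean_vote: float         # mean_j, the mean vote of user j
    votes: Dict[str, float]  # vote(j, i) for each item i rated by user j


def predict_vote(mean_a: float, neighbours: List[Neighbour], item: str) -> float:
    """mean_A + k * sum_j euclidean(A, j) * (vote(j, i) - mean_j)."""
    # Assumption: only neighbours who rated the item contribute to the sum.
    raters = [n for n in neighbours if item in n.votes]
    total = sum(n.euclid for n in raters)
    if total == 0:
        # No usable neighbour: fall back to the active user's mean vote.
        return mean_a
    k = 1.0 / total  # assumed reading of the normalising factor k
    deviation = sum(n.euclid * (n.votes[item] - n.mean_vote) for n in raters)
    return mean_a + k * deviation


def fitness(mean_a: float, neighbours: List[Neighbour],
            training_votes: Dict[str, float]) -> float:
    """Average absolute difference between actual and predicted votes.

    Lower values indicate better feature weights under this reading.
    """
    if not training_votes:
        raise ValueError("training set must contain at least one item")
    errors = [abs(actual - predict_vote(mean_a, neighbours, item))
              for item, actual in training_votes.items()]
    return sum(errors) / len(errors)
```

In a full run, `neighbours` would be rebuilt by the profile matching step for every candidate set of weights, and `training_votes` would hold the one third of the active user's ratings reserved for training.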
The four sets of experiments were as follows: