M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399
formula (17) or the Euclidean distance function as given below:

$$fd(x, y) = \sqrt{\sum_{i=1}^{21} \left(fd(x_i, y_i)\right)^2}, \tag{20}$$

where $fd(x_i, y_i) = d(x_i, y_i) \times d(x_i, y_i)$, $d(x_i, y_i) = \sqrt{\sum_{j=1}^{l} (x_{i,j} - y_{i,j})^2}$, $l$ is the total number of fuzzy sets for the ith feature, and $x_{i,j}$ is the membership value of the ith feature in the jth fuzzy set. We refer to the RS that uses formula (20) for similarity computations as the Fuzzy RS (FRS).

4.2. Hybrid fuzzy-genetic approach to recommender systems

The proposed user model contains 21 features that contribute equally during similarity computations in FRS. This does not capture the real-life situation, where users give different weights to different features. These weights are subject to change with time and with the evolving preferences of each user. An efficient learning mechanism to capture these weights is required.
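The FRS distance of formula (20) can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `membership_distance` and `frs_distance` are hypothetical and do not come from the paper, each feature is assumed to be given as a list of fuzzy membership values, and the per-feature values fd(x_i, y_i) are passed in already computed rather than derived here.

```python
import math


def membership_distance(x_memberships, y_memberships):
    """d(x_i, y_i): Euclidean distance between the membership vectors of one feature.

    Each argument holds the l membership values (one per fuzzy set) of that feature.
    """
    if len(x_memberships) != len(y_memberships):
        raise ValueError("both users need the same number of fuzzy sets")
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(x_memberships, y_memberships))
    )


def frs_distance(local_distances):
    """Formula (20): square root of the sum of squared per-feature distances.

    local_distances holds one precomputed fd(x_i, y_i) value per feature
    (21 values in the paper's user model; any length works in this sketch).
    """
    return math.sqrt(sum(fd ** 2 for fd in local_distances))
```

For example, two users whose memberships for one feature are `[1.0, 0.0, 0.0]` and `[0.0, 1.0, 0.0]` are at distance sqrt(2) on that feature.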
By imposing feature weights on formula (20), a genetic algorithm (GA) can be used to find these weights, leading to a hybrid fuzzy-genetic RS (FGRS). For this approach formula (20) takes the following form:

$$fd(x, y) = \sqrt{\sum_{j=1}^{21} w_j \times \left(fd(x_j, y_j)\right)^2}, \tag{21}$$

where $w_j$ is the weight for the jth feature. The following subsections give a brief introduction to genetic algorithms and the fitness function.

4.2.1. Genetic algorithms

A genetic algorithm processes a population of competing candidate solutions. Each chromosome encodes a set of tunable parameters that maps to a potential solution to an optimization problem (Goldberg, 1989). An objective function, called the fitness function, evaluates the quality of a solution. Genetic operators such as crossover and mutation are applied to the parents in order to generate new offspring. Individuals that achieve a high fitness are more likely to be selected as parents and to generate offspring by means of crossover and mutation. Good newly generated individuals, if any, replace the current poor individuals to form the new population for the next generation. The GA terminates either when a maximum number of generations elapses or when a desired level of fitness is reached (Goldberg, 1989).

Every user puts a different priority on each feature, which can be referred to as a feature weight. In order to implement a truly personalized RS, these weights need to be captured and fine-tuned to reflect each user's preferences. In our experiment, the GA adapts the feature weights to capture the user priorities for different features (Ujjin & Bentley, 2004). The feature weights of user $u_a$ are represented as a set of weights, $\mathrm{weight}(u_a) = [w_i]_{i=1,\ldots,n}$, where $n$ is the number of features. The genotype of $w_i$ is a string of binary values. When the weight for any feature is zero, that feature is ignored.
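The weighted distance of formula (21) and the binary genotype can be illustrated with a short sketch. Several details are assumptions made for the example, since the text does not specify them: the number of bits per weight (`BITS_PER_WEIGHT`), the decoding of each gene as an unsigned integer, and the function names `decode_weights` and `weighted_fuzzy_distance` are all hypothetical.

```python
import math

BITS_PER_WEIGHT = 8  # assumption: the text only says each weight is a binary string


def decode_weights(chromosome, n_features, bits=BITS_PER_WEIGHT):
    """Decode a flat list of 0/1 values into one unsigned integer weight per feature."""
    if len(chromosome) != n_features * bits:
        raise ValueError("chromosome length must equal n_features * bits")
    weights = []
    for i in range(n_features):
        gene = chromosome[i * bits:(i + 1) * bits]
        weights.append(int("".join(str(b) for b in gene), 2))
    return weights


def weighted_fuzzy_distance(weights, local_distances):
    """Formula (21): sqrt of the sum over features of w_j * fd(x_j, y_j)^2."""
    if len(weights) != len(local_distances):
        raise ValueError("one weight is needed per feature")
    return math.sqrt(
        sum(w * fd ** 2 for w, fd in zip(weights, local_distances))
    )
```

With 4-bit genes, the chromosome `[0, 0, 1, 1, 0, 0, 0, 0]` decodes to the weights `[3, 0]`; the second feature then contributes nothing to the distance, however large its local distance is, which is the zero-weight case just described.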
This enables feature selection to be adaptive to each user's preferences.

4.2.2. The fitness function

Finding an appropriate fitness function is a challenging problem for GA applications (Goldberg, 1989). For this application it is not a trivial task: every set of weights in the GA population must be employed by the user-matching process within the RS, so the RS needs to be re-run on the entire database for each new set of weights in order to find the fitness (Ujjin & Bentley, 2004). A poor (good) set of weights might (should) result in a poor (good) neighborhood set of users for the active user, and hence poor (good) recommendations. One way to find the fitness function is to reformulate the problem as a supervised learning task. For that purpose the actual ratings of the active user are randomly divided into two disjoint sets, a test ratings set (66%) and a training ratings set (34%). To find the fitness score for an evolved set of weights, the RS must be run and the predicted rating for each movie in the training ratings set must be computed. The average of the absolute differences (Ujjin & Bentley, 2004) between the actual and predicted ratings of all movies in the training ratings set is used as the fitness score for that set of weights:

$$\mathit{fitness} = \frac{1}{n_R} \sum_{j=1}^{n_R} |r_j - pr_j|, \tag{22}$$

where $r_j$ and $pr_j$ are the actual and predicted ratings of the jth movie, and $n_R$ is the cardinality of the training ratings set of a given active user.

5. Experiments

Based on the MovieLens dataset, we considered only users who have rated at least 60 movies, 20 to build a user model and 40 for testing. Out of 943 users, only 497 satisfied this condition; they contributed 84,596 ratings out of 100,000. This dataset is used as the basis to generate five random splits into training and active users. For each random split, 50 users were chosen randomly as active users, and the remaining 447 users as training users, treated as historical data for the RS.
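Two steps from the preceding text lend themselves to a short sketch: the fitness score of formula (22) and the random partition of the eligible users into active and training users. The function names, the fixed `seed`, and the choice to draw each split independently of the others are assumptions made for illustration; the text does not say whether the active sets of different splits may overlap.

```python
import random


def fitness(actual, predicted):
    """Formula (22): mean absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(r - pr) for r, pr in zip(actual, predicted)) / len(actual)


def eligible_users(rating_counts, min_ratings=60):
    """Keep the users who rated at least min_ratings movies.

    rating_counts maps a user id to that user's number of ratings.
    """
    return sorted(u for u, n in rating_counts.items() if n >= min_ratings)


def make_splits(users, n_active=50, n_splits=5, seed=0):
    """Return n_splits (active, training) pairs of user lists.

    Assumption: each split is an independent random draw of n_active users;
    all remaining users form the training set of that split.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(n_splits):
        active = set(rng.sample(users, n_active))
        training = [u for u in users if u not in active]
        splits.append((sorted(active), training))
    return splits
```

Because formula (22) measures an error, smaller fitness values indicate better weights under this definition. Calling `make_splits` on a list of 497 user ids yields five pairs of 50 active and 447 training users.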
Such a random separation was intended for the execution of five-fold cross-validation, where all the experiments are repeated five times, once with each split. These splits will be referred to as split-1, split-2,..., split-5. The set of training users (447 users) is used to find a set of neighbors for the active user while the set of active users (50 users) is used to test the performance of the system. During the testing phase, each active user’s ratings are divided randomly into two disjoint sets, training ratings (34%) and test ratings (66%). The training ratings are used to model the user