has not explicitly voted on, and these can increase the overlap size [4]. Dimensionality reduction methods, such as Singular Value Decomposition, both improve efficiency and increase overlap [3]. Other pre-processing methods are often used, e.g. clustering [1]. Content-based information can be used to enhance the pure CF approach [10], [6]. Finally, the weighting of each neighbour can be adjusted by training, and there are many learning algorithms available for this [7]. All these improvements could in principle be applied to our AIS, but in the interests of a clear and uncluttered comparison we have kept the CF algorithm as simple as possible.

The evaluation of a CF algorithm usually centres on its accuracy. There is a difference between prediction (given a movie, predict a given user's rating of that movie) and recommendation (given a user, suggest movies that are likely to attract a high rating).
Prediction is easier to assess quantitatively, but recommendation is a more natural fit to the movie domain. We present results evaluating both these behaviours.

Using an AIS for Collaborative Filtering

To us, the attraction of the immune system is this: if an adaptive pool of antibodies can produce 'intelligent' behaviour, can we harness the power of this computation to tackle the problem of preference matching and recommendation? Thus, in the first instance we intend to build a model where known user preferences are our pool of antibodies and the new preferences to be matched are the antigen in question. Our conjecture is that if the concentrations of those antibodies that provide a better match are allowed to increase over time, we should end up with a subset of good matches. However, we are not interested in optimising, i.e. in finding the one best match. Instead, successful recommendation requires a set of antibodies that are close matches but are, at the same time, distinct from each other. This is where we propose to harness the idiotypic effects of binding antibodies to similar antibodies to encourage diversity.

The next section presents more details of our problem and explains the AIS model we intend to use. We then describe the experimental set-up and present some initial results. Finally, we review the results and discuss some possibilities for future work.

2. ALGORITHMS

Application of the AIS to the EachMovie Tasks

The EachMovie database [5] is a public database that records explicit votes of users for movies. It holds 2,811,983 votes taken from 72,916 users on 1,628 films. The task is to use this data to make predictions and recommendations. In the former case, we provide an estimated vote for a previously unseen movie. In the latter case, we present a ranked list of movies that the user might like. The basic approach of CF is to use information from a neighbourhood to make useful predictions and recommendations.
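The text does not say how a neighbourhood is turned into a prediction or a recommendation, so the sketch below assumes one common choice from the CF literature: the user's mean vote plus a weighted average of the neighbours' deviations from their own means. It is not the method used in this paper. The names `predict`, `recommend` and `mean_vote`, the neighbour weights, and the representation of a user as a dictionary from movie id to a vote in [0, 1] are all hypothetical. It shows the structural difference between the two tasks: prediction returns one number for one movie, while recommendation ranks every movie the user has not seen.

```python
from typing import Dict, List, Tuple

Votes = Dict[int, float]                   # hypothetical: movie id -> vote in [0, 1]
Neighbourhood = List[Tuple[float, Votes]]  # (neighbour weight, neighbour's votes)


def mean_vote(votes: Votes) -> float:
    return sum(votes.values()) / len(votes)


def predict(user: Votes, neighbourhood: Neighbourhood, movie: int) -> float:
    """Prediction: estimate one vote for a movie the user has not seen.

    Assumed formula: the user's mean vote plus the weighted average of the
    neighbours' deviations from their own means, clamped to [0, 1].
    """
    num = 0.0
    den = 0.0
    for weight, votes in neighbourhood:
        if movie in votes:
            num += weight * (votes[movie] - mean_vote(votes))
            den += abs(weight)
    base = mean_vote(user)
    if den == 0.0:
        return base  # no neighbour has voted on this movie
    return min(1.0, max(0.0, base + num / den))


def recommend(user: Votes, neighbourhood: Neighbourhood, top_n: int) -> List[int]:
    """Recommendation: rank movies the user has not voted on by predicted vote."""
    unseen = {m for _, votes in neighbourhood for m in votes} - set(user)
    # Sort by descending prediction; ties broken by movie id for determinism.
    ranked = sorted(unseen, key=lambda m: (-predict(user, neighbourhood, m), m))
    return ranked[:top_n]


if __name__ == "__main__":
    user = {1: 0.8, 2: 0.4}
    neighbourhood = [
        (0.9, {1: 1.0, 2: 0.6, 3: 1.0, 4: 0.2}),
        (0.5, {1: 0.6, 3: 0.8, 5: 0.0}),
    ]
    print(round(predict(user, neighbourhood, 3), 3))  # 0.912
    print(recommend(user, neighbourhood, top_n=3))    # [3, 5, 4]
```

Because recommendation is built on repeated prediction here, any improvement to the predictor carries over to the ranked list, although recommendation quality is harder to measure with a single number.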
The central task we set ourselves is to identify a suitable neighbourhood. The SWAMI (Shared Wisdom through the Amalgamation of Many Interpretations) framework [9] is publicly accessible software for CF experiments. Its central algorithm is as follows:

    Select a set of test users randomly from the database
    FOR each test user t
        Reserve a vote of this user (i.e. hide it from the predictor)
        From the remaining votes create a new training user t'
        Select neighbourhood of k reviewers based on t'
        Use neighbourhood to predict vote
        Compare this with actual vote and collect statistics
    NEXT t

The neighbourhood selection and vote prediction steps are where SWAMI allows an implementation-dependent choice of algorithm. We use an AIS to perform selection and prediction, as described below.

Algorithm Choices

We use the SWAMI data encoding:

$$\text{User} = \{\{id_1, score_1\}, \{id_2, score_2\}, \ldots, \{id_n, score_n\}\}$$

where id corresponds to the unique identifier of the movie being rated and score to this user's score for that movie. This captures the essential features of the available data. EachMovie vote data links a person with a movie and assigns a score (taken from the set {0, 0.2, 0.4, 0.6, 0.8, 1.0}, where 0 is the worst). User demographic information (e.g. age and gender) is provided but is not used in our encoding. Content information about movies (e.g. category) is similarly not used.

Similarity Measure

The Pearson measure is used to compare two users u and v:

$$r = \frac{\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{n} (u_i - \bar{u})^2 \sum_{i=1}^{n} (v_i - \bar{v})^2}} \qquad (1)$$

where u and v are users, n is the number of overlapping votes (i.e. movies for which both u and v have voted), $u_i$ is the vote of user u for movie i and $\bar{u}$ is the average vote of user u over all films (not just the overlapping votes).
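The SWAMI evaluation loop and the Pearson measure (1) can be sketched together as below. This is a hedged illustration, not the SWAMI code and not the AIS. The function names `pearson`, `select_neighbourhood`, `predict_vote` and `swami_evaluate` are hypothetical. Several choices are assumptions: selection takes the k users most correlated with t' among those who voted on the hidden movie, prediction is a correlation-weighted average over positively correlated neighbours, and the statistic collected is mean absolute error. Users follow the SWAMI encoding, held as a dictionary from movie id to score. The two degenerate Pearson cases return 0, and no overlap penalty is applied.

```python
import random
from math import sqrt
from typing import Dict, List, Tuple

Votes = Dict[int, float]  # SWAMI-style encoding: movie id -> score


def pearson(u: Votes, v: Votes) -> float:
    """Equation (1): correlation over the n overlapping votes, with each
    user's mean taken over all of that user's votes."""
    common = [i for i in u if i in v]
    if not common:
        return 0.0  # n = 0: NoOverlapDefault (set to 0 in the text)
    u_mean = sum(u.values()) / len(u)
    v_mean = sum(v.values()) / len(v)
    du = [u[i] - u_mean for i in common]
    dv = [v[i] - v_mean for i in common]
    denom = sqrt(sum(x * x for x in du) * sum(y * y for y in dv))
    if denom == 0.0:
        return 0.0  # ZeroVarianceDefault (also set to 0 in the text)
    # The overlap penalty P is deliberately not applied in this sketch.
    return sum(x * y for x, y in zip(du, dv)) / denom


def select_neighbourhood(
    target: Votes, candidates: Dict[int, Votes], k: int
) -> List[Tuple[float, Votes]]:
    """Selection step (hypothetical): the k candidates most correlated with target."""
    scored = sorted(
        ((pearson(target, votes), uid) for uid, votes in candidates.items()),
        reverse=True,
    )
    return [(r, candidates[uid]) for r, uid in scored[:k]]


def predict_vote(hood: List[Tuple[float, Votes]], movie: int, default: float) -> float:
    """Prediction step (hypothetical): correlation-weighted average of the
    votes of positively correlated neighbours for this movie."""
    pairs = [(r, votes[movie]) for r, votes in hood if r > 0 and movie in votes]
    total = sum(r for r, _ in pairs)
    if total == 0.0:
        return default
    return sum(r * s for r, s in pairs) / total


def swami_evaluate(users: Dict[int, Votes], n_test: int, k: int, seed: int = 0) -> float:
    """Sketch of the SWAMI loop; the statistic collected is mean absolute error."""
    rng = random.Random(seed)
    errors = []
    for t in rng.sample(sorted(users), n_test):  # random test users
        votes = users[t]
        if len(votes) < 2:
            continue  # nothing left to train on after hiding a vote
        hidden = rng.choice(sorted(votes))  # reserve a vote
        t_prime = {m: s for m, s in votes.items() if m != hidden}  # training user t'
        # Assumption: only other users who voted on the hidden movie are candidates.
        candidates = {uid: v for uid, v in users.items() if uid != t and hidden in v}
        hood = select_neighbourhood(t_prime, candidates, k)
        guess = predict_vote(hood, hidden, default=sum(t_prime.values()) / len(t_prime))
        errors.append(abs(guess - votes[hidden]))  # compare with actual vote
    return sum(errors) / len(errors) if errors else float("nan")
```

In this sketch, `select_neighbourhood` and `predict_vote` correspond to the two steps the text says are performed by the AIS; the surrounding loop would stay the same if they were swapped out.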
The measure is amended as follows:

$$
r =
\begin{cases}
\text{NoOverlapDefault}, & \text{if } n = 0\\
\text{ZeroVarianceDefault}, & \text{if } \sum_{i=1}^{n}(u_i - \bar{u})^2 \sum_{i=1}^{n}(v_i - \bar{v})^2 = 0\\
\dfrac{n\,r}{P}, & \text{if } n < P, \text{ where } P = \text{overlap penalty}
\end{cases}
\qquad (2)
$$

The two default values are required because it is impossible to calculate a Pearson measure in such cases. Both were set to 0. Some experimentation showed that an overlap penalty P was beneficial (this lowers the absolute correlation for users with only a small overlap) but that the exact value