Tags Meet Ratings: Improving Collaborative Filtering with a Tag-Based Neighborhood Method

Zhe Wang
National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences
wangzhe07@iscas.ac.cn

Yongji Wang
National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences
ywang@itechs.iscas.ac.cn

Hu Wu
National Engineering Research Center of Fundamental Software, Institute of Software, Chinese Academy of Sciences
wuhu@itechs.iscas.ac.cn

ABSTRACT
Collaborative filtering (CF) is a method for personalized recommendation. The sparsity of rating data seriously impairs the quality of CF's recommendations. Meanwhile, online users generate more and more tag information that implies their preferences. Exploiting these tag data is a promising means to alleviate the sparsity problem. Although the intention is straightforward, no existing solution makes full use of tags to improve the recommendation quality of traditional rating-based collaborative filtering approaches. In this paper, we propose a novel approach that fuses a tag-based neighborhood method into traditional rating-based CF. The tag-based neighborhood method is employed to find similar users and items. This neighborhood information helps the subsequent CF procedure produce higher-quality recommendations. The experiments show that our approach outperforms the state-of-the-art ones.

ACM Classification Keywords
H.3.3 Information Storage and Retrieval: Information Search and Retrieval - Information filtering

General Terms
Algorithms, Experimentation

Author Keywords
Tags, Latent Dirichlet Allocation (LDA), Collaborative Filtering, Neighborhood Method

INTRODUCTION
Nowadays people are inundated by choices. Personalized recommendation is a solution to this problem. Various kinds of recommender systems are employed for a better user experience. Collaborative filtering [4, 12] is one of the best techniques of choice therein. This technique tries to identify users that have relevant interests by calculating similarities among user profiles. The idea is that it may benefit one's search for information to consult the behavior of other users who share the same or relevant interests.

Because collaborative filtering recommendation depends on the preferences of the users with the same or relevant interests, the similarity computation imposes significant influence on the quality of recommendation. Early item-based and user-based collaborative filtering approaches find similar users or items (neighbors) by calculating the Pearson correlation coefficient [23]. These approaches are efficient and effective, but simply comparing the rating records of different users or items cannot help to find the best neighbors. If a user has few ratings for items, or gives all his/her ratings to unpopular ones, it will be difficult for those approaches to find the proper neighbors.
Recently, matrix factorization approaches have gained popularity because of their higher recommendation quality and lower online costs. One of the most significant differences from the early approaches is that they extract "features" of the users and the items. In this way, they decompose the original preference matrix into several low-rank approximations [15]. For the items, every feature reflects the preference of a group of similar users. For the users, every feature reflects their preference for a collection of similar items. By virtue of extracting users' and items' features, matrix factorization approaches are able to find better neighbors and hence produce better recommendations.

Despite the merits mentioned before, the existing matrix factorization approaches [6, 7, 8, 16, 26] fail to extract sufficient feature information, which reflects the problem of data sparsity.
This is because they fit the original matrix by feature extraction based only on the rating data, and the rating data are extremely sparse. If we could obtain more ratings, we would surely enhance the quality of the fitting process. From this standpoint, we propose a better collaborative filtering approach that exploits additional knowledge from tags as a supplement to ratings.

Tags are simple, ad-hoc labels assigned by users to describe or annotate any kind of resource for future retrieval. Their flexibility means they probably capture a user's perspective and preference with ease. Most recent work focuses on tag recommendation, in which the objects to recommend are tags [18, 20, 22, 27]. In the case of item recommendation, users expect to get specific suggestions on which item might be interesting. There are a limited number of solutions for this situation, and most of them do not generalize to different data resources because they ignore the abundant rating data [11, 25]. In this paper, we offer a novel personalized recommendation method that matches the case of containing both ratings and tags.

Our approach still shares the main idea of the classic neighborhood method, but differs in where to find neighbors. For traditional CF approaches, the neighbors are usually found in the ratings [1]. We do not find neighbors directly by this means. First we exploit the latent topic grouping information hidden in tags, and then we find groups of users interested in similar topics and collections of items under similar topics. To predict a user's rating for an item, we consult the ratings of both the user's and the item's neighbors by employing a neighborhood method. Thanks to taking into account both tag neighbors and rating neighbors, our method outperforms the most popular CF approaches.

The structure of the rest of the paper is as follows. In Section 2 we introduce the background and the related works. In Section 3 we explain our improved collaborative filtering method in detail. In Section 4 we give two toy examples and compare our method with NMF, PMF and SVD on a popular movie dataset. Finally, we conclude this paper with some future work.

PRELIMINARIES
Rating prediction is one of the most popular means to evaluate the performance of collaborative filtering algorithms. From the rating data of most collaborative filtering datasets, we can obtain an $N \times M$ rating matrix $R$ including $N$ users and $M$ items. Matrix $R$ is defined as

$$r_{ij} = \begin{cases} \text{user}_i\text{'s rating for item}_j, & \text{if user}_i \text{ has rated item}_j \\ 0, & \text{otherwise,} \end{cases}$$

where $i \in N^+$, $j \in M^+$ and $r_{ij} \in [1, R_{max}]$. The usual evaluation process is hold-out cross validation [5]: a certain proportion of the ratings are hidden for testing and the rest are used for training. The measures of evaluation include complexity and accuracy. Nevertheless, accuracy is much more important because most collaborative filtering approaches run offline. Therefore, accuracy is the focus of this paper.

Naive Estimates
One of the most instinctive prediction methods is to compute mean values. Taking the user's and the item's average biases into account, we get the naive estimate [8]:

$$b_{ij} = \mu + b_i + b_j, \qquad (1)$$

where $b_{ij}$ indicates the predicted rating of user$_i$ on item$_j$; $\mu$ is the global average rating; $b_i$ and $b_j$ denote user$_i$'s and item$_j$'s average bias, respectively.
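To make Eq. (1) concrete, the following is a minimal sketch, not the authors' code: the matrix layout (0 marks an unrated entry) and the function name are illustrative assumptions.

import numpy as np

def naive_estimate(R):
    # R: N x M rating matrix, 0 = unrated (assumed layout)
    mask = R > 0
    mu = R[mask].mean()                        # global average rating
    b_user = np.zeros(R.shape[0])              # per-user average bias b_i
    for i, row in enumerate(R):
        if (row > 0).any():
            b_user[i] = (row[row > 0] - mu).mean()
    b_item = np.zeros(R.shape[1])              # per-item average bias b_j
    for j, col in enumerate(R.T):
        if (col > 0).any():
            b_item[j] = (col[col > 0] - mu).mean()
    # Eq. (1): b_ij = mu + b_i + b_j for every (user, item) pair
    return mu + b_user[:, None] + b_item[None, :]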
This naive method is effective and scalable, but it does not take the interaction between users into account. Every user's rating for an item influences other users' opinions of that item. This interdependence between the users forms a social network [24] that connects all users together. Personalized recommendations are not delivered in isolation, but in the context of this social network [14]. The neighborhood method is one of the most effective methods to analyze this context.

Neighborhood Method
The aim of the neighborhood method [2] is to find the users who give similar ratings and the items which receive similar ratings. Similar past ratings suggest similar future ratings; this is the basic assumption of collaborative filtering. Because the neighborhood method digs out from the neighbors the clues that indicate the potential ratings, it produces better predictions than the naive estimate.

The model of the neighborhood method unifying item-based and user-based collaborative filtering approaches is

$$\hat{r}_{ij} = b_{ij} + \sum_{h \in S^k(j;i)} \theta^i_{hj}\,(r_{ih} - b_{ih}) + \sum_{h \in S^k(i;j)} \theta^j_{ih}\,(r_{hj} - b_{hj}), \qquad (2)$$

where $\hat{r}_{ij}$ is the predicted rating; $b_{ij}$ refers to the naive estimate's prediction; $S^k(j;i)$ denotes the set of the $k$ nearest rated items neighboring item$_j$ for a given user$_i$; $S^k(i;j)$ denotes the set of the $k$ nearest users neighboring user$_i$ for a given item$_j$; and $\theta$ reflects the different weights of the ratings. There are several representations for the weights; the cosine similarity is one of the most effective measures:

$$\text{CosineSimilarity}(a, b) = \frac{a \cdot b}{\|a\|_2\,\|b\|_2},$$

where $a$ and $b$ are vectors of the same dimension.
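As a small illustration, the cosine weight and the k nearest neighbors used in Eq. (2) can be computed as below. This is a sketch under the assumption that each user or item is represented by a vector (a rating row here, or a topic-probability row later in the paper); the function names are ours, not the paper's.

import numpy as np

def cosine_similarity(a, b):
    # weight of Eq. (2): cos(a, b) = a.b / (||a||_2 ||b||_2)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom > 0 else 0.0

def k_nearest(vectors, idx, k):
    # indices of the k vectors most similar to vectors[idx], excluding itself
    sims = np.array([cosine_similarity(vectors[idx], v) for v in vectors])
    sims[idx] = -np.inf
    return np.argsort(sims)[::-1][:k]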
The neighborhood method finds item$_j$'s $k$ nearest neighbors (k-NN). These neighbors suggest the potential value of $r_{ij}$ to different degrees according to their similarity with item$_j$. Although several different similarity measures can be employed, the similarity between items is represented by the distance between their rating vectors, so the similarity of items which have few common raters is structurally lower. If high-level features are extracted to represent the user and the item, the similarity can be measured better. Matrix factorization methods learn this lesson.

Matrix Factorization
To extract high-level features, matrix factorization methods try to find the rating matrix's low-rank approximations [15, 21]. They focus on fitting the user-item rating matrix by low-rank approximation and use the fitting result to make subsequent predictions [6, 7, 8, 16]. The premise behind this low-dimensional factor model is that only a small number of factors or features influence preferences, and that a user's preference for an item is determined solely by that user's feature vector and that item's feature vector.

What is related to our work is not the basic matrix factorization methods. Recently, some matrix factorization methods which involve auxiliary information analysis have drawn our attention. [13] proposes a trust-aware collaborative filtering algorithm, based on the general knowledge that people normally ask friends for recommendations. Due to its memory-based model, this algorithm suffers from huge online costs: trust values need to be computed like similarity measures. SoRec [10] fuses the existing trust-based approach with Probabilistic Matrix Factorization (PMF) [16]. This method is model-based, but it cannot be widely applied due to the scarcity of trust information, which involves people's privacy. [9] proposes a relation regularized matrix factorization method for relational data analysis, yet it is designed for making recommendations concerning objects that have both content and links. The idea of Collective Matrix Factorization [19] is innovative: factorizing multiple matrices simultaneously with shared parameters. The weakness of this method is that the parameter learning process is computationally costly.

TAG-BASED ITEM RECOMMENDATION
Since tags and ratings are two of the most important attributes attached to items, we propose a generalized neighborhood recommendation method that makes use of both at the same time. Our work is based on the assumption that the behaviors of tagging and rating share the same motivation: item classification. In this sense, the latent preference information found in tagging data has more power than that in rating data. Regarding tags, there are two types of recommendation: item recommendation and keyword recommendation. Our concern is item recommendation, which is the same as most CF recommendation methods. In the context of electronic commerce and video on demand, proper item recommendations matter more, since the items are overwhelmingly numerous.

Topic Finding
As with the rating data, the tag data can be represented as an $n \times m$ sparse matrix $T$ given $n$ users and $m$ items,

$$t_{ij} = \begin{cases} \text{user}_i\text{'s tags for item}_j, & \text{if user}_i \text{ has tagged item}_j \\ \text{null}, & \text{otherwise.} \end{cases}$$

The users are allowed to give more than one tag to each item, so if the tags are separated, $T$ becomes a three-dimensional tensor. The three dimensions are user, item, and tag.
This is a tough case to take care of, and it is why there is little work on extracting preference information from this data resource. Innovatively, we divide $T$ into user-tag and item-tag matrices representing the tags given by the users and the tags received by the items, respectively. The user-tag and item-tag matrices are denoted as $T^U$ and $T^I$, which are defined as follows:

$$T^U = [\,t_1, t_2, \ldots, t_n\,]^T, \qquad T^I = [\,t_1, t_2, \ldots, t_m\,]^T,$$

where $t_u$ denotes the tags user$_u$ has given, and $t_i$ denotes the tags item$_i$ has been given.

Once the original tensor $T$ is converted into bags of words in $T^U$ and $T^I$, we can apply LDA [3] to find the latent topic information therein. Processing $T^U$ and $T^I$ separately, we find their latent topics in the form of probabilities:

$$\theta^U_{ij} = p(\text{topic} = j \mid \text{user} = i), \qquad \theta^I_{ij} = p(\text{topic} = j \mid \text{item} = i),$$

where $\theta^U_{ij}$ denotes user$_i$'s probability of preferring topic$_j$ and $\theta^I_{ij}$ denotes item$_i$'s probability of being related to topic$_j$. This is a kind of "soft clustering": a user or item may fall under multiple topics with similar probabilities. The similarity between the row vectors of $\Theta^U$ and $\Theta^I$ more appropriately reflects the users' and items' similarity, because the clustering is based on the semantics of the tags.

The matrices $\Theta^U$ and $\Theta^I$ are not directly used for rating prediction. Because they are full matrices, which are not appropriate for computation, we set a threshold value to retain the high-similarity relations and clear the others. Another important reason for this step is that, in reality, most users and items are only indirectly related to each other.

After finding the matrices $\Theta^U$ and $\Theta^I$, it is easy to employ k-NN clustering to find the groups whose members share the same interests or attributes.
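The topic-finding step can be sketched as follows. This is not the authors' implementation: it treats each user's (or item's) tags as one bag-of-words document and uses scikit-learn's variational LDA, whereas the paper does not state which inference method was used; the function and variable names are illustrative. The priors and the threshold follow the values given later in the Parameter Analysis section.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def tag_topics(tag_docs, n_topics, threshold=0.03):
    # tag_docs: one string per user (or item), concatenating all of its tags
    counts = CountVectorizer().fit_transform(tag_docs)        # bag-of-tags matrix
    lda = LatentDirichletAllocation(
        n_components=n_topics,
        doc_topic_prior=50.0 / n_topics,   # alpha = 50/K (see Parameter Analysis)
        topic_word_prior=0.1)              # beta = 0.1
    theta = lda.fit_transform(counts)      # rows approximate p(topic | user/item)
    theta[theta < threshold] = 0.0         # clear statistically negligible entries
    return theta

The rows of the returned matrix play the role of $\Theta^U$ or $\Theta^I$; k-NN grouping over these rows (for example with the cosine routine sketched earlier) yields the topic groups used below.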
Rating Prediction
We assume that all the users who tag items also give ratings and that all the items which are tagged also receive ratings. If some users actually fail to give either ratings or tags, we can still make use of whatever they input to the recommender system: even with few entries, the recommender system still understands what the user wants. We hold this claim because tags are more informative. If a user puts only the tag "anime" on some item, we can infer that this user is an animation fan. But if this user only gives a high rating to the movie "Avatar", what should we infer from that? Which genre of movies does this user like: action, adventure, fantasy, or sci-fi?

Most recommender systems use the integral interval $[1, R_{max}]$ to represent the users' preference for items. It is necessary to normalize the ratings into the range $[0, 1]$, because only this interval makes sense for probability. There are many mapping methods; one of the most widely used mapping functions is $f(x) = (x - 1)/(R_{max} - 1)$. As far as we know, different mapping functions do not exert significantly different influence on the final recommendation.

So our next step is to make rating predictions based on the grouping results stated in the last section. The prediction is made according to a neighborhood method:

$$\hat{r}_{ij} = \mu + b^*_i + b^*_j, \qquad
b^*_i = \frac{\sum_{h \in T(i)} (r_{hj} - b_{hj})\, I^R_{hj}}{\sum_{h \in T(i)} I^R_{hj}}, \qquad
b^*_j = \frac{\sum_{h \in T(j)} (r_{ih} - b_{ih})\, I^R_{ih}}{\sum_{h \in T(j)} I^R_{ih}}, \qquad (3)$$

where $b^*_i$ denotes user$_i$'s bias for the topic $T(i)$, and $b^*_j$ denotes item$_j$'s bias for the topic $T(j)$. $T(i)$ and $T(j)$ denote the topic user$_i$ is interested in and the topic item$_j$ is under, respectively; each topic is a set containing a number of users or items. $I^R$ is the rating indicator ($I^R_{ij} = 1$ if user$_i$ has rated item$_j$, and 0 otherwise).

Plus, we give different weights to the neighbors at different distances. The algorithm's weighted variant is

$$\hat{r}_{ij} = \mu + b^*_i + b^*_j, \qquad
b^*_i = \frac{\sum_{h \in T(i)} (r_{hj} - b_{hj})\, S_{hi} I^R_{hj}}{\sum_{h \in T(i)} S_{hi} I^R_{hj}}, \qquad
b^*_j = \frac{\sum_{h \in T(j)} (r_{ih} - b_{ih})\, S_{hj} I^R_{ih}}{\sum_{h \in T(j)} S_{hj} I^R_{ih}}, \qquad (4)$$

where $S_{hi}$ and $S_{hj}$ denote the cosine similarity between the corresponding row vectors of probabilities ($\theta_h$ and $\theta_i$ in $\Theta^U$, and $\theta_h$ and $\theta_j$ in $\Theta^I$, respectively).
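A minimal sketch of Eq. (3) follows, assuming the naive estimates and topic groups have already been computed as above; user_group[i] and item_group[j] are illustrative names for $T(i)$ and $T(j)$, not the authors' code.

import numpy as np

def predict(i, j, R, B, mu, user_group, item_group):
    # R: rating matrix (0 = unrated); B: naive estimates from naive_estimate(R)
    # b*_i: deviation of user i's topic group T(i) on item j (Eq. 3)
    devs = [R[h, j] - B[h, j] for h in user_group[i] if R[h, j] > 0]
    b_i = np.mean(devs) if devs else 0.0
    # b*_j: deviation of user i on the items in item j's topic group T(j)
    devs = [R[i, h] - B[i, h] for h in item_group[j] if R[i, h] > 0]
    b_j = np.mean(devs) if devs else 0.0
    return mu + b_i + b_j

The weighted variant of Eq. (4) is obtained by replacing the plain means with means weighted by the cosine similarities $S_{hi}$ and $S_{hj}$ of the corresponding $\Theta$ rows.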
EXPERIMENTAL ANALYSIS
Dataset Description
The MovieLens dataset is created by the MovieLens movie recommender, which aims to provide an online movie recommendation service [17]. Their work is a complete system rather than a particular algorithm, so we do not delve into it. The dataset includes three files: rating data, which contains users' ratings for movies; tagging data, which contains movies' tags and the id of the user who made each tag; and movie overview data, which contains each movie's name, release year and genre. The users are allowed to give ratings and tags to the movies they have seen. The ratings are integers between 1 and 5.

We intend to leverage tag analysis to help rating prediction, so we need the movies that have both ratings and tags and the users that give both ratings and tags. After taking the intersection of the rating data and the tag data, we get the subset of the rating data which covers all the tagged movies. This subset contains 905,686 ratings from 4,001 users for 7,600 movies. The density of these rating data is

$$\frac{905686}{4001 \times 7600} = 2.974\%.$$

From this subset, we randomly and independently choose 20%, 50%, 80% and 99% of the rating data as separate training sets. The remaining rating data are used as the testing sets. The experiments are all repeated five times to reduce errors.

Toy Examples
The quality of recommendation is eventually reflected in the rating prediction accuracy. To obtain a clearer view of the qualitative quality, we present two toy examples on a smaller data scale.

First we extract the tags of 6 users. The tag matrix $T^U$ is as follows (one row of tags per user):

user1: horror, killer, action, horror
user2: horror, action, thrill, action
user3: fantasy, anime, fantasy, anime
user4: anime, Japanese, anime, fantasy
user5: documentary, 911, terrorist, hero
user6: historic, documentary, American, realistic

We set the hyper-parameter topic number to 3 and conduct the LDA analysis to get the probabilistic matrix

$$\Theta^U = \begin{pmatrix}
0.345679 & 0.345679 & 0.308642 \\
0.364198 & 0.308642 & 0.308642 \\
0.308642 & 0.345679 & 0.345679 \\
0.327160 & 0.345679 & 0.327160 \\
0.345679 & 0.308642 & 0.345679 \\
0.308642 & 0.345679 & 0.345679
\end{pmatrix}$$

It is quite obvious that user1 and user2 have the same or similar interests: for both of them, the first column holds the largest value among the three columns, which indicates that the topic they are most probably interested in is the first topic. For the same reason, user3 and user4 are similar, and user5 and user6 are alike.
The subjective grouping result is exactly the same as that of a k-NN method applied to the matrix $\Theta^U$ with the same hyper-parameters. Considering the semantic meanings of these 24 tags, we conclude that the result of the analysis on the user-tag matrix is persuasive. The analysis on the item-tag matrix is effective in a similar way. Yet there are more merits in analyzing the item-tag matrix, because we can obtain the names and genres of the movies from the overview data of the dataset.

In the second example, we extract the tag data in the MovieLens dataset and get the movie-tag matrix. There are 7,600 movies in this matrix. We run the tag analysis to find the latent topics again. To leave more space for more important results, we set the desired topic number to 10. Figure 1 presents the five most probable movies per topic.

Figure 1. Top 5 movies under 10 different topics and their provided genre information. Genre legend: 1. Animation; 2. Children; 3. Fantasy; 4. Comedy; 5. Crime; 6. Thriller; 7. Action; 8. Adventure; 9. Sci-Fi; 10. Drama; 11. Western; 12. War; 13. Musical; 14. Mystery; 15. Romance; 16. Documentary; 17. Horror.

According to the figure, we make several observations:

1. There are at least 4 movies under the same genre for every topic. Topic1, Topic2 and Topic3 all have more than two such common genres. The high co-occurrence frequency of different movies under the same genre reflects the large extent of agreement. The coexistence of movie series like The Matrix and Star Wars under Topic7 illustrates that our tag analysis can find not only movies of the same genre but also movies of the same series.

2. The five movies in Topic6 all reflect big social problems: war, terrorist attacks, or social security crises. This explains why these movies of different genres fall under the same topic. The topic finding result is not equal to the genre classification; a topic such as "social problem" may be interesting for some of the users. Such details are more valuable for inferring the users' preferences than genres.

3. According to the corresponding rating data, the average variance of the ratings of the five movies under their
corresponding topic is just 0.102. This illustrates that users hold similar preferences for the movies with similar probabilities under each topic. So we posit that consulting the movies with similar probabilities under each topic can help improve personalized rating prediction.

Metrics
We use the Root Mean Square Error (RMSE) metric to measure the prediction quality of our proposed approach in comparison with other collaborative methods. RMSE is defined as

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \sum_{j=1}^{M} (r_{ij} - \hat{r}_{ij})^2\, I^R_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{M} I^R_{ij}}}, \qquad (5)$$

where $r_{ij}$ and $\hat{r}_{ij}$ are the actual rating and the predicted rating over the $N$ users and $M$ movies. In addition, we round the predicted rating to the nearest integer. The errors of predictions with rounded rating values are more pronounced, but whether the rating is rounded or not, the comparison between different approaches does not change much.

Comparison
We compare our approach with three collaborative filtering algorithms: the Non-negative Matrix Factorization (NMF) method, the PMF method, and the improved regularized SVD method. In fact, one of the most difficult problems in our work is to find comparable algorithms: because our intention is to provide a generalized item recommendation model that combines the use of ratings and tags, most of the related work is inapplicable to this kind of data resource. We therefore choose three of the most popular algorithms out of expediency.

The parameters of these methods also need to be tuned. According to the related works and our experiments, the best parameters for the PMF approach on the MovieLens dataset are $\lambda_u = 0.001$ and $\lambda_v = 0.0001$. Concerning the improved regularized SVD method, $lrate = 0.001$, $\lambda = 0.02$, $\lambda_2 = 0.05$. We set the number of feature dimensions to 80; we think this assignment is reasonable because the commonly used feature dimension for these matrix factorizations is between 30 and 100.

We have six versions of the improved collaborative filtering method. Nghbru represents the neighborhood recommendation method based only on the user-tag analysis. Nghbri corresponds to the variant based only on the item-tag analysis. Nghbra integrates the user-tag analysis and the item-tag analysis. For each of these three methods, there are two weighting strategies: one uses uniform weights, labeled "Avg"; the other uses different weights, labeled "Wgt".

Table 1. RMSE comparison with other approaches (a smaller RMSE value means a better performance)

Method       | 20%    | 50%    | 80%     | 99%
NMF          | 1.4854 | 1.3027 | 1.1275  | 1.0762
irSVD        | 1.3176 | 1.2591 | 1.1928  | 1.1087
PMF          | 1.1692 | 1.1187 | 1.05656 | 1.0173
Nghbru-Avg   | 0.8811 | 0.8799 | 0.8807  | 0.8803
Nghbru-Wgt   | 0.8802 | 0.8792 | 0.8796  | 0.8788
Nghbri-Avg   | 0.8802 | 0.8798 | 0.8791  | 0.8789
Nghbri-Wgt   | 0.8802 | 0.8796 | 0.8790  | 0.8788
Nghbra-Avg   | 0.8669 | 0.8668 | 0.8665  | 0.8662
Nghbra-Wgt   | 0.8661 | 0.8658 | 0.8657  | 0.8655

The results in Table 1 show that, on the 20% training split, our neighborhood recommendation method lowers the RMSE by about 42% relative to NMF, 34% relative to the improved regularized SVD method, and 26% relative to PMF. We would like to analyze the results more specifically:

1) For all the algorithms in Table 1, the prediction accuracy increases as the training set's percentage ascends. This is reasonable: with a higher training data percentage, our algorithm finds more neighbors to consult, and the more neighbors we find, the more accuracy we get.

2) Among our own methods, the version Nghbra-Wgt presents the best performance. It illustrates that utilizing all the tag information and assigning different weights to this tag information is meaningful.

3) We also observe that the item tag analysis is a little more effective than the user tag analysis. Although the difference is subtle, it matches the fact that item-based collaborative filtering approaches were more popular than user-based ones in early works.

4) Besides, the performance increase of Nghbra over Nghbru and Nghbri is obvious. This illustrates that the fusion of the user tag analysis and the item tag analysis is lossless.

5) Nevertheless, the performance of the weighted versions of Nghbra, Nghbru and Nghbri is not much better than their average counterparts. This can be explained by the homogeneity of users: there are no authorities who give overwhelmingly important ratings. The phenomenon reflects the democracy in the online social network.
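The RMSE values above follow Eq. (5); for reference, a minimal sketch of the metric over held-out test ratings (same illustrative matrix layout as in the earlier sketches):

import numpy as np

def rmse(R_test, R_pred):
    mask = R_test > 0                  # I^R: evaluate only observed test ratings
    err = R_test[mask] - R_pred[mask]
    return float(np.sqrt(np.mean(err ** 2)))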
Parameter Analysis
For topic finding, we set the Dirichlet priors $\alpha$ and $\beta$ to $50/K$ and $0.1$, respectively ($K$ is the number of topics). These two hyper-parameters are the empirical values for LDA. The threshold for processing the probabilistic matrices $\Theta^U$ and $\Theta^I$ is set to 0.03; probabilities below it are treated as statistically negligible. The other two parameters, the iteration number and the topic number, are left unfixed, and we explore the optimal values for them. Because the optimal parameters of topic finding differ depending on the objects to analyze, we separate the process of tag analysis into user-tag analysis and item-tag analysis. For the user-tag analysis, we observe a sharp RMSE increase with strong oscillations after 340 iterations, which can be seen as a signal of overfitting.
Regarding the item-tag analysis, we observe that the optimal iteration number is 480. So we take 340 and 480 as the optimal iteration numbers for the two analyses, respectively, and use these two values in the above experiments.

Figure 2. Dependence between the topic number and the prediction accuracy for the items' tag analysis

Figure 3. Dependence between the topic number and the prediction accuracy for the users' tag analysis

Compared with the iteration number, we are more interested in the dependence between the topic number and the prediction accuracy. We record the effect on prediction precision for topic numbers ranging from 1 to 50. From Figure 2 we observe that the optimal topic number for the items' tag analysis is less than 25. This optimum is not necessarily global; although it is just a local optimum, the RMSE value near a topic number of 25 is stably less than 0.9. The observation that there are some better parameter choices below 10 can be explained by data sparsity: rating data sparsity is ever-present, so when the topic number is set relatively large, our algorithm may not find enough consultants for potential rating inference. One probable case is this: the neighbors found by our approach are very similar to the movie whose rating we are to predict, but the user has not rated them.

For the user-tag analysis, the optimal topic number is 23, as Figure 3 illustrates. The situation here is similar to that in the item-tag analysis: the local minimum is not the global one, for the same reason as mentioned before. What is different is that the stable range here is shorter than that in the item-tag analysis and the fluctuation is more obvious. This observation can be explained by the diversity of personal interests. Compared with movies, the attributes of human beings are more dynamic and diverse; it is easier to find similar items than similar people because the measure in the latter situation is vaguer.

In summary, the optimal topic number is around 25 in both situations, which means the results are consistent. The genre number in common use is of the same order of magnitude; from this perspective, our results are reasonable. But we must emphasize that our method of topic finding and the common genre classification focus on different targets and thus produce different results. Considering that genre information demands knowledge from experts, our method of topic finding has a wider range of application.

Discussion
There are some issues concerning implementation details that we need to explain here. Because tags are given with high freedom, a lot of preprocessing work is needed. First, there is much noise, such as ":D", in the data. We should remove it all, but in practice we cannot guarantee that all noise is cleared away, because it follows no rules. Second, the stoplist is one of the most important parts to handle. To the best of our knowledge, most stoplists used in document clustering remove the words "good" and "great". Concerning our approach, these words reflect the users' preference for the items and are somewhat meaningful, so we hesitate to remove them: keeping them may benefit rating prediction. In our experiments, we remove the prepositions, conjunctions and other less meaningful words while leaving emotional words untouched. Third, stemming is also a complicated technique that we must employ. It may be simpler for ordinary document and webpage retrieval.
Discussion

There are some issues concerning implementation details that we need to explain here. Because tags are given with great freedom, a lot of preprocessing work is required. First, the data contain much noise, such as ":D". Ideally we would remove it all, but in practice we cannot guarantee that every noisy tag is cleared away, because such tags follow no rules. Second, the stoplist is one of the most important components to handle. To the best of our knowledge, most stoplists used in document clustering remove the words "good" and "great". In our approach, however, these words reflect the users' preferences for the items and are therefore meaningful; we hesitate to remove them because they may benefit rating prediction. In our experiments, we remove prepositions, conjunctions and other less meaningful words while leaving emotional words untouched. Third, stemming is also a complicated technique that we must employ. It may be simple for ordinary document and web-page retrieval, but the bags of tags in our experiment are rather chaotic, and we worry that the stemming algorithm may to some extent degrade the quality of topic finding. If we take better care of these three factors, we should be able to further improve the quality of recommendation.
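As an illustration, here is a minimal sketch of the three preprocessing steps just described, under our own assumptions: the stoplist below is a short illustrative sample rather than the one actually used, and NLTK's Porter stemmer stands in for the stemming step.

import re
from nltk.stem import PorterStemmer

# Illustrative stoplist: function words only. Emotional words such as
# "good" and "great" are deliberately absent, since they carry the
# preference signal we want to keep for rating prediction.
STOPLIST = {"a", "an", "the", "of", "in", "on", "for", "and", "or",
            "but", "to", "with", "at", "by", "is", "it", "this", "that"}

stemmer = PorterStemmer()

def clean_tags(raw_tags):
    # Noise filtering, stoplist removal, and stemming for one bag of tags.
    cleaned = []
    for tag in raw_tags:
        tag = tag.strip().lower()
        # Step 1: discard noise tokens that contain no letters, such as ":D".
        # (As noted above, rule-based filtering cannot catch all noise.)
        if not re.search(r"[a-z]", tag):
            continue
        # Step 2: drop prepositions, conjunctions and other function words,
        # leaving emotional words untouched.
        if tag in STOPLIST:
            continue
        # Step 3: stem; on chaotic tag bags this may cost some topic quality.
        cleaned.append(stemmer.stem(tag))
    return cleaned

# Example: clean_tags([":D", "great", "of", "running"]) returns ["great", "run"]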
CONCLUSIONS AND FUTURE WORK
In this paper we proposed a novel method to alleviate the sparsity problem and improve the quality of the collaborative filtering method. We make use of tag information to find closer neighbors for the users and the items, respectively. These neighbors give strong inferences about the potential preference of a user for an item, and we utilize these inferences to make the rating prediction. According to the experiments, our approach predicts the users' preferences much more accurately than state-of-the-art methods such as the NMF method, the PMF method and the improved regularized SVD method.

Finding neighbors is vital for an excellent collaborative filtering algorithm. The motivation of our work is to find better neighbors, which give stronger inference for the prediction. Latent topics connect users and items with similar interests, so finding these topics amounts to finding the neighbors. Connections that cannot be discovered in the rating records can be disclosed by learning from the tagging history. This is why our method outperforms the others.

The next step for us is to improve our method in two aspects. One is incrementalization. The neighborhood method enjoys low computational complexity but adapts poorly to frequent updates: if there are many new entries from users to items, our current solution fails to deal with the situation. On the other hand, we can fuse the collaborative matrix factorization method with our topic-finding model, which is another way to make use of the latent topic information. We believe both are promising directions for further improving the collaborative filtering technique.

ADDITIONAL AUTHORS

REFERENCES

1. R. Bell and Y. Koren. Improved neighborhood-based collaborative filtering. In KDD-Cup and Workshop. Citeseer, 2007.
2. R. M. Bell and Y. Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In ICDM, pages 43–52, 2007.
3. D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
4. D. Goldberg, D. Nichols, B. M. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Commun. ACM, 35(12):61–70, 1992.
5. D. M. Hawkins, S. C. Basak, and D. Mills. Assessing model fit by cross-validation. J. Chem. Inf. Comput. Sci., 43(2):579–586, March 2003.
6. T. Hofmann. Collaborative filtering via gaussian probabilistic latent semantic analysis. In SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pages 259–266, New York, NY, USA, 2003. ACM.
7. T. Hofmann. Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst., 22(1):89–115, 2004.
8. Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In KDD, pages 426–434, 2008.
9. W. Li and D. Yeung. Relation regularized matrix factorization. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009.
10. H. Ma, H. Yang, M. R. Lyu, and I. King. SoRec: Social recommendation using probabilistic matrix factorization. In CIKM, pages 931–940, 2008.
11. A. Marchetti, M. Tesconi, F. Ronzano, M. Rosella, and S. Minutoli. SemKey: A semantic collaborative tagging system. In Proceedings of the 16th International World Wide Web Conference, WWW2007. Citeseer, 2007.
12. B. Marlin. Collaborative filtering: A machine learning perspective. Master's thesis, University of Toronto, 2004.
13. P. Massa and P. Avesani. Trust-aware recommender systems. In RecSys '07: Proceedings of the 2007 ACM conference on Recommender systems, pages 17–24, New York, NY, USA, 2007. ACM.
14. B. J. Mirza, B. J. Keller, and N. Ramakrishnan. Studying recommendation algorithms by graph analysis. Journal of Intelligent Information Systems, 20(2):131–160, March 2003.
15. N. S. Nati and T. Jaakkola. Weighted low-rank approximations. In 20th International Conference on Machine Learning, pages 720–727. AAAI Press, 2003.
16. R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In Advances in Neural Information Processing Systems, 2007.
17. S. Sen, J. Vig, and J. Riedl. Tagommenders: Connecting users to items through tags. In WWW, 2009.
18. B. Sigurbjörnsson and R. van Zwol. Flickr tag recommendation based on collective knowledge. In WWW, 2008.
19. A. Singh and G. Gordon. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 650–658. ACM, 2008.
20. Y. Song, Z. Zhuang, H. Li, Q. Zhao, J. Li, W. Lee, and C. Giles. Real-time automatic tag recommendation. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 515–522. ACM, 2008.
21. N. Srebro. Learning with matrix factorizations. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2004.
22. F. Suchanek, M. Vojnovic, and D. Gunawardena. Social tags: meaning and suggestions. In CIKM, 2008.
23. J. Wang, A. P. de Vries, and M. J. T. Reinders. Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pages 501–508, New York, NY, USA, 2006. ACM Press.
24. S. Wasserman, K. Faust, and D. Iacobucci. Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences). Cambridge University Press, November 1994.
25. X. Wu, L. Zhang, and Y. Yu. Exploring social annotations for the semantic web. In Proceedings of the 15th international conference on World Wide Web, page 426. ACM, 2006.
26. S. Zhang, W. Wang, J. Ford, and F. Makedon. Learning from incomplete ratings using non-negative matrix factorization. In SDM, 2006.
27. A. Zollers. Emerging motivations for tagging: expression, performance, and activism. WWW 2007, 2007.