TagiCoFi:Tag Informed Collaborative Filtering Yi Zhen Wu-Jun Li Dit-Yan Yeung Department of Computer Department of Computer Department of Computer Science and Engineering Science and Engineering Science and Engineering Hong Kong University of Hong Kong University of Hong Kong University of Science and Technology Science and Technology Science and Technology Hong Kong,China Hong Kong,China Hong Kong,China yzhen@cse.ust.hk liwujun@cse.ust.hk dyyeung@cse.ust.hk ABSTRACT available.Some representative examples include product Besides the rating information,an increasing number of mod- recommendation in Amazon.com 14,movie recommenda- ern recommender systems also allow the users to add per- tion in Netflix [3]and MovieLens'[16],reference recommen- sonalized tags to the items.Such tagging information may dation in CiteULike2,and bookmark recommendation in provide very useful information for item recommendation, Del.icio.us3.Existing recommender systems can be roughly because the users'interests in items can be implicitly re divided into two major categories 1.Content-based sys- flected by the tags that they often use.Although some tems 2,12,15 make use of profiles of the users or products content-based recommender systems have made preliminary to characterize their nature.On the other hand,systems attempts recently to utilize tagging information to improve based on collaborative filtering (CF)[4,9,16,17,19 do not the recommendation performance,few recommender systems exploit explicit user profiles but only past activities of the based on collaborative filtering (CF)have employed tagging users,such as their transaction history or product satisfac- information to help the item recommendation procedure. tion expressed in ratings,to predict the future activities of In this paper,we propose a novel framework,called tag the users.In recent years.CF-based systems have become informed collaborative filtering (TagiCoFi),to seamlessly in- more and more popular than content-based systems because tegrate tagging information into the CF procedure.Experi- it is much easier to collect the past activities of users than mental results demonstrate that TagiCoFi outperforms its their profiles due to privacy considerations. counterpart which discards the tagging information even In recent years,besides the ratings on the items given when it is available,and achieves state-of-the-art perfor- by the users.an increasing number of modern recommender systems also allow the users to add personalized tags4,in mance. the form of words or phrases,to the items.For example, users may add tags to movies in MovieLens,to web sites in Categories and Subject Descriptors Del.icio.us and to references in CiteULike.Such tagging in- H.3 Information Storage and Retrieval):Information formation may provide very useful information for item rec- Search and Retrieval-Information Filtering:H.2 [Database ommendation,because the users'interests in items can be Management:Database Application-Data Mining implicitly reflected by the tags that they often use 21.For example,if two users often use the tags "Oscar"and "Tom General Terms Hanks”,both of them may like the movie“Forrest Gump'” In fact,the effectiveness of tags in representing users'prefer- Algorithms ence or interests has been validated by Zanardi et al.in the CiteULike dataset [27.Very recently,some content-based Keywords systems,such as those in 6,22,23,have made some pre- Collaborative filtering,recommender systems,tag liminary attempts to utilize tagging information to improve the recommendation performance.However,there has been little work on improving CF-based systems with the help of 1.INTRODUCTION tagging information.Because CF-based systems have be- Since the amount of information on the Web is increasing come more popular than content-based systems,it would be at an astonishing rate that is much faster than our ability a very worthwhile endeavor to devise novel CF techniques to process it,recommendation plays a more and more im- which can also utilize tagging information for item recom- portant role for us to make effective use of the information mendation Existing CF methods can be divided into two main cat- http://movielens.umn.edu/ Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are http://www.citeulike.org/ not made or distributed for profit or commercial advantage and that copies http://delicious.com/ bear this notice and the full citation on the first page.To copy otherwise,to AIt should be emphasized that the setting in this paper is republish,to post on servers or to redistribute to lists,requires prior specific different from those about tag recommendation 7,24 in permission and or a tee. which the recommended objects are tags.The recommended RecSys'09.October 23-25.2009.New York.New York.USA. objects in this paper are called items,whereas tags are other Copyright2009ACM978-1-60558-435-5/09/10.$10.00. objects about the items added by users
TagiCoFi: Tag Informed Collaborative Filtering Yi Zhen Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong, China yzhen@cse.ust.hk Wu-Jun Li Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong, China liwujun@cse.ust.hk Dit-Yan Yeung Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong, China dyyeung@cse.ust.hk ABSTRACT Besides the rating information, an increasing number of modern recommender systems also allow the users to add personalized tags to the items. Such tagging information may provide very useful information for item recommendation, because the users’ interests in items can be implicitly re- flected by the tags that they often use. Although some content-based recommender systems have made preliminary attempts recently to utilize tagging information to improve the recommendation performance, few recommender systems based on collaborative filtering (CF) have employed tagging information to help the item recommendation procedure. In this paper, we propose a novel framework, called tag informed collaborative filtering (TagiCoFi), to seamlessly integrate tagging information into the CF procedure. Experimental results demonstrate that TagiCoFi outperforms its counterpart which discards the tagging information even when it is available, and achieves state-of-the-art performance. Categories and Subject Descriptors H.3 [Information Storage and Retrieval]: Information Search and Retrieval—Information Filtering; H.2 [Database Management]: Database Application—Data Mining General Terms Algorithms Keywords Collaborative filtering, recommender systems, tag 1. INTRODUCTION Since the amount of information on the Web is increasing at an astonishing rate that is much faster than our ability to process it, recommendation plays a more and more important role for us to make effective use of the information Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. RecSys’09, October 23–25, 2009, New York, New York, USA. Copyright 2009 ACM 978-1-60558-435-5/09/10 ...$10.00. available. Some representative examples include product recommendation in Amazon.com [14], movie recommendation in Netflix [3] and MovieLens1 [16], reference recommendation in CiteULike2 , and bookmark recommendation in Del.icio.us3 . Existing recommender systems can be roughly divided into two major categories [1]. Content-based systems [2, 12, 15] make use of profiles of the users or products to characterize their nature. On the other hand, systems based on collaborative filtering (CF) [4, 9, 16, 17, 19] do not exploit explicit user profiles but only past activities of the users, such as their transaction history or product satisfaction expressed in ratings, to predict the future activities of the users. In recent years, CF-based systems have become more and more popular than content-based systems because it is much easier to collect the past activities of users than their profiles due to privacy considerations. In recent years, besides the ratings on the items given by the users, an increasing number of modern recommender systems also allow the users to add personalized tags4 , in the form of words or phrases, to the items. For example, users may add tags to movies in MovieLens, to web sites in Del.icio.us and to references in CiteULike. Such tagging information may provide very useful information for item recommendation, because the users’ interests in items can be implicitly reflected by the tags that they often use [21]. For example, if two users often use the tags “Oscar” and “Tom Hanks”, both of them may like the movie “Forrest Gump”. In fact, the effectiveness of tags in representing users’ preference or interests has been validated by Zanardi et al. in the CiteULike dataset [27]. Very recently, some content-based systems, such as those in [6, 22, 23], have made some preliminary attempts to utilize tagging information to improve the recommendation performance. However, there has been little work on improving CF-based systems with the help of tagging information. Because CF-based systems have become more popular than content-based systems, it would be a very worthwhile endeavor to devise novel CF techniques which can also utilize tagging information for item recommendation. Existing CF methods can be divided into two main cat- 1 http://movielens.umn.edu/ 2 http://www.citeulike.org/ 3 http://delicious.com/ 4 It should be emphasized that the setting in this paper is different from those about tag recommendation [7, 24] in which the recommended objects are tags. The recommended objects in this paper are called items, whereas tags are other objects about the items added by users
egories [10].Memory-based methods,such as [9,19],try U E RDxN and V E RDxM,where typically D N,M. to predict new ratings by (weighted)averaging the ratings and use R UTV to approximate the rating matrix R. of similar users or items.On the other hand.model-based The column vectors U.i and V.;represent the user-specific methods,such as probabilistic matrix factorization(PMF)[17 and item-specific latent feature vectors,respectively. try to learn a model from data using statistical learning tech- Let Z be the tagging matrix,and each of its elements Zik niques.To the best of our knowledge,there exists only one is the tf*idf value of user i and tag k [18,23]: CF method 25 which attempts to utilize tagging informa- tion to improve item recommendation.This method is a Zk=tf(i,R)×log2 memory-based one.The experimental results in 25 show df() (1) that little improvement could be achieved on item recom- mendation by integrating tagging information into the CF where tf(i,k)is the normalized frequency of tag k appeared procedure under the memory-based framework. in user i's tagging history and df(k)is the number of users In this paper,we propose a novel framework,called tag who have used tag k. informed collaborative filtering (TagiCoFi),to seamlessly in- tegrate tagging information into the model-based CF proce- 2.2 Probabilistic Matrix Factorization dure.More specifically,we use tagging information to regu- PMF [17]seeks to derive the aforementioned low-rank ma- larize the matrix factorization (MF)procedure of PMF [17] trices U and V by analyzing the rating matrix R in a prob- which has been demonstrated to be one of the state-of-the- abilistic framework.The likelihood of the observed ratings art CF methods.Some promising properties of TagiCoFi R is defined as follows: are highlighted here: w To the best of our knowledge,TagiCoFi is the first p(R U,V,o)= iWR,1uV,2)],2② 1 work that incorporates tagging information into a model- based CF system for item recommendation. where N(ru,o2)denotes the (univariate)Gaussian distri- TagiCoFi outperforms its counterpart,PMF,which bution with mean u and variance o discards the tagging information even when it is avail- Putting zero-mean spherical Gaussian priors on the user- able.This shows that the tagging information does specific and item-specific feature vectors: contain useful information for item recommendation and TagiCoFi can utilize it very effectively. p(Ul)=IN(U.:10,I) TagiCoFi can overcome,or at least alleviate,the over- fitting problem 17 suffered by most MF-based CF methods due to the sparsity of the rating matrix. p(Vlav)= ΠN(V|0,I, TagiCoFi can solve the cold-start problem 8,11,20 in that it can give recommendations to novel users who we can obtain the marimum a posteriori (MAP)estimates have no preference on any items. of U and V by minimizing the following objective function defined based on the sum of squared errors: The rest of this paper is organized as follows.In Section 2, N M we will introduce the notations and some preliminaries.Sec- tion 3 describes the details of our model.Experimental re- E= 2∑∑,(R,-UgVP =1j=1 sults are presented in Section 4 and,finally,we conclude the paper in Section 5. +(UTU)+t(VTV). 2 2 (3) 2.NOTATIONS AND PRELIMINARIES where Au =02/ai and Av =a2/av In this section,we first introduce some notations used in this paper.We then briefly review PMF [17 which is closely 3. TAG INFORMED COLLABORATIVE FIL- related to our work. TERING 2.1 Notations Because PMF [17]has achieved state-of-the-art perfor- We use boldface uppercase letters,such as A,to denote mance for CF tasks,we use it as the base model to make matrices,and boldface lowercase letters,such as b,to denote further enhancement by integrating tagging information in a vectors.The ith row and the jth column of a matrix A are principled way.The result is our tag informed collaborative denoted as Ai and A.,respectively.The (i,j)th element filtering method,which will be abbreviated as TagiCoFi in of A is denoted as Aij and the ith element of b as bi. the sequel.The key idea of TagiCoFi is to use tagging in- Suppose there are N users,M items and K tags.Let R be formation to regularize the MF procedure of PMF.More the rating matrix in which Ri;represents the rating of user i specifically,we seek to make two user-specific latent feature for item j.The matrix R is sparse because many elements vectors as similar as possible if the two users have similar are missing,and each such element Rij is assigned the value tagging history. of 0 to indicate that item j has not been rated by user i. In the rest of this section,we first introduce some met- Y is the indicator matrix where Yij is an indicator variable rics for characterizing the similarity between users based on which is equal to 1 if user i rated item j and 0 otherwise tagging information.We then propose our TagiCoFi model MF-based methods 17 seek to find two low-rank matrices based on the computed user similarities
egories [10]. Memory-based methods, such as [9, 19], try to predict new ratings by (weighted) averaging the ratings of similar users or items. On the other hand, model-based methods, such as probabilistic matrix factorization (PMF) [17], try to learn a model from data using statistical learning techniques. To the best of our knowledge, there exists only one CF method [25] which attempts to utilize tagging information to improve item recommendation. This method is a memory-based one. The experimental results in [25] show that little improvement could be achieved on item recommendation by integrating tagging information into the CF procedure under the memory-based framework. In this paper, we propose a novel framework, called tag informed collaborative filtering (TagiCoFi), to seamlessly integrate tagging information into the model-based CF procedure. More specifically, we use tagging information to regularize the matrix factorization (MF) procedure of PMF [17] which has been demonstrated to be one of the state-of-theart CF methods. Some promising properties of TagiCoFi are highlighted here: • To the best of our knowledge, TagiCoFi is the first work that incorporates tagging information into a modelbased CF system for item recommendation. • TagiCoFi outperforms its counterpart, PMF, which discards the tagging information even when it is available. This shows that the tagging information does contain useful information for item recommendation and TagiCoFi can utilize it very effectively. • TagiCoFi can overcome, or at least alleviate, the over- fitting problem [17] suffered by most MF-based CF methods due to the sparsity of the rating matrix. • TagiCoFi can solve the cold-start problem [8, 11, 20] in that it can give recommendations to novel users who have no preference on any items. The rest of this paper is organized as follows. In Section 2, we will introduce the notations and some preliminaries. Section 3 describes the details of our model. Experimental results are presented in Section 4 and, finally, we conclude the paper in Section 5. 2. NOTATIONS AND PRELIMINARIES In this section, we first introduce some notations used in this paper. We then briefly review PMF [17] which is closely related to our work. 2.1 Notations We use boldface uppercase letters, such as A, to denote matrices, and boldface lowercase letters, such as b, to denote vectors. The ith row and the jth column of a matrix A are denoted as Ai∗ and A∗j , respectively. The (i, j)th element of A is denoted as Aij and the ith element of b as bi. Suppose there are N users, M items and K tags. Let R be the rating matrix in which Rij represents the rating of user i for item j. The matrix R is sparse because many elements are missing, and each such element Rij is assigned the value of 0 to indicate that item j has not been rated by user i. Y is the indicator matrix where Yij is an indicator variable which is equal to 1 if user i rated item j and 0 otherwise. MF-based methods [17] seek to find two low-rank matrices U ∈ R D×N and V ∈ R D×M, where typically D N, M, and use Rˆ = UT V to approximate the rating matrix R. The column vectors U∗i and V∗j represent the user-specific and item-specific latent feature vectors, respectively. Let Z be the tagging matrix, and each of its elements Zik is the tf*idf value of user i and tag k [18, 23]: Zik = tf(i, k) × log2 „ N df(k) « , (1) where tf(i, k) is the normalized frequency of tag k appeared in user i’s tagging history and df(k) is the number of users who have used tag k. 2.2 Probabilistic Matrix Factorization PMF [17] seeks to derive the aforementioned low-rank matrices U and V by analyzing the rating matrix R in a probabilistic framework. The likelihood of the observed ratings R is defined as follows: p(R | U, V, σ 2 ) = YN i=1 YM j=1 h N (Rij | U T ∗iV∗j , σ 2 ) iYij , (2) where N (x | µ, σ2 ) denotes the (univariate) Gaussian distribution with mean µ and variance σ 2 . Putting zero-mean spherical Gaussian priors on the userspecific and item-specific feature vectors: p(U | σ 2 U ) = YN i=1 N (U∗i | 0, σ 2 U I) p(V | σ 2 V ) = YM j=1 N (V∗j | 0, σ 2 V I), we can obtain the maximum a posteriori (MAP) estimates of U and V by minimizing the following objective function defined based on the sum of squared errors: E = 1 2 XN i=1 XM j=1 Yij (Rij − U T ∗iV∗j ) 2 + λU 2 tr(U T U) + λV 2 tr(V T V), (3) where λU = σ 2 /σ2 U and λV = σ 2 /σ2 V . 3. TAG INFORMED COLLABORATIVE FILTERING Because PMF [17] has achieved state-of-the-art performance for CF tasks, we use it as the base model to make further enhancement by integrating tagging information in a principled way. The result is our tag informed collaborative filtering method, which will be abbreviated as TagiCoFi in the sequel. The key idea of TagiCoFi is to use tagging information to regularize the MF procedure of PMF. More specifically, we seek to make two user-specific latent feature vectors as similar as possible if the two users have similar tagging history. In the rest of this section, we first introduce some metrics for characterizing the similarity between users based on tagging information. We then propose our TagiCoFi model based on the computed user similarities
3.1 Tag-based User Similarity Measures where Sj is the tag-based similarity between user i and We introduce several possible measures for characterizing user j computed based on one of the measures defined in user similarities based on the tagging matrix Z.Here,Tij Section 3.1,C =D-S is known as the Laplacian matrix [5] denotes the index set of tags which are used by both user i with D being a diagonal matrix whose diagonal elements and user j. D=;Si,and tr()denotes the trace of a matrix. To integrate tagging information into the CF procedure 3.1.I Cosine Similarity TagiCoFi combines the criteria (9)and (10)to give the fol- The cosine similarity is defined as follows: lowing objective function for minimization: ∑kETy ZikZjk (4) √∑krZ疏V∑kerZ保 (n-u v) 3.1.2 Pearson Similarity )+(VV)] + The Pearson correlation coefficient between two users is c]. (11) defined as follows: where B is an additional regularization parameter to control p1(i,j)= ∑ker(Zk-Z)(Zk-Z) (5) the contribution from the tagging information. V∑keTy(Zk-Z)PV∑ker(Zk-Z)2 The formulation in (11)can be seen as an adaptation of relation reqularized matrir factorization(RRMF)[13]which where Zi= The Pearson similarity is then models relational data containing both relation information and content information.The main difference between Tagi- defined as: CoFi and RRMF is that TagiCoFi can handle missing data. s=1+exp(-p6,刃 (6) which is one of the key characteristics of CF. 3.1.3 Euclidean-based Similarity 3.3 Learning The objective function in (11)can be rewritten as follows: The Euclidean distance between two users is defined as follows: NM p2(i,)= (Zik-Zjk)2. (7) =1=1 The Euclidean-based similarity is then defined as: +U(aI+Bc)UT] S号e=exp [P(色,]2 +vv]. (12) 2σ2 (8) where I is the identity matrix.We use an alternating gra- where o is a user-controlled parameter. dient descent procedure to optimize (12).More specifically, each time we fix one variable (U or V)and minimize the 3.2 Model Formulation of TagiCoFi objective function with respect to the other one (V or U). Like in PMF [17].we adopt a similar MF procedure to find This procedure is repeated for several iterations until some U and V by minimizing the following criterion function: termination condition is satisfied. To learn U,we first rewrite (12)as follows: N A ∑2,(R-uvP+ruU+vv], f=g+h+C. (13) =1j=1 where C is a constant independent of U,and where a is a regularization parameter for complexity control. Furthermore,TagiCoFi employs the user similarities de- N M fined based on the tagging information to regularize the MF 9=∑∑,(B-UgVg procedure,with the goal to make the user-specific latent fea- i=1j=1 ture vectors as similar as possible if the corresponding users D have similar tagging history.We can achieve this goal by h=号∑Ua.(al+Bc)ui minimizing the following criterion function: 2 (14) d=1 NN From (14),we can see that the rows of U in h are decou- pled.Hence,we apply gradient descent to optimize one row =1=1 of U at a time with the other rows fixed. Because d=1 =(∑)a- D 1=1 =∑Ua.cUa >YijVa(Ris -Un:V.;+Ua:Vaj),(15) =tr(UCU), (10) j=1
3.1 Tag-based User Similarity Measures We introduce several possible measures for characterizing user similarities based on the tagging matrix Z. Here, T ij denotes the index set of tags which are used by both user i and user j. 3.1.1 Cosine Similarity The cosine similarity is defined as follows: S cos ij = P k∈T ij ZikZjk qP k∈T ij Z2 ikqP k∈T ij Z2 jk . (4) 3.1.2 Pearson Similarity The Pearson correlation coefficient between two users is defined as follows: ρ1(i, j) = P k∈T ij (Zik − Z¯i)(Zjk − Z¯j ) qP k∈T ij (Zik − Z¯i) 2 qP k∈T ij (Zjk − Z¯j ) 2 , (5) where Z¯i = P k∈T ij Zik |T ij | . The Pearson similarity is then defined as: S pea ij = 1 1 + exp(−ρ1(i, j)). (6) 3.1.3 Euclidean-based Similarity The Euclidean distance between two users is defined as follows: ρ2(i, j) = s X k∈T ij (Zik − Zjk) 2. (7) The Euclidean-based similarity is then defined as: S euc ij = exp „ − [ρ2(i, j)]2 2σ2 « , (8) where σ is a user-controlled parameter. 3.2 Model Formulation of TagiCoFi Like in PMF [17], we adopt a similar MF procedure to find U and V by minimizing the following criterion function: 1 2 XN i=1 XM j=1 Yij (Rij − U T ∗iV∗j ) 2 + α 2 h tr(U T U) + tr(V T V) i , (9) where α is a regularization parameter for complexity control. Furthermore, TagiCoFi employs the user similarities de- fined based on the tagging information to regularize the MF procedure, with the goal to make the user-specific latent feature vectors as similar as possible if the corresponding users have similar tagging history. We can achieve this goal by minimizing the following criterion function: f1 = 1 2 XN i=1 XN j=1 SijkU∗i − U∗jk 2 = 1 2 XN i=1 XN j=1 h Sij XD d=1 (Udi − Udj ) 2 i = XD d=1 Ud∗LU T d∗ = tr(ULU T ), (10) where Sij is the tag-based similarity between user i and user j computed based on one of the measures defined in Section 3.1, L = D−S is known as the Laplacian matrix [5] with D being a diagonal matrix whose diagonal elements Dii = P j Sij , and tr(·) denotes the trace of a matrix. To integrate tagging information into the CF procedure, TagiCoFi combines the criteria (9) and (10) to give the following objective function for minimization: f = 1 2 XN i=1 XM j=1 Yij (Rij − U T ∗iV∗j ) 2 + α 2 h tr(U T U) + tr(V T V) i + β 2 h tr(ULU T ) i , (11) where β is an additional regularization parameter to control the contribution from the tagging information. The formulation in (11) can be seen as an adaptation of relation regularized matrix factorization (RRMF) [13] which models relational data containing both relation information and content information. The main difference between TagiCoFi and RRMF is that TagiCoFi can handle missing data, which is one of the key characteristics of CF. 3.3 Learning The objective function in (11) can be rewritten as follows: f = 1 2 XN i=1 XM j=1 Yij (Rij − U T ∗iV∗j ) 2 + 1 2 trh U(αI + βL)U T i + α 2 h tr(V T V) i , (12) where I is the identity matrix. We use an alternating gradient descent procedure to optimize (12). More specifically, each time we fix one variable (U or V) and minimize the objective function with respect to the other one (V or U). This procedure is repeated for several iterations until some termination condition is satisfied. To learn U, we first rewrite (12) as follows: f = g + h + C, (13) where C is a constant independent of U, and g = 1 2 XN i=1 XM j=1 Yij (Rij − U T ∗iV∗j ) 2 h = 1 2 XD d=1 Ud∗(αI + βL)U T d∗. (14) From (14), we can see that the rows of U in h are decoupled. Hence, we apply gradient descent to optimize one row of U at a time with the other rows fixed. Because ∂g ∂Udi = “XM j=1 YijV 2 dj” Udi− XM j=1 YijVdj (Rij − U T ∗iV∗j + UdiVdj ), (15)
we have 1.How does TagiCoFi perform in real applications when ag=WUN.-x. compared with state-of-the-art methods? (16) aUa. 2.How effective are the different user similarity mea- where W is an Nx N diagonal matrix with W=YV, sures? and x is an N x 1 vector with i=YijVa(Rij- 3.How does tagging information improve collaborative UnV.;+UaiVa). filtering? Then,we can get 4.How does the number of latent features used affect the ag+00: performance of TagiCoFi? aUa∂Uu* =(W+aI+BC)U2.-x. 5.Does TagiCoFi work for users without any training (17) ratings? The learning process of V is different from that of U These questions are answered separately:question 1 in because the columns (not rows)of V are decoupled.Hence Section 4.3,questions 2-4 in Section 4.4 as three different we apply gradient descent to optimize one column of V at subsubsections,and question 5 in Section 4.5. a time with the other columns fixed.The gradient can be computed as follows: 4.1 Data Set af We evaluate our algorithm on the MovieLens datasets V对 =(al+YUUa)V-】 Yig RijU.i.(18) which,as far as we know,is the only publicly available dataset containing both tagging and rating information. The overall learning procedure of TagiCoFi is summarized We first prune the dataset for our analysis.For the tag- in Algorithm 1 below. ging information,we only keep those tags which are added on at least three distinct movies.As for the users,we only keep those users who used at least 3 distinct tags in their Algorithm 1 Learning procedure of TagiCoFi tagging history.For movies,we only keep those movies that 1:INPUT: are annotated by at least 3 distinct tags.It should be em- R-rating matrix phasized that our model still works under situations where Z-tagging matrix there are users or movies with rating information only but D-number of latent features no tagging information.For those users without any tag- W-number of iterations ging information,the tag-based similarities between them 6-step size for gradient descent and the other users are 0,which means that the last term 2:Compute user similarity matrix S based on Z in (11)will have no effect on those users.Subsequently,the 3:Compute Laplacian matrix C based on S recommendation result for those users without tagging in- 4:Initialize U,Vo formation only depends on the MF procedure of the rating 5:for w=1 to W do matrix.which is similar to the result of PMF.As the focus 6: for d=1 to D do of this paper is on evaluating the effectiveness of tagging 7: U2-U -6. information in addition to rating information,we only keep g end for the users who have both rating history and tagging history 9: for j=1 to M do in the original rating records. 10: V号-V写-i We obtain two kinds of records after pruning,the tag- 11: end for ging records and the rating records.The tagging records in- 12:end for clude 13,431 tagging applications contributed by 757 users 13:return UW,Vw with 2,271 distinct tags.Based on the tagging records,we construct the tagging matrix Z.whose elements are defined by Equation (1)in Section 2.1.The rating records include 3.4 Complexity Analysis 167,474 ratings rated by 757 users (the same as those in the tagging records)on 9,485 movies,and based on these rating The main computation of TagiCoFi is to evaluate the gra records we construct the rating matrix R.More statistics dients of the objective function with respect to the latent about the rating matrix R are shown in Table 1,where the variables and to compute the user similarities.The time numbers behind+denote the standard deviations. complexity of computing the gradient is(ND))and that ofis(NMD).The time complexity of comput Table 1:Description of rating data ing the user similarities and C is O(N2K).Hence,the time Statistics Users Movies complexity of the entire alternating gradient descent proce- Min.of ratings 20 dure is O(W(N2D+NMD)+N2K). Max.of ratings 2.634 625 Mean#of ratings441.95士420.8835.27士67.30 4.EXPERIMENTAL EVALUATION We have conducted several experiments to compare the performance of our method with that of other methods 5http://www.grouplens.org/node/73 Through the experiments,we have tried to answer the fol- 6If user i adds tag k on item j,we say this is a tagging lowing questions: application
we have ∂g ∂Ud∗ = WU T d∗ − x, (16) where W is an N×N diagonal matrix with Wii = PM j=1 YijV 2 dj , and x is an N × 1 vector with xi = PM j=1 YijVdj (Rij − UT ∗iV∗j + UdiVdj ). Then, we can get ∂f ∂Ud∗ = ∂g ∂Ud∗ + ∂h ∂Ud∗ = (W + αI + βL)U T d∗ − x. (17) The learning process of V is different from that of U, because the columns (not rows) of V are decoupled. Hence, we apply gradient descent to optimize one column of V at a time with the other columns fixed. The gradient can be computed as follows: ∂f ∂V∗j = “ αI + XN i=1 YijU∗iU T ∗i ” V∗j − XN i=1 YijRijU∗i. (18) The overall learning procedure of TagiCoFi is summarized in Algorithm 1 below. Algorithm 1 Learning procedure of TagiCoFi 1: INPUT: R – rating matrix Z – tagging matrix D – number of latent features W – number of iterations δ – step size for gradient descent 2: Compute user similarity matrix S based on Z 3: Compute Laplacian matrix L based on S 4: Initialize U0 , V0 5: for w = 1 to W do 6: for d = 1 to D do 7: Uw d∗ ← Uw−1 d∗ − δ ∂f ∂Ud∗ 8: end for 9: for j = 1 to M do 10: Vw ∗j ← Vw−1 ∗j − δ ∂f ∂V∗j 11: end for 12: end for 13: return UW , VW 3.4 Complexity Analysis The main computation of TagiCoFi is to evaluate the gradients of the objective function with respect to the latent variables and to compute the user similarities. The time complexity of computing the gradient ∂f ∂Ud∗ is O(N 2D)) and that of ∂f ∂V∗j is O(NMD). The time complexity of computing the user similarities and L is O(N 2K). Hence, the time complexity of the entire alternating gradient descent procedure is O(W(N 2D + NMD) + N 2K). 4. EXPERIMENTAL EVALUATION We have conducted several experiments to compare the performance of our method with that of other methods. Through the experiments, we have tried to answer the following questions: 1. How does TagiCoFi perform in real applications when compared with state-of-the-art methods? 2. How effective are the different user similarity measures? 3. How does tagging information improve collaborative filtering? 4. How does the number of latent features used affect the performance of TagiCoFi? 5. Does TagiCoFi work for users without any training ratings? These questions are answered separately: question 1 in Section 4.3, questions 2–4 in Section 4.4 as three different subsubsections, and question 5 in Section 4.5. 4.1 Data Set We evaluate our algorithm on the MovieLens dataset5 , which, as far as we know, is the only publicly available dataset containing both tagging and rating information. We first prune the dataset for our analysis. For the tagging information, we only keep those tags which are added on at least three distinct movies. As for the users, we only keep those users who used at least 3 distinct tags in their tagging history. For movies, we only keep those movies that are annotated by at least 3 distinct tags. It should be emphasized that our model still works under situations where there are users or movies with rating information only but no tagging information. For those users without any tagging information, the tag-based similarities between them and the other users are 0, which means that the last term in (11) will have no effect on those users. Subsequently, the recommendation result for those users without tagging information only depends on the MF procedure of the rating matrix, which is similar to the result of PMF. As the focus of this paper is on evaluating the effectiveness of tagging information in addition to rating information, we only keep the users who have both rating history and tagging history in the original rating records. We obtain two kinds of records after pruning, the tagging records and the rating records. The tagging records include 13,431 tagging applications6 contributed by 757 users with 2,271 distinct tags. Based on the tagging records, we construct the tagging matrix Z, whose elements are defined by Equation (1) in Section 2.1. The rating records include 167,474 ratings rated by 757 users (the same as those in the tagging records) on 9,485 movies, and based on these rating records we construct the rating matrix R. More statistics about the rating matrix R are shown in Table 1, where the numbers behind ± denote the standard deviations. Table 1: Description of rating data Statistics Users Movies Min. # of ratings 20 1 Max. # of ratings 2,634 625 Mean # of ratings 441.95 ± 420.88 35.27 ± 67.30 5 http://www.grouplens.org/node/73 6 If user i adds tag k on item j, we say this is a tagging application
4.2 Evaluation Metric Table 3:p-values for the significance tests For consistency with experiments reported in the litera- )5 D=10 D=20 ture,we use the Mean Absolute Error (MAE)as evaluation 20%Training3.91×10-5 8.27×10-7 1.04×10-16 metric.MAE gives the average absolute deviation of predic- 40%Training 4.11×10-8 2.10×10-16 4.52×10-16 tion from the ground truth: 60%Training 1.35 x 10-11 4.20×10-12 4.15×10-12 MAE=∑:∑YlR-l 80%Training1.24×10-8 2.85×1025.99×10-12 ∑,写 where Ri;and Ra are the true and predicted rating values, In order to compare TagiCoFi with PMF more thoroughly, respectively.A smaller value of MAE indicates a better we compare their performance on users with different num- performance bers of observed ratings.The results are shown in Figure 1, In our experiments,we randomly split the rating records from which we can find that TagiCoFi outperforms PMF for into two parts,each of which contains 50%of the observa- all users and the improvement is more significant for users tions in the rating matrix.One part is used as the test set with only few observed ratings.This is a very promising which is kept the same for all experiments.The other part property of TagiCoFi because those users with a small num- is used as a pool from which training sets are generated.For ber of ratings are typically new customers who have just example,a training set size of 20%means that 20%of the started to use the system.If we can provide good recom- records are randomly selected from the pool to form a train- mendation to them,we will have a higher chance to keep ing set.For each training set size,we randomly generate them as our long-term customers.Otherwise we will likely 10 different training sets based on which 10 experiments are lose them performed and the average result is reported. 4.3 Performance --20是Training In this section,we compare our method with PMF which 80号T工a1n1ng has been demonstrated to be one of the state-of-the-art CF methods 17.For fairness,we perform parameter tuning in advance for each method and then use the best settings found in all the experiments.For both methods,we initial- ize the latent features to random numbers in 0,1 and set the step size for gradient descent to 0.001.The parame- ters specific to our method are set as a 1 and B=50. Actually,we find that the performance will be stable af- ter about 1000 rounds of gradient decent (see Figure 3). Hence,we set W=1000 for all the following results.Fur- thermore,we adopt the Pearson similarity for all the experi- ments.The performance of other measures will be discussed -10 in Section 4.4.1. 11-20 161-320 3320 The results reported in Table 2 are the average MAE val- ues of PMF and TagiCoFi and their corresponding standard Figure 1:Performance improvement of TagiCoFi deviations.The better results are shown in bold.It iss clear over that of PMF on different user rating scales(no that TagiCoFi achieves better performance than PMF. users in a 20%training set have more than 320 ob- To evaluate how significant TagiCoFi outperforms PMF served ratings) we have conducted paired t-tests [26 on the results of PMF and TagiCoFi.Given two approaches,say A and B,and a set of n experiments,the MAE values are obtained for 4.4 Sensitivity to Parameters both approaches,denoted by ai and bi for i=1,2,...,n Let di ai-bi denote the difference of ai and bi and d 4.4.I User Similarity Measures be the average of the di values for i=1,2....,n.The null In this section,we conduct a set of experiments to compare hypothesis is d =0 whereas the alternative hypothesis is the effectiveness of the aforementioned user similarity mea- d>0.The p-value is computed using the t-statistic: sures:cosine similarity,Pearson similarity and Euclidean- d based similarity.Due to the page limit restriction,we only T= s/√元1 report results with parameters a =1,B=50,D=10 in Fig- ure 2.We have also observed the same trend in other param- where s is the standard deviation of d.A small p-value eter settings.From Figure 2,we see that the Pearson simi- (0.01)indicates the existence of statistically significant larity always gives the best performance and the Euclidean- evidence against the null hypothesis. based similarity is always the worst.Although the difference Table 3 shows the p-values obtained in our experiments. between these measures is obvious,Figure 2 shows that the It is easily observed that TagiCoFi significantly outperforms difference decreases as the training set size increases.One PMF.Because the main difference between TagiCoFi and may ask if changing the o parameter in the Euclidean-based PMF lies in the extra tagging information used by TagiCoFi, similarity measure will help.We have tuned the parameter we can conclude that the tagging information is very useful by trying different values but cannot make it outperform the and TagiCoFi can utilize it very effectively. other similarity measures.Based on this analysis,we adopt
4.2 Evaluation Metric For consistency with experiments reported in the literature, we use the Mean Absolute Error (MAE) as evaluation metric. MAE gives the average absolute deviation of prediction from the ground truth: MAE = P i P j Yij |Rij − Rˆij | P i P j Yij , where Rij and Rˆij are the true and predicted rating values, respectively. A smaller value of MAE indicates a better performance. In our experiments, we randomly split the rating records into two parts, each of which contains 50% of the observations in the rating matrix. One part is used as the test set, which is kept the same for all experiments. The other part is used as a pool from which training sets are generated. For example, a training set size of 20% means that 20% of the records are randomly selected from the pool to form a training set. For each training set size, we randomly generate 10 different training sets based on which 10 experiments are performed and the average result is reported. 4.3 Performance In this section, we compare our method with PMF which has been demonstrated to be one of the state-of-the-art CF methods [17]. For fairness, we perform parameter tuning in advance for each method and then use the best settings found in all the experiments. For both methods, we initialize the latent features to random numbers in [0, 1] and set the step size for gradient descent to 0.001. The parameters specific to our method are set as α = 1 and β = 50. Actually, we find that the performance will be stable after about 1000 rounds of gradient decent (see Figure 3). Hence, we set W = 1000 for all the following results. Furthermore, we adopt the Pearson similarity for all the experiments. The performance of other measures will be discussed in Section 4.4.1. The results reported in Table 2 are the average MAE values of PMF and TagiCoFi and their corresponding standard deviations. The better results are shown in bold. It iss clear that TagiCoFi achieves better performance than PMF. To evaluate how significant TagiCoFi outperforms PMF, we have conducted paired t-tests [26] on the results of PMF and TagiCoFi. Given two approaches, say A and B, and a set of n experiments, the MAE values are obtained for both approaches, denoted by ai and bi for i = 1, 2, . . . , n. Let di = ai − bi denote the difference of ai and bi and d¯ be the average of the di values for i = 1, 2, . . . , n. The null hypothesis is d¯ = 0 whereas the alternative hypothesis is d >¯ 0. The p-value is computed using the t-statistic: T = d¯ s/√ n , where s is the standard deviation of d. A small p-value (≤ 0.01) indicates the existence of statistically significant evidence against the null hypothesis. Table 3 shows the p-values obtained in our experiments. It is easily observed that TagiCoFi significantly outperforms PMF. Because the main difference between TagiCoFi and PMF lies in the extra tagging information used by TagiCoFi, we can conclude that the tagging information is very useful and TagiCoFi can utilize it very effectively. Table 3: p-values for the significance tests D = 5 D = 10 D = 20 20% Training 3.91 × 10−15 8.27 × 10−17 1.04 × 10−16 40% Training 4.11 × 10−13 2.10 × 10−16 4.52 × 10−16 60% Training 1.35 × 10−11 4.20 × 10−12 4.15 × 10−12 80% Training 1.24 × 10−8 2.85 × 10−12 5.99 × 10−12 In order to compare TagiCoFi with PMF more thoroughly, we compare their performance on users with different numbers of observed ratings. The results are shown in Figure 1, from which we can find that TagiCoFi outperforms PMF for all users and the improvement is more significant for users with only few observed ratings. This is a very promising property of TagiCoFi because those users with a small number of ratings are typically new customers who have just started to use the system. If we can provide good recommendation to them, we will have a higher chance to keep them as our long-term customers. Otherwise we will likely lose them. 1−10 11−20 21−40 41−80 81−160 161−320 >320 0 0.05 0.1 Number of Observed Ratings MAE Improvement 20% Training 80% Training Figure 1: Performance improvement of TagiCoFi over that of PMF on different user rating scales (no users in a 20% training set have more than 320 observed ratings) 4.4 Sensitivity to Parameters 4.4.1 User Similarity Measures In this section, we conduct a set of experiments to compare the effectiveness of the aforementioned user similarity measures: cosine similarity, Pearson similarity and Euclideanbased similarity. Due to the page limit restriction, we only report results with parameters α = 1, β = 50, D = 10 in Figure 2. We have also observed the same trend in other parameter settings. From Figure 2, we see that the Pearson similarity always gives the best performance and the Euclideanbased similarity is always the worst. Although the difference between these measures is obvious, Figure 2 shows that the difference decreases as the training set size increases. One may ask if changing the σ parameter in the Euclidean-based similarity measure will help. We have tuned the parameter by trying different values but cannot make it outperform the other similarity measures. Based on this analysis, we adopt
Table 2:MAE comparison between PMF and TagiCoFi(x10-2) Training User Movie D=5 D=10 D=20 Set Size Average Average PMF TagiCoFi PMF TagiCoFi PMF TagiCoFi 20% 73.16±0.09 74.18±0.17 73.53±0.1368.86±0.12 73.50±0.11 68.32±0.1373.54±0.12 67.73±0.14 40% 72.76±0.07 71.98±0.09 68.79±0.0866.06±0.12 68.88±0.06 65.82±0.0768.91±0.08 65.46±0.14 60% 72.62±0.04 70.97±0.05 66.46±0.1264.70±0.10 66.71±0.09 64.65±0.1066.85±0.09 64.59±0.14 809% 72.53±0.02 70.39±0.04 64.88±0.1463.53±0.10 65.15±0.10 63.64±0.1065.42±0.07 63.84±0.11 -0.1-1 10 50--100 Pea上son 0.5 Cosine Euclidean 0.8 20 Percentage of Training 501 200 300 400 500 600700 800 00 teration Figure 2:Comparison of similarity measures Figure 3:Impact of B the Pearson similarity as our similarity measure in all other experiments. tions are plotted in Figure 4.We also show the percentage decrease in MAE with respect to that for 10 latent features 4.4.2 Impact of Tagging Information at the points 20.30.40 and 50. As we saw in Section 3,the contribution of tagging in- Figure 4 shows that the MAE decreases as the number of formation is controlled by the parameter B.If B=0,we latent features increases.This agrees with our assumption, do not use tagging information at all and hence our method because the more latent features,the more information can degenerates to a special form of PMF;as B increases,we be represented by the latent feature vectors.The figure put larger weight on the tagging information.To evaluate also shows that the improvement in MAE gets smaller as the impact of tagging information on collaborative filtering D continues to increase.When D becomes large enough, we carry out a set of experiments by varying the value of there is essentially no significant improvement because the B.The MAE curves for different B values on 20%training useful information has already been represented well by the sets are plotted in Figure 3.The other parameters are set existing latent features.From Figure 4,we can see that as a=1 and D=10. TagiCoFi can achieve good performance with D taking a As we can see from Figure 3,adopting a larger B value wide range of values. can help to avoid the overfitting problem suffered by most MF-based CF methods 17.When B 1,the overfitting 4.5 Cold-Start Setting problem is apparent.If we set B>10,we do not experience One well-known problem of CF systems is the cold-start overfitting any more.This phenomenon clearly validates the problem,in which recommendations are required for users impact of tagging information,that is,adding more tagging or items which have no observed ratings [20,8,11].Pure CF information can improve the generalization ability of the methods,such as PMF,cannot work under a cold-start set- model.Moreover,Figure 3 also shows that the performance ting,since no preference information is available to form any might degrade when B is too large.So in practice,we should basis for giving recommendation.Suppose tagging informa- choose a moderate value of 3.Actually,our method is not tion is available,TagiCoFi can solve the cold-start problem sensitive to B within a wide range,such as 10<B<50. by seamlessly integrating the tagging information for recom- mendation. 4.4.3 Impact of Number of Latent Features To validate the above speculation,we conduct two sets of Another important parameter in our method is the num- experiments based on 20%training sets,where we randomly ber of latent features D.In this section,we conduct a set of select 50 and 100 users and discard their ratings.These experiments on 20%training sets to study how D affects the users,called cold-start users,are quite commonly found in performance of our model.We use the following parameters: many recommender systems,such as newly registered users a =1.B=50.The MAE values and their standard devia- in a system.In the experiments,the parameters of our
Table 2: MAE comparison between PMF and TagiCoFi (×10−2 ) Training Set Size User Average Movie Average D = 5 D = 10 D = 20 PMF TagiCoFi PMF TagiCoFi PMF TagiCoFi 20% 73.16±0.09 74.18±0.17 73.53±0.13 68.86±0.12 73.50±0.11 68.32±0.13 73.54±0.12 67.73±0.14 40% 72.76±0.07 71.98±0.09 68.79±0.08 66.06±0.12 68.88±0.06 65.82±0.07 68.91±0.08 65.46±0.14 60% 72.62±0.04 70.97±0.05 66.46±0.12 64.70±0.10 66.71±0.09 64.65±0.10 66.85±0.09 64.59±0.14 80% 72.53±0.02 70.39±0.04 64.88±0.14 63.53±0.10 65.15±0.10 63.64±0.10 65.42±0.07 63.84±0.11 20% 40% 60% 80% 0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.7 Percentage of Training MAE Pearson Cosine Euclidean Figure 2: Comparison of similarity measures the Pearson similarity as our similarity measure in all other experiments. 4.4.2 Impact of Tagging Information As we saw in Section 3, the contribution of tagging information is controlled by the parameter β. If β = 0, we do not use tagging information at all and hence our method degenerates to a special form of PMF; as β increases, we put larger weight on the tagging information. To evaluate the impact of tagging information on collaborative filtering, we carry out a set of experiments by varying the value of β. The MAE curves for different β values on 20% training sets are plotted in Figure 3. The other parameters are set as α = 1 and D = 10. As we can see from Figure 3, adopting a larger β value can help to avoid the overfitting problem suffered by most MF-based CF methods [17]. When β ≤ 1, the overfitting problem is apparent. If we set β ≥ 10, we do not experience overfitting any more. This phenomenon clearly validates the impact of tagging information, that is, adding more tagging information can improve the generalization ability of the model. Moreover, Figure 3 also shows that the performance might degrade when β is too large. So in practice, we should choose a moderate value of β. Actually, our method is not sensitive to β within a wide range, such as 10 ≤ β ≤ 50. 4.4.3 Impact of Number of Latent Features Another important parameter in our method is the number of latent features D. In this section, we conduct a set of experiments on 20% training sets to study how D affects the performance of our model. We use the following parameters: α = 1, β = 50. The MAE values and their standard devia- 0 100 200 300 400 500 600 700 800 900 1000 0.65 0.7 0.75 0.8 0.85 0.9 Iteration MAE 0.1 1 10 50 100 Figure 3: Impact of β tions are plotted in Figure 4. We also show the percentage decrease in MAE with respect to that for 10 latent features at the points 20, 30, 40 and 50. Figure 4 shows that the MAE decreases as the number of latent features increases. This agrees with our assumption, because the more latent features, the more information can be represented by the latent feature vectors. The figure also shows that the improvement in MAE gets smaller as D continues to increase. When D becomes large enough, there is essentially no significant improvement because the useful information has already been represented well by the existing latent features. From Figure 4, we can see that TagiCoFi can achieve good performance with D taking a wide range of values. 4.5 Cold-Start Setting One well-known problem of CF systems is the cold-start problem, in which recommendations are required for users or items which have no observed ratings [20, 8, 11]. Pure CF methods, such as PMF, cannot work under a cold-start setting, since no preference information is available to form any basis for giving recommendation. Suppose tagging information is available, TagiCoFi can solve the cold-start problem by seamlessly integrating the tagging information for recommendation. To validate the above speculation, we conduct two sets of experiments based on 20% training sets, where we randomly select 50 and 100 users and discard their ratings. These users, called cold-start users, are quite commonly found in many recommender systems, such as newly registered users in a system. In the experiments, the parameters of our
0. 0.8 0.82 0. 0.78 0.76 0.74 0.72 0.72 100 0. 200300400 100 200300400 (a)50 cold-start users (b)100 cold-start users Figure 5:PMF and TagiCoFi in cold-start settings Table 4:MAE comparison in cold-start settings Types 50 cold-start users 100 cold-start users PMF TagiCoFi PMF TagiCoFi Cold-start Users0.7683±0.00120.7500±0.00090.7623±0.00220.7425±0.0011 All Users 0.7680±0.00090.7299±0.00160.7677±0.00160.7303±0.0024 It is clear that TagiCoFi performs better than PMF at all 0.69 levels. 0.688 0.686 5.CONCLUSION 0.684 We have proposed a novel framework,TagiCoFi,to seam- lessly incorporate tagging information into collaborative fil- 0.682 tering for item recommendation.To the best of our knowl- 0.68 edge,TagiCoFi is the first work that incorporates tagging information into a model-based CF system for item recom- 0.678 mendation.One promising property of TagiCoFi is that it 1.389% 0.676 can overcome the overfitting problem suffered by most MF- 1.61% based CF methods.Moreover,TagiCoFi can also solve the 0.674 cold-start problem for novel users.Experimental results on 0.672………-… real data demonstrate that TagiCoFi can significantly out- perform state-of-the-art collaborative filtering algorithms, 0.67 0 1 20 30 40 0 such as PMF,which discard the tagging information. Number of Latent Features One of our future research directions is to extend TagiCoFi by incorporating into it the tagging history of items.Fur- Figure 4:Impact of the number of latent features thermore,we plan to extend TagiCoFi to incorporate addi- tional sources of information to further improve the perfor- mance of recommender systems. model are set as a =1.B=50 and D=10.In our imple- mentation of PMF,we use the item average to predict the 6. ACKNOWLEDGMENTS rating of cold-start users for an item,because the original We thank the GroupLens research lab at the University of PMF cannot give prediction for those cold-start users.The Minnesota for their dataset.The research reported in this MAE curves of PMF and TagiCoFi during the first 1000 paper is supported by research grant HIA98/99.EG01 from iterations are plotted in Figure 5. the Hong Kong University of Science and Technology. Figure 5 shows that TagiCoFi significantly outperforms PMF,validating our speculation that tagging information could be used to perform recommendation for cold-start users. 7. REFERENCES Table 4 shows the MAE values of PMF and TagiCoFi to- [1]G.Adomavicius and A.Tuzhilin.Toward the next gether with their corresponding standard deviations on cold- generation of recommender systems:a survey of the start users and all users respectively in two different settings. state-of-the-art and possible extensions.IEEE
0 100 200 300 400 500 600 700 800 900 1000 0.7 0.72 0.74 0.76 0.78 0.8 0.82 0.84 0.86 0.88 Iteration MAE PMF TagiCoFi (a) 50 cold-start users 0 100 200 300 400 500 600 700 800 900 1000 0.7 0.72 0.74 0.76 0.78 0.8 0.82 0.84 0.86 0.88 Iteration MAE PMF TagiCoFi (b) 100 cold-start users Figure 5: PMF and TagiCoFi in cold-start settings Table 4: MAE comparison in cold-start settings Types 50 cold-start users 100 cold-start users PMF TagiCoFi PMF TagiCoFi Cold-start Users 0.7683 ± 0.0012 0.7500 ± 0.0009 0.7623 ± 0.0022 0.7425 ± 0.0011 All Users 0.7680 ± 0.0009 0.7299 ± 0.0016 0.7677 ± 0.0016 0.7303 ± 0.0024 0 10 20 30 40 50 0.67 0.672 0.674 0.676 0.678 0.68 0.682 0.684 0.686 0.688 0.69 Number of Latent Features MAE 0.86% 1.38% 1.61% 1.64% Figure 4: Impact of the number of latent features model are set as α = 1, β = 50 and D = 10. In our implementation of PMF, we use the item average to predict the rating of cold-start users for an item, because the original PMF cannot give prediction for those cold-start users. The MAE curves of PMF and TagiCoFi during the first 1000 iterations are plotted in Figure 5. Figure 5 shows that TagiCoFi significantly outperforms PMF, validating our speculation that tagging information could be used to perform recommendation for cold-start users. Table 4 shows the MAE values of PMF and TagiCoFi together with their corresponding standard deviations on coldstart users and all users respectively in two different settings. It is clear that TagiCoFi performs better than PMF at all levels. 5. CONCLUSION We have proposed a novel framework, TagiCoFi, to seamlessly incorporate tagging information into collaborative filtering for item recommendation. To the best of our knowledge, TagiCoFi is the first work that incorporates tagging information into a model-based CF system for item recommendation. One promising property of TagiCoFi is that it can overcome the overfitting problem suffered by most MFbased CF methods. Moreover, TagiCoFi can also solve the cold-start problem for novel users. Experimental results on real data demonstrate that TagiCoFi can significantly outperform state-of-the-art collaborative filtering algorithms, such as PMF, which discard the tagging information. One of our future research directions is to extend TagiCoFi by incorporating into it the tagging history of items. Furthermore, we plan to extend TagiCoFi to incorporate additional sources of information to further improve the performance of recommender systems. 6. ACKNOWLEDGMENTS We thank the GroupLens research lab at the University of Minnesota for their dataset. The research reported in this paper is supported by research grant HIA98/99.EG01 from the Hong Kong University of Science and Technology. 7. REFERENCES [1] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE
Transactions on Knowledge and Data Engineering, collaborative filtering of netnews.In Proceedings of the 17(6):734-749,June2005. 8th ACM Conference on Computer Supported [2]M.Balabanovic and Y.Shoham.Fab:content-based, Cooperative Work,pages 175-186,New York,NY collaborative recommendation.Communications of the USA.1994.ACM. ACM.40(3):66-72,1997. [17]R.Salakhutdinov and A.Mnih.Probabilistic matrix [3 J.Bennett and S.Lanning.The Netflix prize.In factorization.In J.Platt,D.Koller,Y.Singer,and Proceedings of KDD Cup and Workshop,volume 2007, S.Roweis,editors,Advances in Neural Information 2007. Processing Systems 20,pages 1257-1264.MIT Press, [4]J.Breese,D.Heckerman,and C.Kadie.Empirical Cambridge,MA,2008. analysis of predictive algorithms for collaborative [18 G.Salton and C.Buckley.Term-weighting approaches filtering.In Proceedings of the 14th Conference on in automatic text retrieval.Infomation Processing Uncertainty in Artificial Intelligence,pages 43-52, Management:an International Journal 1998. 24(5):513-523,1988. 5 F.Chung.Spectral Graph Theory.Number 92 in 19 B.Sarwar,G.Karypis,J.Konstan,and J.Reidl. Regional Conference Series in Mathematics.American Item-based collaborative filtering recommendation Mathematical Society,1997. algorithms.In Proceedings of the 10th International [6 M.de Gemmis,P.Lops,G.Semeraro,and P.Basile Conference on World Wide Web,pages 285-295,New Integrating tags in a semantic content-based York,NY,USA,2001.ACM. recommender.In Proceedings of the 2nd ACM [20A.I.Schein,A.Popescul,L.H.Ungar,and D.M. Conference on Recommender Systems,pages 163-170 Pennock.Methods and metrics for cold-start New York.NY,USA,2008.ACM. recommendations.In Proceedings of the 25th Annual [7]N.Garg and I.Weber.Personalized,interactive tag International ACM SIGIR Conference on Research recommendation for flickr.In Proceedings of the 2nd and Development in Information Retrieval,pages ACM Conference on Recommender Systems,pages 253-260.New York.NY.USA.2002.ACM 67-74,New York,NY,USA,2008.ACM. [21]S.Sen,S.K.Lam,A.M.Rashid,D.Cosley, 8 H.Guo.Soap:live recommendations through social D.Frankowski,J.Osterhouse,F.M.Harper,and agents.In Proceedings of the 5th DELOS Workshop on J.Riedl.Tagging,communities,vocabulary,evolution. Filtering and Collaborative Filtering,1997. In Proceedings of the 20th ACM Conference on [9 J.L.Herlocker,J.A.Konstan,A.Borchers,and Computer Supported Cooperative Work,pages J.Riedl.An algorithmic framework for performing 181-190,New York,NY,USA.2006.ACM. collaborative filtering.In Proceedings of the 22nd [22 S.Sen,J.Vig,and J.Riedl.Tagommenders: Annual International ACM SIGIR Conference on connecting users to items through tags.In Proceedings Research and Development in Information Retrieval of the 18th International Conference on World Wide pages 230-237,New York,NY,USA,1999.ACM. Web,pages 671-680,New York,NY,USA,2009. 10 Y.Koren.Factorization meets the neighborhood:a ACM. multifaceted collaborative filtering model.In 23A.Shepitsen,J.Gemmell,B.Mobasher,and R.Burke. Proceeding of the 14th ACM SIGKDD International Personalized recommendation in social tagging Conference on Knowledge Discovery and Data Mining, systems using hierarchical clustering.In Proceedings of pages 426-434,New York,NY.USA,2008.ACM. the 2nd ACM Conference on Recommender Systems, 11]X.N.Lam,T.Vu,T.D.Le,and A.D.Duong. pages 259-266,New York,NY,USA,2008.ACM. Addressing cold-start problem in recommendation [24 P.Symeonidis,A.Nanopoulos,and Y.Manolopoulos. systems.In Proceedings of the 2nd International Tag recommendations based on tensor dimensionality Conference on Ubiquitous Information Management reduction.In Proceedings of the 2nd ACM Conference and Communication,pages 208-211,New York,NY on Recommender Systems,pages 43-50,New York, USA.2008.ACM. NY.USA.2008.ACM. 12 K.Lang.Newsweeder:learning to filter netnews.In 25 K.H.L.Tso-Sutter,L.B.Marinho,and Proceedings of the 12th International Conference on L.Schmidt-Thieme.Tag-aware recommender systems Machine Learning,pages 331-339.Morgan Kaufmann, by fusion of collaborative filtering algorithms.In 1995. Proceedings of the 2nd ACM Symposium on Applied 13 W.-J.Li and D.-Y.Yeung.Relation regularized matrix Computing,pages 1995-1999,New York.NY.USA. factorization.In Proceedings of the 21st International 2008.ACM. Joint Conference on Artificial Intelligence,2009. [26 Y.Yang and X.Liu.A re-examination of text [14]G.Linden,B.Smith,and J.York.Amazon.com categorization methods.In Proceedings of the 22nd recommendations:item-to-item collaborative filtering Annual International ACM SIGIR Conference on IEEE Internet Computing,7(1):76-80,Jan/Feb 2003. Research and Development in Information Retrieval 15 R.J.Mooney and L.Roy.Content-based book pages 42-49,New York,NY,USA,1999.ACM. recommending using learning for text categorization 27 V.Zanardi and L.Capra.Social ranking:uncovering In Proceedings of the 5th ACM Conference on Digital relevant content using tag-based recommender Libraries,pages 195-204,New York,NY,USA,2000. systems.In Proceedings of the 2nd ACM Conference ACM. on Recommender Systems,pages 51-58,New York, 16 P.Resnick,N.Iacovou,M.Suchak,P.Bergstrom,and NY.USA,2008.ACM. J.Riedl.Grouplens:an open architecture for
Transactions on Knowledge and Data Engineering, 17(6):734–749, June 2005. [2] M. Balabanovi´c and Y. Shoham. Fab: content-based, collaborative recommendation. Communications of the ACM, 40(3):66–72, 1997. [3] J. Bennett and S. Lanning. The Netflix prize. In Proceedings of KDD Cup and Workshop, volume 2007, 2007. [4] J. Breese, D. Heckerman, and C. Kadie. Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52, 1998. [5] F. Chung. Spectral Graph Theory. Number 92 in Regional Conference Series in Mathematics. American Mathematical Society, 1997. [6] M. de Gemmis, P. Lops, G. Semeraro, and P. Basile. Integrating tags in a semantic content-based recommender. In Proceedings of the 2nd ACM Conference on Recommender Systems, pages 163–170, New York, NY, USA, 2008. ACM. [7] N. Garg and I. Weber. Personalized, interactive tag recommendation for flickr. In Proceedings of the 2nd ACM Conference on Recommender Systems, pages 67–74, New York, NY, USA, 2008. ACM. [8] H. Guo. Soap: live recommendations through social agents. In Proceedings of the 5th DELOS Workshop on Filtering and Collaborative Filtering, 1997. [9] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 230–237, New York, NY, USA, 1999. ACM. [10] Y. Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 426–434, New York, NY, USA, 2008. ACM. [11] X. N. Lam, T. Vu, T. D. Le, and A. D. Duong. Addressing cold-start problem in recommendation systems. In Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication, pages 208–211, New York, NY, USA, 2008. ACM. [12] K. Lang. Newsweeder: learning to filter netnews. In Proceedings of the 12th International Conference on Machine Learning, pages 331–339. Morgan Kaufmann, 1995. [13] W.-J. Li and D.-Y. Yeung. Relation regularized matrix factorization. In Proceedings of the 21st International Joint Conference on Artificial Intelligence, 2009. [14] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, Jan/Feb 2003. [15] R. J. Mooney and L. Roy. Content-based book recommending using learning for text categorization. In Proceedings of the 5th ACM Conference on Digital Libraries, pages 195–204, New York, NY, USA, 2000. ACM. [16] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. Grouplens: an open architecture for collaborative filtering of netnews. In Proceedings of the 8th ACM Conference on Computer Supported Cooperative Work, pages 175–186, New York, NY, USA, 1994. ACM. [17] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1257–1264. MIT Press, Cambridge, MA, 2008. [18] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Infomation Processing Management: an International Journal, 24(5):513–523, 1988. [19] B. Sarwar, G. Karypis, J. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, pages 285–295, New York, NY, USA, 2001. ACM. [20] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock. Methods and metrics for cold-start recommendations. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 253–260, New York, NY, USA, 2002. ACM. [21] S. Sen, S. K. Lam, A. M. Rashid, D. Cosley, D. Frankowski, J. Osterhouse, F. M. Harper, and J. Riedl. Tagging, communities, vocabulary, evolution. In Proceedings of the 20th ACM Conference on Computer Supported Cooperative Work, pages 181–190, New York, NY, USA, 2006. ACM. [22] S. Sen, J. Vig, and J. Riedl. Tagommenders: connecting users to items through tags. In Proceedings of the 18th International Conference on World Wide Web, pages 671–680, New York, NY, USA, 2009. ACM. [23] A. Shepitsen, J. Gemmell, B. Mobasher, and R. Burke. Personalized recommendation in social tagging systems using hierarchical clustering. In Proceedings of the 2nd ACM Conference on Recommender Systems, pages 259–266, New York, NY, USA, 2008. ACM. [24] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In Proceedings of the 2nd ACM Conference on Recommender Systems, pages 43–50, New York, NY, USA, 2008. ACM. [25] K. H. L. Tso-Sutter, L. B. Marinho, and L. Schmidt-Thieme. Tag-aware recommender systems by fusion of collaborative filtering algorithms. In Proceedings of the 2nd ACM Symposium on Applied Computing, pages 1995–1999, New York, NY, USA, 2008. ACM. [26] Y. Yang and X. Liu. A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 42–49, New York, NY, USA, 1999. ACM. [27] V. Zanardi and L. Capra. Social ranking: uncovering relevant content using tag-based recommender systems. In Proceedings of the 2nd ACM Conference on Recommender Systems, pages 51–58, New York, NY, USA, 2008. ACM