Collaborative Topic Regression with Social Regularization for Tag Recommendation Hao Wang,Binyi Chen,Wu-Jun Li Shanghai Key Laboratory of Scalable Computing and Systems Department of Computer Science and Engineering,Shanghai Jiao Tong University,China js05212,Charles_Chen@sjtu.edu.cn,liwujun@cs.sjtu.edu.cn Abstract users might use different words to express the same mean- ing,which makes it more difficult to utilize the tagging in- Recently,tag recommendation (TR)has become a formation.Tag recommendation can help to limit vocabulary very hot research topic in data mining and related of tags and thus alleviate the above problems.Furthermore, areas.However,neither co-occurrence based meth- it can also help to prevent misspelt or meaningless words. ods which only use the item-tag matrix nor content Therefore.TR [Wang et al.,2012]has become a very hot re- based methods which only use the item content in- formation can achieve satisfactory performance in search topic in recent years,and many methods have been proposed by researchers. real TR applications.Hence,how to effectively combine the item-tag matrix,item content informa- Existing tag recommendation methods can be roughly cat- egorized into three classes [Wang et al.,2012]:content-based tion,and other auxiliary information into the same recommendation framework is the key challenge methods,co-occurrence based methods,and hybrid method- for TR.In this paper,we first adapt the collabora- s.Content-based methods [Chen et al.,2008;Lipczak et al., tive topic regression (CTR)model,which has been 2009;Shen and Fan,2010;Lee et al.,2010;Toderici et al., 2010:Chen et al.,2010.directly adopt the content of re- successfully applied for article recommendation,to combine both item-tag matrix and item content in- sources/items,such as abstract of articles,image content and formation for TR.Furthermore,by extending CTR description of images,to perform tag recommendation.Co- we propose a novel hierarchical Bayesian model. occurrence based methods [Benz et al.,2006;Xu et al.,2006; called CTR with social regularization(CTR-SR), Hotho et al.,2006:Marinho and Schmidt-Thieme,2007: to seamlessly integrate the item-tag matrix,item Sigurbjornsson and van Zwol,2008;Garg and Weber,2008; content information,and social networks between Weinberger et al.,2008:Wu et al.,2009;Rendle and items into the same principled model.Experiments Schmidt-Thieme,2010]mainly use the co-occurrence of tags on real data demonstrate the effectiveness of our among items (i.e.,the item-tag matrix)for tagging.Actu- proposed models. ally,the underlying principle of co-occurrence based meth- ods is similar to that of collaborative filtering (CF)meth- ods [Adomavicius and Tuzhilin,2005;Zhen et al.,2009; 1 Introduction Li and Yeung,2011].Because the TR problem is very Tagging systems have been playing very important role for complex and difficult,neither co-occurrence based method- us to better categorize and organize information.For exam- s nor content based methods can achieve satisfactory per- ple,Flickr uses tags to label and organize photos.Last.fm formance in real TR applications.Hence,the recent trend adopts tags to categorize artists and music,and CiteULike3 in TR research is to use hybrid methods [Wu et al.,2009; allows users to tag articles.With the tagging systems,users Sevil et al.,2010:Lops et al.,2011:2013]which try to com- are able to better organize their own content and find relevant bine both item-tag matrix and item content information to- gether for recommendation. resources (content)more easily. However,finding the set of proper words(tags)to describe However,it is still a challenge to find an effective way to the resources often requires high mental focus.That is why combine both item-tag matrix and item content information tag recommendation (TR)Gupta et al.,2010;Wang et al., for TR.Furthermore,in some applications there may exist so- 2012]has become more and more important on the Internet. cial networks (relations)between items.For example,if we With the tag recommendation system,users only need a few want to tag articles in CiteULike,there are citation relation- clicks to finish the tagging process.Moreover,tags created by s or other social networks between articles [Li et al..2011: various users can be inconsistent and idiosyncratic.Different Wang and Li,2013].Typically,two articles with relation be- tween them might be most likely to be about the same top- http://www.flickr.com ic [Li et al.,2009a;2009b],and consequently they should http://www.lastfm.com have similar tags.Hence,how to effectively integrate social 3http://www.citeulike.org networks between items for tagging is another challenge
Collaborative Topic Regression with Social Regularization for Tag Recommendation Hao Wang, Binyi Chen, Wu-Jun Li Shanghai Key Laboratory of Scalable Computing and Systems Department of Computer Science and Engineering, Shanghai Jiao Tong University, China {js05212, Charles Chen}@sjtu.edu.cn, liwujun@cs.sjtu.edu.cn Abstract Recently, tag recommendation (TR) has become a very hot research topic in data mining and related areas. However, neither co-occurrence based methods which only use the item-tag matrix nor content based methods which only use the item content information can achieve satisfactory performance in real TR applications. Hence, how to effectively combine the item-tag matrix, item content information, and other auxiliary information into the same recommendation framework is the key challenge for TR. In this paper, we first adapt the collaborative topic regression (CTR) model, which has been successfully applied for article recommendation, to combine both item-tag matrix and item content information for TR. Furthermore, by extending CTR we propose a novel hierarchical Bayesian model, called CTR with social regularization (CTR-SR), to seamlessly integrate the item-tag matrix, item content information, and social networks between items into the same principled model. Experiments on real data demonstrate the effectiveness of our proposed models. 1 Introduction Tagging systems have been playing very important role for us to better categorize and organize information. For example, Flickr1 uses tags to label and organize photos, Last.fm2 adopts tags to categorize artists and music, and CiteULike3 allows users to tag articles. With the tagging systems, users are able to better organize their own content and find relevant resources (content) more easily. However, finding the set of proper words (tags) to describe the resources often requires high mental focus. That is why tag recommendation (TR) [Gupta et al., 2010; Wang et al., 2012] has become more and more important on the Internet. With the tag recommendation system, users only need a few clicks to finish the tagging process. Moreover, tags created by various users can be inconsistent and idiosyncratic. Different 1http://www.flickr.com 2http://www.lastfm.com 3http://www.citeulike.org users might use different words to express the same meaning, which makes it more difficult to utilize the tagging information. Tag recommendation can help to limit vocabulary of tags and thus alleviate the above problems. Furthermore, it can also help to prevent misspelt or meaningless words. Therefore, TR [Wang et al., 2012] has become a very hot research topic in recent years, and many methods have been proposed by researchers. Existing tag recommendation methods can be roughly categorized into three classes [Wang et al., 2012]: content-based methods, co-occurrence based methods, and hybrid methods. Content-based methods [Chen et al., 2008; Lipczak et al., 2009; Shen and Fan, 2010; Lee et al., 2010; Toderici et al., 2010; Chen et al., 2010], directly adopt the content of resources/items, such as abstract of articles, image content and description of images, to perform tag recommendation. Cooccurrence based methods [Benz et al., 2006; Xu et al., 2006; Hotho et al., 2006; Marinho and Schmidt-Thieme, 2007; Sigurbjornsson and van Zwol, 2008; Garg and Weber, 2008; ¨ Weinberger et al., 2008; Wu et al., 2009; Rendle and Schmidt-Thieme, 2010] mainly use the co-occurrence of tags among items (i.e., the item-tag matrix) for tagging. Actually, the underlying principle of co-occurrence based methods is similar to that of collaborative filtering (CF) methods [Adomavicius and Tuzhilin, 2005; Zhen et al., 2009; Li and Yeung, 2011]. Because the TR problem is very complex and difficult, neither co-occurrence based methods nor content based methods can achieve satisfactory performance in real TR applications. Hence, the recent trend in TR research is to use hybrid methods [Wu et al., 2009; Sevil et al., 2010; Lops et al., 2011; 2013] which try to combine both item-tag matrix and item content information together for recommendation. However, it is still a challenge to find an effective way to combine both item-tag matrix and item content information for TR. Furthermore, in some applications there may exist social networks (relations) between items. For example, if we want to tag articles in CiteULike, there are citation relations or other social networks between articles [Li et al., 2011; Wang and Li, 2013]. Typically, two articles with relation between them might be most likely to be about the same topic [Li et al., 2009a; 2009b], and consequently they should have similar tags. Hence, how to effectively integrate social networks between items for tagging is another challenge
In this paper,we propose some novel methods to solve the 3 Collaborative Topic Regression above challenges.The main contributions of this paper can be outlined as follows: Collaborative topic regression(CTR)[Wang and Blei,2011] combines CF and latent Dirichlet allocation (LDA)[Blei et We adapt the collaborative topic regression(CTR)mod- al.,2003]to perform recommendation.CTR is initially pro- el [Wang and Blei,2011],which has been successful- posed to recommend articles (papers)to users by utilizing ly applied for article recommendation,to combine both both user-article rating information and article content infor- item-tag matrix and item content information for tag rec- mation.In this paper,we adapt CTR to our tag recommen- ommendation in a principled way. dation problem to seamlessly integrate both item-tag matrix information and item content information. By extending CTR,we propose a novel hierarchical Bayesian model,called CTR with social regulariza For ease of presentation,we use similar graphical models tion (CTR-SR),to seamlessly integrate the item-tag ma- and notations as those in CTR [Wang and Blei,2011]for our trix,item content information,and social networks be- problem formulation.The graphical model of CTR is illus- tween items into the same principled model. trated in Figure 1.Assume there are K topics B=B1:K.The generative process of CTR for tag recommendation is listed Extensive experiments on real-world data sets show that as follows: CTR can outperform the baselines which use only one kind of information,either item-tag matrix or item con- 1.Draw tag latent vector for each tag i: tent information.Furthermore,CTR-SR can effectively u~W(0,入1IK): utilize the social networks between items to further im- prove the performance. where N()denotes the normal distribution,Ik is an identity matrix with K rows and columns. 2 Problem Statement 2.For each item j, Assume we have a set of items W=[w1,w2,...,w to (a)Draw topic proportions;Dirichlet(a). be tagged,where wi Rd denotes the content(attributes) (b)Draw item latent offset j N(0,AIk)and of item j.For example,if we want to tag articles (papers)in then set the item latent vector to be:vj=ej+j. CiteULike,the items are papers,and the content information can be the bag-of-word representation of paper abstract.As- (c)For each word win of item (paper)wi, sume there are I tags {t1,t2,..,tr which are candidates i.Draw topic assignment zin~Mult(j). to be recommended to tag each item.Then we can use a tag- ii.Draw word wjn ~Mult(B). item matrixR=[riilx to represent the tagging informa- 3.Draw the tagging information rij for each tag-item pair tion for all the items.rij is a binary variable,where rj=1 (i,) means that the tag ti is associated with item wi.Otherwise, rij=0 means that tag ti is not associated with item w;.The r%~W(u巧,c写), (1) tag recommendation task is to predict the missing values in rj=[rij,r2j,...,rIjl.Note that we focus on tag recom- where cii reflects the confidence of rij: mendation for articles (papers)in this paper.However,our models are flexible enough to be applied in other applications 0. if rij=1, C= such as image and video tagging because we can also repre- b if rij =0, sent the image and video content as bag-of-words with a and b being tuning parameters and a>b>0. The content base methods use only the content information for recommendation.For example,if we want to recommend We can adopt the maximum a posteriori(MAP)estimation tags for item wi,we can use the tags from the nearest neigh- to learn the parameters of CTR.The details can be found in bor in W based on the content similarity.We can also treat [Wang and Blei,2011]. each tag as a label and use multi-label methods to train clas- It is easy to see that the above process integrates matrix fac- sifiers based on content information. torization (MF)[Koren et al.,2009]based CF (Equation (1)) Co-occurrence based methods use only the item-tag matrix for tagging information and topic modeling for item content R for recommendation.For example,if ti and t occur si- information into the same principled framework. multaneously in many items'tags and ti is associated with wi,we should also recommend ti to wi.It is easy to see Collaborative Topic Regression with Social that the underlying principle of co-occurrence based methods Regularization is similar to that of collaborative filtering [Adomavicius and Tuzhilin.2005]. By extending CTR,we propose a novel hierarchical Bayesian Both content based methods and co-occurrence based model,called CTR with social regularization(CTR-SR),to methods discard some useful information.Hence,they can seamlessly integrate the item-tag matrix,item content infor- not achieve satisfactory performance in real applications mation,and social networks between items into the same principled model.The graphical model of CTR-SR is shown For ease of presentation,we use tag-item matrix and item-tag in Figure 2. matrix interchangeably in this paper. The generative process of CTR-SR is listed as follows:
In this paper, we propose some novel methods to solve the above challenges. The main contributions of this paper can be outlined as follows: • We adapt the collaborative topic regression (CTR) model [Wang and Blei, 2011], which has been successfully applied for article recommendation, to combine both item-tag matrix and item content information for tag recommendation in a principled way. • By extending CTR, we propose a novel hierarchical Bayesian model, called CTR with social regularization (CTR-SR), to seamlessly integrate the item-tag matrix, item content information, and social networks between items into the same principled model. • Extensive experiments on real-world data sets show that CTR can outperform the baselines which use only one kind of information, either item-tag matrix or item content information. Furthermore, CTR-SR can effectively utilize the social networks between items to further improve the performance. 2 Problem Statement Assume we have a set of items W = [w1, w2, · · · , wJ ] to be tagged, where wj ∈ R d denotes the content (attributes) of item j. For example, if we want to tag articles (papers) in CiteULike, the items are papers, and the content information can be the bag-of-word representation of paper abstract. Assume there are I tags {t1, t2, · · · , tI } which are candidates to be recommended to tag each item. Then we can use a tagitem matrix4 R = [rij ]I×J to represent the tagging information for all the items. rij is a binary variable, where rij = 1 means that the tag ti is associated with item wj . Otherwise, rij = 0 means that tag ti is not associated with item wj . The tag recommendation task is to predict the missing values in rj = [r1j , r2j , · · · , rIj ] T . Note that we focus on tag recommendation for articles (papers) in this paper. However, our models are flexible enough to be applied in other applications such as image and video tagging because we can also represent the image and video content as bag-of-words. The content base methods use only the content information for recommendation. For example, if we want to recommend tags for item wj , we can use the tags from the nearest neighbor in W based on the content similarity. We can also treat each tag as a label and use multi-label methods to train classifiers based on content information. Co-occurrence based methods use only the item-tag matrix R for recommendation. For example, if ti and tk occur simultaneously in many items’ tags and ti is associated with wj , we should also recommend tk to wj . It is easy to see that the underlying principle of co-occurrence based methods is similar to that of collaborative filtering [Adomavicius and Tuzhilin, 2005]. Both content based methods and co-occurrence based methods discard some useful information. Hence, they can not achieve satisfactory performance in real applications. 4 For ease of presentation, we use tag-item matrix and item-tag matrix interchangeably in this paper. 3 Collaborative Topic Regression Collaborative topic regression (CTR) [Wang and Blei, 2011] combines CF and latent Dirichlet allocation (LDA) [Blei et al., 2003] to perform recommendation. CTR is initially proposed to recommend articles (papers) to users by utilizing both user-article rating information and article content information. In this paper, we adapt CTR to our tag recommendation problem to seamlessly integrate both item-tag matrix information and item content information. For ease of presentation, we use similar graphical models and notations as those in CTR [Wang and Blei, 2011] for our problem formulation. The graphical model of CTR is illustrated in Figure 1. Assume there are K topics β = β1:K. The generative process of CTR for tag recommendation is listed as follows: 1. Draw tag latent vector for each tag i: ui ∼ N (0, λ−1 u IK), where N (·) denotes the normal distribution, IK is an identity matrix with K rows and columns. 2. For each item j, (a) Draw topic proportions θj ∼ Dirichlet(α). (b) Draw item latent offset j ∼ N (0, λ−1 v IK) and then set the item latent vector to be: vj = j + θj . (c) For each word wjn of item (paper) wj , i. Draw topic assignment zjn ∼ Mult(θj ). ii. Draw word wjn ∼ Mult(βzjn ). 3. Draw the tagging information rij for each tag-item pair (i, j), rij ∼ N (u T i vj , c−1 ij ), (1) where cij reflects the confidence of rij : cij = a, if rij = 1, b, if rij = 0, with a and b being tuning parameters and a > b > 0. We can adopt the maximum a posteriori (MAP) estimation to learn the parameters of CTR. The details can be found in [Wang and Blei, 2011]. It is easy to see that the above process integrates matrix factorization (MF) [Koren et al., 2009] based CF (Equation (1)) for tagging information and topic modeling for item content information into the same principled framework. 4 Collaborative Topic Regression with Social Regularization By extending CTR, we propose a novel hierarchical Bayesian model, called CTR with social regularization (CTR-SR), to seamlessly integrate the item-tag matrix, item content information, and social networks between items into the same principled model. The graphical model of CTR-SR is shown in Figure 2. The generative process of CTR-SR is listed as follows:
Figure 1:The graphical model of collaborative topic regres- sion(CTR). Figure 2:The graphical model of collaborative topic regres- 1.Draw tag latent vector for each tag ti: sion with social regularization(CTR-SR). ~N(0,λ1IK): 2.For each item j, As shown in(2)and Figure 2.the social network informa- (a)Draw topic proportions;Dirichlet(a). tion is seamlessly integrated into the CTR-SR by putting the (b)For each word win of item (paper)wi, Laplacian of the adjacency matrix into the prior distribution i.Draw topic assignment zin~Mult(). for S.The physical meaning is to make the latent factors(sj ii.Draw word wjn ~Mult(B). and v)of linked items as close as possible,which will be 3.Draw the social latent matrix S=[s1,s2,..,sJ]from discussed in detail in the following content. Since it is obviously intractable to compute the full poste- a matrix variate normal distribution [Gupta and Nagar, 2000: rior of ui.vj,sj,and j,an EM-style algorithm is developed to learn the maximum a posteriori (MAP)estimation.We S~NKJ(0,IK⑧()-1). (2) can maximize the posterior by maximizing the complete log- 4.Draw the item latent vector for item j from the product likelihood of U=u1,u2,…,ul,V=[,2,…,vl,S, of two Gaussians(PoG)[Gales and Airey,2006]: 0:J,and R given入u,入v,入r,A and B, vjPoG(0j:sj;AgIK;ArIK). (3) 5.Draw the tagging information rij for each tag-item pair =-当(%s)-∑--) (i,), rij ~N(ufvj c). 告∑4-∑-r-) In the above generative process,S denotes the social latent matrix of size KxJ,each column of which is the social latent vector sj for item j,NK.(0,Ik(Aa)-1)in(2)denotes a +∑∑g0t)-∑学w-喝gP matrix variate normal distribution [Gupta and Nagar,2000]: A constant is omitted and the parameter of the topic mod- p(S)=NK,J(0,IK⑧(X生a)1) el a is set to 1 as that in CTR.Note that the first ter- expitr-S.aST]} m -tr(S.asT)corresponds to log p(S)with a constant (2JK2IK 2X1S-KB2 (4) omitted and where the operator denotes the Kronecker product of two matrices [Gupta and Nagar,2000],tr()denotes the trace of (6 a matrix,is the Laplacian matrix incorporating the social j=1'= network information.=D-A where D is a diagonal K matrix whose diagonal elements Di=>;Aij.Here A is the adjacency matrix of the social networks with binary en- ==1 k=1 tries indicating the links(relations)between items.Aj=1 indicates that there is a link between item j and item i.Oth- erwise,Ajj=0.PoG(j,sj,IK,AIk)in (3)denotes =2∑1∑∑A(5-5A the product of the Gaussian N(,IK)and the Gaussian k=1j=1j'=1 N(si,A,IK),which is also a Gaussian [Gales and Airey, 2006].The resulting Gaussian is N(r,IK)with =∑Sa5k, Bor =he+shr 入w+入x where Sr+denotes the rth row of S and S.c denotes the cth column of S.We can see that maximizing-tr(sTaS) 入u入x 入m=X+ will make sj close to s;if item j and item j'are linked (A=1)
v k K u J I v r z w u Figure 1: The graphical model of collaborative topic regression (CTR). 1. Draw tag latent vector for each tag ti : ui ∼ N (0, λ−1 u IK). 2. For each item j, (a) Draw topic proportions θj ∼ Dirichlet(α). (b) For each word wjn of item (paper) wj , i. Draw topic assignment zjn ∼ Mult(θj ). ii. Draw word wjn ∼ Mult(βzjn ). 3. Draw the social latent matrix S = [s1, s2, · · · , sJ ] from a matrix variate normal distribution [Gupta and Nagar, 2000]: S ∼ NK,J (0, IK ⊗ (λlLa) −1 ). (2) 4. Draw the item latent vector for item j from the product of two Gaussians (PoG) [Gales and Airey, 2006]: vj ∼ PoG(θj , sj , λ−1 v IK, λ−1 r IK). (3) 5. Draw the tagging information rij for each tag-item pair (i, j), rij ∼ N (u T i vj , c−1 ij ). In the above generative process, S denotes the social latent matrix of size K×J, each column of which is the social latent vector sj for item j, NK,J (0, IK⊗(λlLa) −1 )in (2) denotes a matrix variate normal distribution [Gupta and Nagar, 2000]: p(S) = NK,J (0, IK ⊗ (λlLa) −1 ) = exp{tr[− λl 2 SLaS T ]} (2π) JK/2|IK| J/2|λlLa|−K/2 , (4) where the operator ⊗ denotes the Kronecker product of two matrices [Gupta and Nagar, 2000], tr(·) denotes the trace of a matrix, La is the Laplacian matrix incorporating the social network information. La = D − A where D is a diagonal matrix whose diagonal elements Dii = P j Aij . Here A is the adjacency matrix of the social networks with binary entries indicating the links (relations) between items. Ajj0 = 1 indicates that there is a link between item j and item j 0 . Otherwise, Ajj0 = 0. PoG(θj , sj , λ−1 v IK, λ−1 r IK) in (3) denotes the product of the Gaussian N (θj , λ−1 v IK) and the Gaussian N (sj , λ−1 r IK), which is also a Gaussian [Gales and Airey, 2006]. The resulting Gaussian is N (µvr, λ−1 vr IK) with µvr = θjλv + sjλr λv + λr , λvr = λvλr λv + λr . v k K u J I v r z w u r l sA Figure 2: The graphical model of collaborative topic regression with social regularization (CTR-SR). As shown in (2) and Figure 2, the social network information is seamlessly integrated into the CTR-SR by putting the Laplacian of the adjacency matrix into the prior distribution for S. The physical meaning is to make the latent factors (sj and vj ) of linked items as close as possible, which will be discussed in detail in the following content. Since it is obviously intractable to compute the full posterior of ui , vj , sj , and θj , an EM-style algorithm is developed to learn the maximum a posteriori (MAP) estimation. We can maximize the posterior by maximizing the complete loglikelihood of U = [u1, u2, · · · , uI ], V = [v1, v2, · · · , vJ ], S, θ1:J , and R given λu, λv, λr, λl and β, L = − λl 2 tr(SLaS T ) − λr 2 X j (sj − vj ) T (sj − vj ) (5) − λu 2 X i u T i ui − λv 2 X j (vj − θj ) T (vj − θj ) + X j X n log(X k θjkβk,wjn ) − X i,j cij 2 (rij − u T i vj ) 2 . A constant is omitted and the parameter of the topic model α is set to 1 as that in CTR. Note that the first term − λl 2 tr(SLaS T ) corresponds to log p(S) with a constant omitted and tr(SLaS T ) = 1 2 X J j=1 X J j 0=1 Ajj0 ||S∗j − S∗j 0 ||2 (6) = 1 2 X J j=1 X J j 0=1 [Ajj0 X K k=1 (Skj − Skj0 ) 2 ] = 1 2 X K k=1 [ X J j=1 X J j 0=1 Ajj0 (Skj − Skj0 ) 2 ] = X K k=1 S T k∗LaSk∗, where Sr∗ denotes the rth row of S and S∗c denotes the cth column of S. We can see that maximizing − λl 2 tr(S T LaS) will make sj close to sj 0 if item j and item j 0 are linked (Ajj0 = 1)
The function in (5)can be optimized using coordi- 5.1 Dataset nate ascent.We first fix parameters B and optimize the Two real-world datasets are used in our experiments.Both of collaborative filtering variables [ui,vj,s}and the topic them are from CiteULike3,but they are collected in different proportions 0j iteratively.The parameter B is updated every ways.The first dataset,called citeulike-a,is from [Wang and time fui,vi,si and 0;are optimized. Blei.2011].Note that there is not tag information in the o- riginal dataset of [Wang and Blei,2011].We collect the tag The update rules for ui and vi are: information from CiteULike.We collect the second dataset. u←-(VC:VT+入uIk)-1VC:R:, called citeulike-t,by ourselves.Specifically,we manually s- v←(UCUT+XIK+入IK)-1(UCR+λ0+Xrs), elect 273 seed tags and collect all the articles with at least one of these tags.Note that the final number of tags (19107 where Ci is a diagonal matrix with fcij,j=1,...,J}as its and 52946 respectively for two datasets)corresponding to al- diagonal entries and Ri is the ith row of R. I the collected articles is far more than the number of seed For social latent matrix S,we fix all rows of S except the tags (273).We remove tags used less than 5 times and get kth one Sk+and then update Sk+.After taking the gradient of 7386 and 8311 tags for citeulike-a and citeulike-t,respective- with respect to S and setting it to 0,we get the following ly.There are 16980 items (articles)and 25975 items in the linear system: datasets citeulike-a and citeulike-t,respectively.The ratios of (入n生a+入rI)Sk*=入rVk* (7) non-empty entries (equal to 1-sparsity)in the item-tag ma- trices of citeulike-a and citeulike-t are 0.00145 and 0.00104 One direct way to solve the linear system is to set Sk= respectively,which means that the second dataset is sparser Ar(Aa+ArIJ)-1Vk.However,the complexity for one than the first one. single update is O(J3)where J is the number of items.In- spired by [Li and Yeung,2009],we use the steepest descent We preprocess the text information(content of items)fol- lowing the same procedure as that in [Wang and Blei,2011]. method [Shewchuk,1994]to iteratively update Sk+: As in [Wang and Blei,2011],we also use the titles and ab- Sk+(t+1)Sx-(t)+6(t)r(t) stracts of articles as content information of citeulike-t.We r(t)←λr*-(入生a+入rIJ)Sk*(t) choose the top 20000 distinct words according to the tf-idf values as our vocabulary after removing the stop-words. r(t)r(t) 6←rer(么+AI)r同 Because citation information is not provided in CiteULike, we use the user-article information which is available in Ci- As discussed in [Li and Yeung,20091,using the steepest de- teULike to construct the social networks between items.For scent method instead of solving the linear system directly can each dataset,we construct the social network with a threshold dramatically reduce the computation cost in each iteration of 4 using the user-article matrices.More specifically,if two from O(J3)to (J). items have 4 or more users in common,they are linked in the For 0j,we first define q(zjn=k)=ink as that in CTR social network.This is meaningful because two papers with and LDA [Blei et al.,2003]and apply Jensen's inequality similar users(readers)typically have similar topics.We then after items containing are separated, merge this social network and the citation network between papers to get the final network.After network constructing, g)≥-之g-9,rg-9) (8) the numbers of links in the final networks are 294072 and 180103 for citeulike-a and citeulike-t,respectively. +∑∑pjnk(log0jkun-log9jnk) 5.2 Evaluation Scheme =(0,中) In each dataset,we randomly select P items associated with each tag to construct the training set and use all the rest of the Here中j=(jnk)NXk-.Obviously.(g,中,)isatight dataset as test set.We vary P from I to 10 in our experiments lower bound of)and we can use projection gradient to and the smaller P is,the sparser the training set is.Note optimize The optimal ink is that when P =1,only 4.1%of the tagging entries are put jnk OjkBk.wyn in training set for dataset citeulike-a and the number is 3.7% for dataset citeulike-t.For each P we repeat the evaluation As for the parameter B,we follow the same M-step update 5 times with randomly selected training set,and the average as in LDA [Blei et al..2003] performance will be reported. wx∑∑pnkln=l As in [Wang and Blei,2011]and [Marinho and Schmidt- jn Thieme,2007],we use recall as our evaluation metric since zero entries may be caused either by irrelevance between the 5 Experiments tag and the item or by users who do not know the existence We conduct experiments on two real-world data sets to CiteULike allows users to create their own collections of ar- demonstrate the effectiveness of our models.As stated in ticles.There are abstracts,titles,and tags for each article.Oth- Section 2,although our focus is on tag recommendation for er information like authors,groups,posting time,and keywords is articles(papers)in this paper,our models are general enough not used in this paper.The detailed information can be found at to model other kinds of data like image tagging. http://www.citeulike.ort/faq/data.adp
The function L in (5) can be optimized using coordinate ascent. We first fix parameters β and optimize the collaborative filtering variables {ui , vj , sj} and the topic proportions θj iteratively. The parameter β is updated every time {ui , vj , sj} and θj are optimized. The update rules for ui and vj are: ui ← (V CiV T + λuIK) −1V CiRi , vj ← (UCiU T + λvIK + λrIK) −1 (UCjRj + λvθj + λrsj ), where Ci is a diagonal matrix with {cij , j = 1, . . . , J} as its diagonal entries and Rj is the jth row of R. For social latent matrix S, we fix all rows of S except the kth one Sk∗ and then update Sk∗. After taking the gradient of L with respect to Sk∗ and setting it to 0, we get the following linear system: (λlLa + λrI)Sk∗ = λrVk∗. (7) One direct way to solve the linear system is to set Sk∗ = λr(λlLa + λrIJ ) −1Vk∗. However, the complexity for one single update is O(J 3 ) where J is the number of items. Inspired by [Li and Yeung, 2009], we use the steepest descent method [Shewchuk, 1994] to iteratively update Sk∗: Sk∗(t + 1) ← Sk∗(t) + δ(t)r(t) r(t) ← λrVk∗ − (λlLa + λrIJ )Sk∗(t) δ(t) ← r(t) T r(t) r(t) T (λlLa + λrIJ )r(t) As discussed in [Li and Yeung, 2009], using the steepest descent method instead of solving the linear system directly can dramatically reduce the computation cost in each iteration from O(J 3 ) to O(J). For θj , we first define q(zjn=k) = ψjnk as that in CTR and LDA [Blei et al., 2003] and apply Jensen’s inequality after items containing θj are separated, L (θj ) ≥ − λv 2 (vj − θj ) T (vj − θj ) (8) + X n X k φjnk(log θjkβk,wjn − log φjnk) = L (θj , φj ). Here φj = (φjnk) N×K n=1,k=1. Obviously L (θj , φj ) is a tight lower bound of L (θj ) and we can use projection gradient to optimize θj . The optimal φjnk is φjnk ∝ θjkβk,wjn . As for the parameter β, we follow the same M-step update as in LDA [Blei et al., 2003], βkw ∝ X j X n φjnk1[wjn = w]. 5 Experiments We conduct experiments on two real-world data sets to demonstrate the effectiveness of our models. As stated in Section 2, although our focus is on tag recommendation for articles (papers) in this paper, our models are general enough to model other kinds of data like image tagging. 5.1 Dataset Two real-world datasets are used in our experiments. Both of them are from CiteULike5 , but they are collected in different ways. The first dataset, called citeulike-a, is from [Wang and Blei, 2011]. Note that there is not tag information in the original dataset of [Wang and Blei, 2011]. We collect the tag information from CiteULike. We collect the second dataset, called citeulike-t, by ourselves. Specifically, we manually select 273 seed tags and collect all the articles with at least one of these tags. Note that the final number of tags (19107 and 52946 respectively for two datasets) corresponding to all the collected articles is far more than the number of seed tags (273). We remove tags used less than 5 times and get 7386 and 8311 tags for citeulike-a and citeulike-t, respectively. There are 16980 items (articles) and 25975 items in the datasets citeulike-a and citeulike-t, respectively. The ratios of non-empty entries (equal to 1-sparsity) in the item-tag matrices of citeulike-a and citeulike-t are 0.00145 and 0.00104 respectively, which means that the second dataset is sparser than the first one. We preprocess the text information (content of items) following the same procedure as that in [Wang and Blei, 2011]. As in [Wang and Blei, 2011], we also use the titles and abstracts of articles as content information of citeulike-t. We choose the top 20000 distinct words according to the tf-idf values as our vocabulary after removing the stop-words. Because citation information is not provided in CiteULike, we use the user-article information which is available in CiteULike to construct the social networks between items. For each dataset, we construct the social network with a threshold of 4 using the user-article matrices. More specifically, if two items have 4 or more users in common, they are linked in the social network. This is meaningful because two papers with similar users (readers) typically have similar topics. We then merge this social network and the citation network between papers to get the final network. After network constructing, the numbers of links in the final networks are 294072 and 180103 for citeulike-a and citeulike-t, respectively. 5.2 Evaluation Scheme In each dataset, we randomly select P items associated with each tag to construct the training set and use all the rest of the dataset as test set. We vary P from 1 to 10 in our experiments and the smaller P is, the sparser the training set is. Note that when P = 1, only 4.1% of the tagging entries are put in training set for dataset citeulike-a and the number is 3.7% for dataset citeulike-t. For each P we repeat the evaluation 5 times with randomly selected training set, and the average performance will be reported. As in [Wang and Blei, 2011] and [Marinho and SchmidtThieme, 2007], we use recall as our evaluation metric since zero entries may be caused either by irrelevance between the tag and the item or by users who do not know the existence 5CiteULike allows users to create their own collections of articles. There are abstracts, titles, and tags for each article. Other information like authors, groups, posting time, and keywords is not used in this paper. The detailed information can be found at http://www.citeulike.ort/faq/data.adp
of the tags when tagging items,which means precision is not a proper metric here.Like most recommender systems,we sort the predicted ratings of candidate tags and recommend the top M tags to the target item.For each item,recall@M is defined as number of tags the item is associated with in top M recall@M= total number of tags the item is associated with 1° The final reported result is the average of all the items'recall. (a) (b) Besides,as in [Sigurbjornsson and van Zwol,2008],we Figure 5:Sensitivity to parameters.(a)The effect of A in CTR-SR use success@M to be another evaluation metric.success@M (b)The effect of Ar in CTR-SR. is defined as the probability of finding a true tag among the cases.Similar phenomena are observed for other values of top M recommended tags. P,which are omitted due to space constraint. 5.3 Baselines and Experimental Settings 5.5 Sensitivity to Parameters We use the following baselines for comparison: Figure 5(a)shows how the prediction performance of TAGCO:This method belongs to the category of co- CTR-SR is affected by the parameter A.P is set to 5, occurrence based methods,which is described in [Sig- λw=l0,λu=0.l,andr=100.As we can see,the urbjornsson and van Zwol,2008]. prediction performance first increases with A and starts to s- .SCF:This is a similarity-based collaborative filter- lightly decrease at some point after A=10 for all values of ing Marinho and Schmidt-Thieme,2007].It finds k M.It is not too sensitive in a large range of values. nearest neighbors of the paper's existing tags and rec- Figure 5(b)demonstrates the sensitivity of CTR-SR to pa- ommends other tags according to its neighbors'tags.It rameter入r.In this experiment,.P is also set to5and入,=l0, only uses the item-tag matrix information Au=0.1,and A=10.As the figure shows,the performance CF:This is a matrix factorization based collaborative first increases with A,and begins to decrease at some point filtering [Koren et al.,2009]method.It factorizes the afterr =100 for all values of M.It is also not too sensitive training matrix into two low-rank matrices U,V,and in a large range of values recovers the original matrix by UVT.It only uses the item-tag matrix information. 5.6 Interpretability SCF+LDA:This method integrates similarity-based col- Besides promising prediction performance,our proposed laborative filtering with LDA.It falls into the category of model can also provide a very good interpretation.Two ex- hybrid methods and is adapted from [Sevil et al.,2010]. ample articles (items)with their top 2 topics are presented in Table 1.Note that although the learned topic proportions of CTR:The method introduced in Section 3. CTR are different from those of CTR-SR,the ranking of top We use a validation set to find the optimal parameters. 2 topics are the same.In this case study,CTR-SR and CTR More specifically,we find that CTR achieves good predic- are trained using the extremely sparse training data(P=1) tion performance when=10,Au =0.1,a =1,6 =0.01, and recommend tags to articles.Note that in the training da- and K 200.For CF,the parameters are =1,A=1, ta,each tag is associated with only one single article,which a =1,b =0.01,and K 200.For CTR-SR,the parameters makes tag recommendation very challenging.As we can see are入m=10,入u=0.1,a=1,b=0.01,K=200,入,=100 from Table 1,for Article I,precisions of the top 10 tags for and入,=10. CTR-SR and CTR are 50%and 10%,respectively.For Arti- cle II.the precisions are 60%and 10%.respectively.We can 5.4 Performance find that the social network information among items are very Figure 3(a)and Figure 4 (a)show the recall@50 when P informative and our CTR-SR model can effectively exploit it. is set to be 1,2,5,8,10,on citeulike-a and citeulike-t,re- When examining more closely,we can find that Article I spectively.The random baselines are 0.68%and 0.60%re- 'How much can behavioral targeting help online advertis- spectively.As we can see,the hybrid method SCF+LDA out- ing?'is about online advertising,which can also be verified performs those methods use only one kind of information, by the true tags shown in the table.As we can see,the rec- and CTR outperforms SCF+LDA.Furthermore.our CTR-SR ommended tags by CTR focuses more on the technical details model achieves the best performance for most cases by effec- while those returned by CTR-SR are closer to the essence of tively integrating the social networks between items. the articles.Similarly,Article II'Lowcost multitouch sensing Figure 3(b)and Figure 4(b)show the recall of all the meth- through frustrated total internal refection'focuses on mul- ods when P is fixed to be 5 by setting M=2,5.10.20.50 in titouch sensing.Tags recommended by CTR like 'nanopar- dataset citeulike-a and citeulike-t.Figure 3 (c)and Figure 4 ticles','dna-nanotechnology',and 'gamma'seem a lot more (c)show the success@M of all the methods when P is fixed technical and achieve a low precision of 10%.On the con- to be 5 by setting M=2,5,10,20,50 in two datasets.Once trary,tags recommended by CTR-SR like 'multi-touch'and again,we can see that CTR outperforms other baselines and 'screen'can better describe the main points of the article and CTR-SR is significantly better than other methods for most achieve a high precision of 60%
of the tags when tagging items, which means precision is not a proper metric here. Like most recommender systems, we sort the predicted ratings of candidate tags and recommend the top M tags to the target item. For each item, recall@M is defined as recall@M = number of tags the item is associated with in top M total number of tags the item is associated with . The final reported result is the average of all the items’ recall. Besides, as in [Sigurbjornsson and van Zwol, 2008 ¨ ], we use success@M to be another evaluation metric. success@M is defined as the probability of finding a true tag among the top M recommended tags. 5.3 Baselines and Experimental Settings We use the following baselines for comparison: • TAGCO: This method belongs to the category of cooccurrence based methods, which is described in [Sigurbjornsson and van Zwol, 2008 ¨ ]. • SCF: This is a similarity-based collaborative filtering [Marinho and Schmidt-Thieme, 2007]. It finds k nearest neighbors of the paper’s existing tags and recommends other tags according to its neighbors’ tags. It only uses the item-tag matrix information. • CF: This is a matrix factorization based collaborative filtering [Koren et al., 2009] method. It factorizes the training matrix into two low-rank matrices U, V , and recovers the original matrix by UV T . It only uses the item-tag matrix information. • SCF+LDA: This method integrates similarity-based collaborative filtering with LDA. It falls into the category of hybrid methods and is adapted from [Sevil et al., 2010]. • CTR: The method introduced in Section 3. We use a validation set to find the optimal parameters. More specifically, we find that CTR achieves good prediction performance when λv = 10, λu = 0.1, a = 1, b = 0.01, and K = 200. For CF, the parameters are λv = 1, λu = 1, a = 1, b = 0.01, and K = 200. For CTR-SR, the parameters are λv = 10, λu = 0.1, a = 1, b = 0.01, K = 200, λr = 100 and λl = 10. 5.4 Performance Figure 3 (a) and Figure 4 (a) show the recall@50 when P is set to be 1, 2, 5, 8, 10, on citeulike-a and citeulike-t, respectively. The random baselines are 0.68% and 0.60% respectively. As we can see, the hybrid method SCF+LDA outperforms those methods use only one kind of information, and CTR outperforms SCF+LDA. Furthermore, our CTR-SR model achieves the best performance for most cases by effectively integrating the social networks between items. Figure 3 (b) and Figure 4 (b) show the recall of all the methods when P is fixed to be 5 by setting M=2, 5, 10, 20, 50 in dataset citeulike-a and citeulike-t. Figure 3 (c) and Figure 4 (c) show the success@M of all the methods when P is fixed to be 5 by setting M=2, 5, 10, 20, 50 in two datasets. Once again, we can see that CTR outperforms other baselines and CTR-SR is significantly better than other methods for most 10−1 100 101 102 103 0.05 0.1 0.15 0.2 0.25 λl Recall M=50 M=20 M=10 (a) 10−1 100 101 102 103 0.05 0.1 0.15 0.2 0.25 λr Recall M=50 M=20 M=10 (b) Figure 5: Sensitivity to parameters. (a) The effect of λl in CTR-SR. (b) The effect of λr in CTR-SR. cases. Similar phenomena are observed for other values of P, which are omitted due to space constraint. 5.5 Sensitivity to Parameters Figure 5 (a) shows how the prediction performance of CTR-SR is affected by the parameter λl . P is set to 5, λv = 10, λu = 0.1, and λr = 100. As we can see, the prediction performance first increases with λl and starts to slightly decrease at some point after λl = 10 for all values of M. It is not too sensitive in a large range of values. Figure 5 (b) demonstrates the sensitivity of CTR-SR to parameter λr. In this experiment, P is also set to 5 and λv = 10, λu = 0.1, and λl = 10. As the figure shows, the performance first increases with λr and begins to decrease at some point after λr = 100 for all values of M. It is also not too sensitive in a large range of values. 5.6 Interpretability Besides promising prediction performance, our proposed model can also provide a very good interpretation. Two example articles (items) with their top 2 topics are presented in Table 1. Note that although the learned topic proportions of CTR are different from those of CTR-SR, the ranking of top 2 topics are the same. In this case study, CTR-SR and CTR are trained using the extremely sparse training data (P = 1) and recommend tags to articles. Note that in the training data, each tag is associated with only one single article, which makes tag recommendation very challenging. As we can see from Table 1, for Article I, precisions of the top 10 tags for CTR-SR and CTR are 50% and 10%, respectively. For Article II, the precisions are 60% and 10%, respectively. We can find that the social network information among items are very informative and our CTR-SR model can effectively exploit it. When examining more closely, we can find that Article I ‘How much can behavioral targeting help online advertising?’ is about online advertising, which can also be verified by the true tags shown in the table. As we can see, the recommended tags by CTR focuses more on the technical details while those returned by CTR-SR are closer to the essence of the articles. Similarly, Article II ’Lowcost multitouch sensing through frustrated total internal reflection’ focuses on multitouch sensing. Tags recommended by CTR like ‘nanoparticles’, ‘dna-nanotechnology’, and ‘gamma’ seem a lot more technical and achieve a low precision of 10%. On the contrary, tags recommended by CTR-SR like ‘multi-touch’ and ‘screen’ can better describe the main points of the article and achieve a high precision of 60%
a a (a) (b) (c) Figure 3:Experimental results on dataset citeulike-a.(a)shows the recall@50 of all the methods.(b)shows the recall of all the methods when P=5 and M ranges from 2 to 50.(c)shows the success@M of all the methods when P=5 and M ranges from 2 to 50. CTR-5R a 0动”08 (a) (b) (c) Figure 4:Experimental results on dataset citeulike-t.(a)shows the recall@50 of all the methods.(b)shows the recall of all the methods when P=5 and M ranges from 2 to 50.(c)shows the success@M of all the methods when P=5 and M ranges from 2 to 50. Table 1:Example Articles with Recommended Tags Title:How much can behavioral targeting help online advertising? Article I Top topic 1:web,search.engine,pages,keyword,click,hypertext,html,searchers,crawler Top topic 2:mobile,phones, attitudes,advertising.consum marketing.commerce,sms,m-leaming True tags:bchavioral targeting,advertising,ads,computational advertising.recommend,user-behavior,user profle CIr True tag?CTR-SR True tag? 1.random-walks no 1.behavioral targeting yes 2.page-rank no ads ves 3.computational advertising yes 3.computational advertising yes 4.citizen-science no 4.random-walks n Top 10 recommended tags 5.natural history no 5. page-rank no 6.search engine no 6.developing no no 7.recommend yes no 8.advertising yes 9 what no 9.what n 10.re-ranking no 10.need no Title:Lowcost multitouch sensing through frustrated total internal reflection ArticleⅡ Top topic I:molecular,molecules,surface,chemical,formation,forces,reaction,shapes,sensing.kinctics Top topic 2:design,interface,principles,interfaces,interactive,devices,usability,application True tags:tech,screen,gestures,touch,interface,multitouch,multi-touch,table,visualization,computer_vision CIR True tag? CTR-SR True tag? 1.guide no L touch yes 2.gamma 2.field 3.optical no 3.cestures ves 4.nanoparticles no 4.table es Top 10 recommended tags 5.nano no 5.multi-touch ye 6.dna-nanotecnology no 6 screen yes 7. no 7.multitouch yes 8.sm5 no 8.dna-nanotecnology no 9.touch yes 9. nano no 10.field no 10.superlist no 6 Conclusion world datasets successfully demonstrate the effectiveness of In this paper,we first adapt CTR to combine both item-tag our proposed models. matrix and item content information for tag recommenda- tion.Furthermore,by extending CTR we propose a novel 7 Acknowledgements hierarchical Bayesian model,called CTR with social regular- This work is supported by the NSFC (No.61100125),the ization (CTR-SR),to seamlessly integrate the item-tag ma- 863 Program of China(No.2012AA011003),and the Pro- trix.item content information,and social networks between gram for Changjiang Scholars and Innovative Research Team items into the same principled model.Experiments on real- in University of China(IRT1158,PCSIRT)
1 2 3 4 5 6 7 8 9 10 0.05 0.1 0.15 0.2 0.25 0.3 P Recall SCF SCF+LDA CF TAGCO CTR CTR−SR (a) 5 10 15 20 25 30 35 40 45 50 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 M Recall SCF SCF+LDA CF TAGCO CTR CTR−SR (b) 5 10 15 20 25 30 35 40 45 50 0.1 0.2 0.3 0.4 0.5 0.6 0.7 M Success SCF SCF+LDA CF TAGCO CTR CTR−SR (c) Figure 3: Experimental results on dataset citeulike-a. (a) shows the recall@50 of all the methods. (b) shows the recall of all the methods when P = 5 and M ranges from 2 to 50. (c) shows the success@M of all the methods when P = 5 and M ranges from 2 to 50. 1 2 3 4 5 6 7 8 9 10 0.05 0.1 0.15 0.2 0.25 0.3 P Recall SCF SCF+LDA CF TAGCO CTR CTR−SR (a) 5 10 15 20 25 30 35 40 45 50 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 M Recall SCF SCF+LDA CF TAGCO CTR CTR−SR (b) 5 10 15 20 25 30 35 40 45 50 0.1 0.2 0.3 0.4 0.5 0.6 M Success SCF SCF+LDA CF TAGCO CTR CTR−SR (c) Figure 4: Experimental results on dataset citeulike-t. (a) shows the recall@50 of all the methods. (b) shows the recall of all the methods when P = 5 and M ranges from 2 to 50. (c) shows the success@M of all the methods when P = 5 and M ranges from 2 to 50. Table 1: Example Articles with Recommended Tags Article I Title: How much can behavioral targeting help online advertising? Top topic 1: web, search, engine, pages, keyword, click, hypertext, html, searchers, crawler Top topic 2: mobile, phones, attitudes, advertising, consumer, marketing, commerce, sms, m-learning True tags: behavioral targeting, advertising, ads, computational advertising, recommend, user-behavior, user profile Top 10 recommended tags CTR True tag? CTR-SR True tag? 1. random-walks no 1. behavioral targeting yes 2. page-rank no 2. ads yes 3. computational advertising yes 3. computational advertising yes 4. citizen-science no 4. random-walks no 5. natural history no 5. page-rank no 6. search engine no 6. developing no 7. engine no 7. recommend yes 8. searchengine no 8. advertising yes 9. what no 9. what no 10. re-ranking no 10. need no Article II Title: Lowcost multitouch sensing through frustrated total internal reflection Top topic 1: molecular, molecules, surface, chemical, formation, forces, reaction, shapes, sensing, kinetics Top topic 2: design, interface, principles, interfaces, interactive, devices, usability, application True tags: tech, screen, gestures, touch, interface, multitouch, multi-touch, table, visualization, computer vision Top 10 recommended tags CTR True tag? CTR-SR True tag? 1. guide no 1. touch yes 2. gamma no 2. field no 3. optical no 3. gestures yes 4. nanoparticles no 4. table yes 5. nano no 5. multi-touch yes 6. dna-nanotecnology no 6. screen yes 7. tirf no 7. multitouch yes 8. sms no 8. dna-nanotecnology no 9. touch yes 9. nano no 10. field no 10. superlist no 6 Conclusion In this paper, we first adapt CTR to combine both item-tag matrix and item content information for tag recommendation. Furthermore, by extending CTR we propose a novel hierarchical Bayesian model, called CTR with social regularization (CTR-SR), to seamlessly integrate the item-tag matrix, item content information, and social networks between items into the same principled model. Experiments on realworld datasets successfully demonstrate the effectiveness of our proposed models. 7 Acknowledgements This work is supported by the NSFC (No. 61100125), the 863 Program of China (No. 2012AA011003), and the Program for Changjiang Scholars and Innovative Research Team in University of China (IRT1158, PCSIRT)
References Lipczak et al..2009 Marek Lipczak.Yeming Hu,Yael Kollet,and Evangelos Milios.Tag sources for recommendation in collabora- [Adomavicius and Tuzhilin,2005]G. Adomavicius and A.Tuzhilin.Toward the next generation of recommender tive tagging systems.In ECML PKDD Discovery Challenge 2009 (DC09),volume 497 of CEUR-WS.org,pages 157-172,Septem- systems:a survey of the state-of-the-art and possible extensions. ber2009. IEEE Transactions on Knowledge and Data Engineering, 17(6):734-749.2005. [Lops er al.,2011]Pasquale Lops,Marco de Gemmis,and Giovan- ni Semeraro.Content-based recommender systems:State of the [Benz et al..2006]Dominik Benz.Karen H.L.Tso,and Lars art and trends.In Francesco Ricci,Lior Rokach,Bracha Shapira, Schmidt-Thieme.Automatic bookmark classification-a collab- and Paul B.Kantor,editors.Recommender Systems Handbook. orative approach.In Proceedings of the 2nd Workshop in Inno- pages 73-105.Springer,2011. vations in Web Infrastructure (IW12)at WWW2006.Edinburgh. Scotland.May 2006. Lops et al.,2013]Pasquale Lops,Marco de Gemmis.Giovanni Semeraro,Cataldo Musto,and Fedelucio Narducci.Content- Blei et al..2003]David M.Blei.Andrew Y.Ng.and Michael I. based and collaborative techniques for tag recommendation:an Jordan.Latent dirichlet allocation.Journal of Machine Learning empirical evaluation.J.Intell.Inf.Syst.,40(1):41-61,2013. Research.3:993-1022.2003. [Marinho and Schmidt-Thieme,2007]Leandro Balby Marinho and [Chen et al.,2008]Hong-Ming Chen,Ming-Hsiu Chang.Ping- Lars Schmidt-Thieme.Collaborative tag recommendations.In Chieh Chang.Min-Chun Tien.Winston H.Hsu,and Ja-Ling Wu GFKL,pages533-540,2007. Sheepdog:group and tag recommendation for flickr photos by [Rendle and Schmidt-Thieme,2010]Steffen Rendle and Lars automatic search-based learning.In ACM Multimedia,pages Schmidt-Thieme.Pairwise interaction tensor factorization for 737-740.2008 personalized tag recommendation.In WSDM,pages 81-90, [Chen et al.,2010]Lin Chen,Dong Xu,Ivor Wai-Hung Tsang,and 2010. Jiebo Luo.Tag-based web photo retrieval improved by batch [Sevil et al,2010]Sare Gul Sevil,Onur Kucuktunc,Pinar Duygu- mode re-tagging.In CVPR,pages 3440-3446,2010. lu,and Fazli Can.Automatic tag expansion using visual similari- Gales and Airey,2006]M.J.F.Gales and S.S.Airey.Product of ty for photo sharing websites.Multimedia Tools Appl.,49(1):81- gaussians for speech recognition.CSL,20(1):22-40,2006. 99.2010. [Shen and Fan,2010]Yi Shen and Jianping Fan.Leveraging Garg and Weber,2008]Nikhil Garg and Ingmar Weber.Personal- loosely-tagged images and inter-object correlations for tag rec- ized,interactive tag recommendation for flickr.In RecSys,pages ommendation.In ACM Multimedia,pages 5-14,2010. 67-74.2008. Shewchuk,1994]Jonathan R Shewchuk.An introduction to the [Gupta and Nagar,2000]A.K.Gupta and D.K.Nagar.Matrix Vari- conjugate gradient method without the agonizing pain.Technical ate Distributions.Chapman Hall/CRC Monographs and Sur- report,Carnegie Mellon University,Pittsburgh,PA,USA,1994. veys in Pure and Applied Mathematics.Chapman Hall,2000. [Sigurbjornsson and van Zwol,2008]Borkur Sigurbjornsson and Gupta et al.,2010]Manish Gupta,Rui Li,Zhijun Yin,and Jiawei Roelof van Zwol.Flickr tag recommendation based on collec- Han.Survey on social tagging techniques.SIGKDD Explo- tive knowledge.In WWW,pages 327-336,2008. rations.12(1):58-72,2010. [Toderici et al.,2010]George Toderici,Hrishikesh Aradhye,Mar- [Hotho et al.,2006]Andreas Hotho,Robert Jaschke,Christoph ius Pasca,Luciano Sbaiz,and Jay Yagnik.Finding meaning on Schmitz,and Gerd Stumme.Information retrieval in folk- youtube:Tag recommendation and category discovery.In CVPR, sonomies:Search and ranking.In ESWC,pages 411-426,2006. pages3447-3454,2010. [Koren et al..2009]Yehuda Koren,Robert M.Bell,and Chris [Wang and Blei,2011]Chong Wang and David M.Blei.Collab- Volinsky.Matrix factorization techniques for recommender sys- orative topic modeling for recommending scientific articles.In tems.IEEE Computer,42(8):30-37.2009 KDD,pages448-456,2011. Wang and Li,2013]Hao Wang and Wu-Jun Li.Online egocentric Lee et al.,2010]Sihyoung Lee,Wesley De Neve,Konstantinos N. models for citation networks.In I/CAl.2013. Plataniotis,and Yong Man Ro.Map-based image tag recommen- dation using a visual folksonomy.Pattern Recognition Letters, [Wang et al.,2012]Meng Wang,Bingbing Ni,Xian-Sheng Hua, 31(9):976-982,2010 and Tat-Seng Chua.Assistive tagging:A survey of multimedia tagging with human-computer joint exploration.ACM Comput. [Li and Yeung.2009]Wu-Jun Li and Dit-Yan Yeung.Relation reg- Sun,44(4):25.2012. ularized matrix factorization.In ICAl,pages 1126-1131,2009. [Weinberger et al.,2008]Kilian Q.Weinberger,Malcolm Slaney, [Li and Yeung,2011]Wu-Jun Li and Dit-Yan Yeung.Social rela- and Roelof van Zwol.Resolving tag ambiguity.In ACM Mul- tions model for collaborative filtering.In AAAl.2011. timedia,pages 111-120,2008. [Li et al.,2009a]Wu-Jun Li,Dit-Yan Yeung,and Zhihua Zhang. [Wu et al.,2009]Lei Wu,Linjun Yang,Nenghai Yu,and Xian- Probabilistic relational PCA.In NIPS,pages 1123-1131,2009. Sheng Hua.Learning to tag.In WWW,pages 361-370,2009. [Li et al.,2009b]Wu-Jun Li,Zhihua Zhang,and Dit-Yan Yeung. [Xu et al.,2006]Z.Xu,Y.Fu,J.Mao,and D.Su.Towards the Latent wishart processes for relational kernel learning.Journal semantic web:Collaborative tag suggestions.In Proceedings of Machine Learning Research-Proceedings Track,5:336-343. of the Collaborative Web Tagging Workshop at the WWW 2006, 2009. Edinburgh,Scotland,2006. [Li et al.,2011]Wu-Jun Li,Dit-Yan Yeung,and Zhihua Zhang. [Zhen et al.,2009]Yi Zhen,Wu-Jun Li,and Dit-Yan Yeung.Tagi- Generalized latent factor models for social network analysis.In CoFi:tag informed collaborative filtering.In RecSys,pages 69- 76.2009. IJCAI,pages1705-1710,2011
References [Adomavicius and Tuzhilin, 2005] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005. [Benz et al., 2006] Dominik Benz, Karen H. L. Tso, and Lars Schmidt-Thieme. Automatic bookmark classification - a collaborative approach. In Proceedings of the 2nd Workshop in Innovations in Web Infrastructure (IWI2) at WWW2006, Edinburgh, Scotland, May 2006. [Blei et al., 2003] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003. [Chen et al., 2008] Hong-Ming Chen, Ming-Hsiu Chang, PingChieh Chang, Min-Chun Tien, Winston H. Hsu, and Ja-Ling Wu. Sheepdog: group and tag recommendation for flickr photos by automatic search-based learning. In ACM Multimedia, pages 737–740, 2008. [Chen et al., 2010] Lin Chen, Dong Xu, Ivor Wai-Hung Tsang, and Jiebo Luo. Tag-based web photo retrieval improved by batch mode re-tagging. In CVPR, pages 3440–3446, 2010. [Gales and Airey, 2006] M. J. F. Gales and S. S. Airey. Product of gaussians for speech recognition. CSL, 20(1):22–40, 2006. [Garg and Weber, 2008] Nikhil Garg and Ingmar Weber. Personalized, interactive tag recommendation for flickr. In RecSys, pages 67–74, 2008. [Gupta and Nagar, 2000] A.K. Gupta and D.K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC Monographs and Surveys in Pure and Applied Mathematics. Chapman & Hall, 2000. [Gupta et al., 2010] Manish Gupta, Rui Li, Zhijun Yin, and Jiawei Han. Survey on social tagging techniques. SIGKDD Explorations, 12(1):58–72, 2010. [Hotho et al., 2006] Andreas Hotho, Robert Jaschke, Christoph ¨ Schmitz, and Gerd Stumme. Information retrieval in folksonomies: Search and ranking. In ESWC, pages 411–426, 2006. [Koren et al., 2009] Yehuda Koren, Robert M. Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30–37, 2009. [Lee et al., 2010] Sihyoung Lee, Wesley De Neve, Konstantinos N. Plataniotis, and Yong Man Ro. Map-based image tag recommendation using a visual folksonomy. Pattern Recognition Letters, 31(9):976–982, 2010. [Li and Yeung, 2009] Wu-Jun Li and Dit-Yan Yeung. Relation regularized matrix factorization. In IJCAI, pages 1126–1131, 2009. [Li and Yeung, 2011] Wu-Jun Li and Dit-Yan Yeung. Social relations model for collaborative filtering. In AAAI, 2011. [Li et al., 2009a] Wu-Jun Li, Dit-Yan Yeung, and Zhihua Zhang. Probabilistic relational PCA. In NIPS, pages 1123–1131, 2009. [Li et al., 2009b] Wu-Jun Li, Zhihua Zhang, and Dit-Yan Yeung. Latent wishart processes for relational kernel learning. Journal of Machine Learning Research - Proceedings Track, 5:336–343, 2009. [Li et al., 2011] Wu-Jun Li, Dit-Yan Yeung, and Zhihua Zhang. Generalized latent factor models for social network analysis. In IJCAI, pages 1705–1710, 2011. [Lipczak et al., 2009] Marek Lipczak, Yeming Hu, Yael Kollet, and Evangelos Milios. Tag sources for recommendation in collaborative tagging systems. In ECML PKDD Discovery Challenge 2009 (DC09), volume 497 of CEUR-WS.org, pages 157–172, September 2009. [Lops et al., 2011] Pasquale Lops, Marco de Gemmis, and Giovanni Semeraro. Content-based recommender systems: State of the art and trends. In Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors, Recommender Systems Handbook, pages 73–105. Springer, 2011. [Lops et al., 2013] Pasquale Lops, Marco de Gemmis, Giovanni Semeraro, Cataldo Musto, and Fedelucio Narducci. Contentbased and collaborative techniques for tag recommendation: an empirical evaluation. J. Intell. Inf. Syst., 40(1):41–61, 2013. [Marinho and Schmidt-Thieme, 2007] Leandro Balby Marinho and Lars Schmidt-Thieme. Collaborative tag recommendations. In GFKL, pages 533–540, 2007. [Rendle and Schmidt-Thieme, 2010] Steffen Rendle and Lars Schmidt-Thieme. Pairwise interaction tensor factorization for personalized tag recommendation. In WSDM, pages 81–90, 2010. [Sevil et al., 2010] Sare Gul Sevil, Onur Kucuktunc, Pinar Duygulu, and Fazli Can. Automatic tag expansion using visual similarity for photo sharing websites. Multimedia Tools Appl., 49(1):81– 99, 2010. [Shen and Fan, 2010] Yi Shen and Jianping Fan. Leveraging loosely-tagged images and inter-object correlations for tag recommendation. In ACM Multimedia, pages 5–14, 2010. [Shewchuk, 1994] Jonathan R Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Technical report, Carnegie Mellon University, Pittsburgh, PA, USA, 1994. [Sigurbjornsson and van Zwol, 2008 ¨ ] Borkur Sigurbj ¨ ornsson and ¨ Roelof van Zwol. Flickr tag recommendation based on collective knowledge. In WWW, pages 327–336, 2008. [Toderici et al., 2010] George Toderici, Hrishikesh Aradhye, Marius Pasca, Luciano Sbaiz, and Jay Yagnik. Finding meaning on youtube: Tag recommendation and category discovery. In CVPR, pages 3447–3454, 2010. [Wang and Blei, 2011] Chong Wang and David M. Blei. Collaborative topic modeling for recommending scientific articles. In KDD, pages 448–456, 2011. [Wang and Li, 2013] Hao Wang and Wu-Jun Li. Online egocentric models for citation networks. In IJCAI, 2013. [Wang et al., 2012] Meng Wang, Bingbing Ni, Xian-Sheng Hua, and Tat-Seng Chua. Assistive tagging: A survey of multimedia tagging with human-computer joint exploration. ACM Comput. Surv., 44(4):25, 2012. [Weinberger et al., 2008] Kilian Q. Weinberger, Malcolm Slaney, and Roelof van Zwol. Resolving tag ambiguity. In ACM Multimedia, pages 111–120, 2008. [Wu et al., 2009] Lei Wu, Linjun Yang, Nenghai Yu, and XianSheng Hua. Learning to tag. In WWW, pages 361–370, 2009. [Xu et al., 2006] Z. Xu, Y. Fu, J. Mao, and D. Su. Towards the semantic web: Collaborative tag suggestions. In Proceedings of the Collaborative Web Tagging Workshop at the WWW 2006, Edinburgh, Scotland, 2006. [Zhen et al., 2009] Yi Zhen, Wu-Jun Li, and Dit-Yan Yeung. TagiCoFi: tag informed collaborative filtering. In RecSys, pages 69– 76, 2009