2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
978-0-7695-3496-1/08 $25.00 © 2008 IEEE. DOI 10.1109/WIIAT.2008.97
Collaborative Filtering Recommender Systems Using Tag Information

Huizhi Liang, Yue Xu, Yuefeng Li, Richi Nayak
Faculty of Information Technology, Queensland University of Technology, Brisbane, Australia
oklianghuizi@gmail.com, {yue.xu, y2.li, r.nayak}@qut.edu.au

Abstract

Recommender systems are an effective tool for dealing with information overload. Like explicit ratings and other implicit rating behaviours such as purchases, click streams and browsing history, tagging information reflects a user's personal interests and preferences, and can therefore be used to recommend personalized items to users. This paper explores how to use tagging information for personalized recommendation. Based on the distinctive three-dimensional relationship among users, tags and items, a new user profiling and similarity measure method is proposed. The experiments suggest that the proposed approach performs better than traditional collaborative filtering recommender systems that use only rating data.

1. Introduction

Recommender systems can provide personalized content, services and information items to potential consumers, reducing information retrieval time and supporting the decision-making process. Because users' explicit ratings are not always available, implicit ratings such as purchase history, downloading behaviour and click patterns have become another important information source for recommender systems.

With the development of Web 2.0, collaborative tagging has become popular. Besides helping users organize their personal collections, a tag can be regarded as an expression of a user's personal opinion, and tagging can be considered an implicit rating of, or vote on, the tagged information resources or items [1]. Thus, tagging information can be used to make recommendations.
Currently, some research focuses on how to use collaborative tagging information to recommend personalized tags to users [2], but little work has been done on using tagging information to help users find items of interest easily and quickly.

In this paper, we discuss how to recommend items to users based on tag information.

2. Related work

Collaborative filtering is a traditional and widely used approach that recommends items to users based on the assumption that like-minded people have similar tastes or behaviours. In general, there are two kinds of collaborative filtering methods: user-based and item-based. Although there is a large body of work on collaborative filtering recommender systems, to the best of our knowledge only Tso-Sutter's work [3] discusses using tag information for item recommendation. In Tso-Sutter's work, the tag information was converted into two two-dimensional relationships, user-tag and tag-item, and was used as a supplementary source to extend the rating data. Because it ignored the three-dimensional relationship among users, items and tags, the users' tagging behaviour was not accurately profiled, and thus the recommendation quality based on the extended data is still not satisfactory.

3. Tag-based Recommender systems
3.1 User profiling

User profiling models users' features or preferences. Profiling users with a user-item rating matrix or with keyword vectors is widely used in recommender systems. However, these approaches describe only two-dimensional relationships between users and items. Although Tso-Sutter's approach takes the relationship between tags and items into consideration [3], it ignores the relationship between tags and items for each individual user. A user should be profiled not only by the tags and the items, but also by the relationship between that user's tags and items.

To profile users' tagging behaviour accurately, we propose to model a user in a collaborative tagging community in three aspects: the tags used by the user, the items tagged by the user, and the relationship between the tags and the tagged items. To describe the proposed approach, we first give the following definitions:

$U = \{u_1, u_2, \ldots, u_n\}$: the set of users in the collaborative tagging community.
$P = \{p_1, p_2, \ldots, p_m\}$: the set of items that have been tagged by users.
$T = \{t_1, t_2, \ldots, t_l\}$: the set of tags that have been used by users.
$E(u_i, t_j, p_k) \in \{0, 1\}$: a function that specifies whether user $u_i$ has used tag $t_j$ to tag item $p_k$.

The user profile is defined as follows. For a user $u_i$, $i = 1..n$:

$T_{u_i} = \{t_j \mid t_j \in T,\ \exists p_k \in P,\ E(u_i, t_j, p_k) = 1\}$, $T_{u_i} \subseteq T$, is the tag set of $u_i$.
$P_{u_i} = \{p_k \mid p_k \in P,\ \exists t_j \in T,\ E(u_i, t_j, p_k) = 1\}$, $P_{u_i} \subseteq P$, is the item set of $u_i$.
$TP_i = \{\langle t_j, p_k \rangle \mid t_j \in T,\ p_k \in P,\ E(u_i, t_j, p_k) = 1\}$ is the relationship between $u_i$'s tag set and item set.
$UF_i = (T_{u_i}, P_{u_i}, TP_i)$ is defined as the user profile of user $u_i$.

The user profile or user model of all users is denoted as $UF = \{UF_i \mid i = 1..n\}$.

3.2 Neighbourhood Formation

Neighbourhood formation generates a set of like-minded peers for a target user. Based on user profiles, the similarity of users can be calculated through various proximity measures such as Pearson correlation and cosine similarity.
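As an illustration, the profile of Section 3.1 and a generic proximity measure over it can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `build_profiles` and `cosine_on_sets`, the dictionary layout and the toy triples are all hypothetical, and the cosine is taken over the binary indicator vectors of the sets, where it reduces to $|A \cap B| / \sqrt{|A| \cdot |B|}$.

```python
from collections import defaultdict
from math import sqrt


def build_profiles(triples):
    """Build UF_i = (Tu_i, Pu_i, TP_i) for each user from (user, tag, item) triples.

    Each triple stands for E(u_i, t_j, p_k) = 1.
    """
    profiles = defaultdict(lambda: {"tags": set(), "items": set(), "pairs": set()})
    for user, tag, item in triples:
        profile = profiles[user]
        profile["tags"].add(tag)            # Tu_i: tags used by the user
        profile["items"].add(item)          # Pu_i: items tagged by the user
        profile["pairs"].add((tag, item))   # TP_i: <tag, item> relationships
    return dict(profiles)


def cosine_on_sets(a, b):
    """Cosine similarity of the binary indicator vectors of two sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


# Hypothetical toy data: both users use the same two tags,
# but mostly attach them to different items.
triples = [
    ("u1", "python", "book_a"),
    ("u1", "ml", "book_b"),
    ("u2", "python", "book_b"),
    ("u2", "ml", "book_b"),
]
profiles = build_profiles(triples)

tag_cosine = cosine_on_sets(profiles["u1"]["tags"], profiles["u2"]["tags"])
pair_cosine = cosine_on_sets(profiles["u1"]["pairs"], profiles["u2"]["pairs"])
print(tag_cosine, pair_cosine)
```

In the toy data the two users have identical tag sets but share only one of their (tag, item) pairs, so a measure that looks at tags alone rates them as more alike than a measure over the tag-item relationship does.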
In Tso-Sutter's work, the overlap of tags shared by users was used to measure similarity [3], just as traditional collaborative filtering (CF) uses the overlap of commonly rated items. Tso-Sutter's method is in effect traditional CF on an extended dataset that treats tags as additional items. The improvement is very limited, because a user's neighbourhood may be formed incorrectly if users' tagging is treated only as implicit rating and the similarity of the relationships between tags and items is not measured. We propose to measure the similarity of two users from the following three aspects:

(1) UTsim($u_i$, $u_j$): the similarity of the users' tags, which is measured by the percentage of common tags used by the two users:

$$\mathrm{UTsim}(u_i, u_j) = \frac{|T_{u_i} \cap T_{u_j}|}{\max_{u_k \in U} |T_{u_k}|} \tag{1}$$

(2) UPsim($u_i$, $u_j$): the similarity of the users' items, which is measured by the percentage of common items tagged by the two users:

$$\mathrm{UPsim}(u_i, u_j) = \frac{|P_{u_i} \cap P_{u_j}|}{\max_{u_k \in U} |P_{u_k}|} \tag{2}$$

(3) UTPsim($u_i$, $u_j$): the similarity of the users' tag-item relationships, which is measured by the percentage of common relations shared by the two users:

$$\mathrm{UTPsim}(u_i, u_j) = \frac{|TP_i \cap TP_j|}{\max_{u_k \in U} |TP_k|} \tag{3}$$

Thus, the overall similarity measure of two users is defined as below:

$$\mathrm{sim}_u(u_i, u_j) = w_{UT} \cdot \mathrm{UTsim}(u_i, u_j) + w_{UP} \cdot \mathrm{UPsim}(u_i, u_j) + w_{UTP} \cdot \mathrm{UTPsim}(u_i, u_j) \tag{4}$$

where $w_{UT} + w_{UP} + w_{UTP} = 1$, and $w_{UT}$, $w_{UP}$ and $w_{UTP}$ are the weights of the three similarity measures, respectively.

Similarly, the similarity between two items is defined
as Equation (5):

$$\mathrm{sim}_p(p_i, p_j) = w_{PU} \cdot \mathrm{PUsim}(p_i, p_j) + w_{PT} \cdot \mathrm{PTsim}(p_i, p_j) + w_{PUT} \cdot \mathrm{PUTsim}(p_i, p_j) \tag{5}$$

where $w_{PU}$, $w_{PT}$ and $w_{PUT}$ are the weights, whose sum is 1, and PUsim($p_i$, $p_j$), PTsim($p_i$, $p_j$) and PUTsim($p_i$, $p_j$) are defined as follows:

(1) PTsim($p_i$, $p_j$): the similarity of two items based on the percentage of common tags assigned to them:

$$\mathrm{PTsim}(p_i, p_j) = \frac{|T_{p_i} \cap T_{p_j}|}{\max_{p_k \in P} |T_{p_k}|} \tag{6}$$

where $T_{p_k}$ is the tag set of item $p_k$, $T_{p_k} = \{t_j \mid t_j \in T,\ \exists u_i \in U,\ E(u_i, t_j, p_k) = 1\}$.

(2) PUsim($p_i$, $p_j$): the similarity of two items based on the percentage of common users who tagged them:

$$\mathrm{PUsim}(p_i, p_j) = \frac{|U_{p_i} \cap U_{p_j}|}{\max_{p_k \in P} |U_{p_k}|} \tag{7}$$

where $U_{p_k}$ is the user set of item $p_k$, $U_{p_k} = \{u_i \mid u_i \in U,\ \exists t_j \in T,\ E(u_i, t_j, p_k) = 1\}$.

(3) PUTsim($p_i$, $p_j$): the similarity of the two items based on the percentage of common tag-item relationships:

$$\mathrm{PUTsim}(p_i, p_j) = \frac{|UP_i \cap UP_j|}{\max_{p_k \in P} |UP_k|} \tag{8}$$

where $UP_j$ is the user and item set of tag $t_j$, $UP_j = \{\langle u_i, p_k \rangle \mid u_i \in U,\ p_k \in P,\ E(u_i, t_j, p_k) = 1\}$.

3.3 Recommendation Generation

We propose two methods for making item recommendations to the target user $u_i$, namely a user-based approach and an item-based approach, which rely on the neighbour users' item lists and on the similarity of items, respectively.

Let $C(u_i)$ be the neighbourhood of $u_i$. For the user-based approach, the candidate items for $u_i$ are taken from the items tagged by the users in $C(u_i)$. For each candidate item $p_k$, a prediction score $A_u(u_i, p_k)$ is calculated with Equation (9) from the similarity between $u_i$ and its neighbour users and from the neighbour users' implicit ratings of $p_k$, denoted $R(u_j, p_k)$:

$$A_u(u_i, p_k) = \frac{\sum_{u_j \in C(u_i)} \mathrm{sim}_u(u_i, u_j) \cdot R(u_j, p_k)}{|C(u_i)|} \tag{9}$$

According to the prediction scores, the top N items are recommended to $u_i$.

For the item-based approach, the prediction score is calculated with Equation (10) using the item similarities:

$$A_p(u_i, p_k) = \sum_{p_j \in P_{u_i}} \mathrm{sim}_p(p_k, p_j) \tag{10}$$

4. Experiments

We conducted experiments to evaluate the methods proposed in Section 3. The dataset for the experiments was obtained from Amazon.com. Because the items of the Amazon tagging community are mainly books, only book items were collected. To avoid a severe sparsity problem, we selected users who tagged at least 5 items, tags that are used by at least 5 users, and items that are tagged at least 5 times. The final dataset comprises 3179 users, 8083 tags and 11942 books. The whole dataset was split into a training dataset and a test dataset of 50% each. The top N items are recommended to each user, and precision and recall are used to evaluate the accuracy of the recommendations.

To evaluate the effectiveness of the proposed approach (Tag-based CF), we compared the precision and recall of its top 5 recommended items with those of the standard collaborative filtering approaches (Traditional CF), which use only user ratings, and with those of Tso-Sutter's approach (Tag-aware method), which extends the user rating matrix with tag information. In fact, the proposed approach covers these two approaches when some of the similarity measure weights are set to zero. The comparison of precision and recall for the user-based approaches is illustrated in Figure 1, and the item-based comparison is shown in Figure 2.

5. Discussion
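As a concrete reference for the configurations compared in this section, the user-based scoring of Equations (1)-(4) and (9) can be sketched in Python. This is a minimal sketch rather than the authors' implementation. It assumes a binary implicit rating, $R(u_j, p_k) = 1$ when $u_j$ has tagged $p_k$ and 0 otherwise, and it takes the neighbourhood $C(u_i)$ as given, since neither is specified in the text. The function names, the toy profiles and the weight values are hypothetical.

```python
def overlap_sim(sets_by_user, a, b):
    """|S_a ∩ S_b| divided by the largest set size over all users (Eqs. 1-3)."""
    largest = max(len(s) for s in sets_by_user.values())
    if largest == 0:
        return 0.0
    return len(sets_by_user[a] & sets_by_user[b]) / largest


def user_sim(profiles, a, b, w_ut, w_up, w_utp):
    """Weighted overall user similarity of Equation (4)."""
    if abs(w_ut + w_up + w_utp - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    parts = []
    for key in ("tags", "items", "pairs"):
        sets_by_user = {user: profile[key] for user, profile in profiles.items()}
        parts.append(overlap_sim(sets_by_user, a, b))
    return w_ut * parts[0] + w_up * parts[1] + w_utp * parts[2]


def predict_user_based(profiles, target, neighbours, item, weights):
    """Equation (9), assuming R(u_j, p_k) = 1 if u_j tagged p_k, else 0."""
    if not neighbours:
        return 0.0
    total = sum(
        user_sim(profiles, target, n, *weights)
        for n in neighbours
        if item in profiles[n]["items"]
    )
    return total / len(neighbours)


# Hypothetical toy profiles in the form (tag set, item set, <tag, item> pairs).
profiles = {
    "u1": {"tags": {"python", "ml"}, "items": {"a", "b"},
           "pairs": {("python", "a"), ("ml", "b")}},
    "u2": {"tags": {"python", "ml"}, "items": {"b", "c"},
           "pairs": {("python", "b"), ("ml", "c")}},
    "u3": {"tags": {"ml"}, "items": {"b", "c"},
           "pairs": {("ml", "b"), ("ml", "c")}},
}

full = (0.2, 0.3, 0.5)        # all three aspects contribute
items_only = (0.0, 1.0, 0.0)  # only the common-item term remains

for name, weights in (("full", full), ("items only", items_only)):
    print(name,
          user_sim(profiles, "u1", "u2", *weights),
          user_sim(profiles, "u1", "u3", *weights))

print(predict_user_based(profiles, "u1", ["u2", "u3"], "c", full))
```

With the toy data, keeping only the item term scores u2 and u3 identically against u1, whereas the full weighting ranks u3 higher because u3 shares a (tag, item) pair with u1 and u2 shares none.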
The experimental results in Figure 1 and Figure 2 show that the precision and recall of the proposed approach are better than those of traditional CF and Tso-Sutter's approach for both the user-based and item-based models. Although Tso-Sutter claimed that tag information is useful only after fusing user-based and item-based collaborative filtering, and is seen as noise by standard user-based or item-based CF alone, our experimental results show that tag information can be used to improve standard user-based and item-based collaborative filtering.

[Figure 1. Comparison of user based approaches. Panel title: User-based approach. Precision and recall of the top 5 recommendations for Traditional CF, Tag-aware method and Tag-based CF; vertical axis from 0 to 0.12.]

[Figure 2. Comparison of item-based approaches. Panel title: Item-based approach. Precision and recall of the top 5 recommendations for Traditional CF, Tag-aware method and Tag-based CF; vertical axis from 0 to 0.16.]

In addition, the experimental results show that traditional collaborative filtering based on the similarity of rating behaviour does not work well on collaborative tagging information. The results suggest that recommendation accuracy improves more when users are profiled by tags, items and the relationship between tags and items than when they are profiled by extending implicit ratings with tag information. Furthermore, the results suggest that it is better to calculate similarity from the overall similarity of tagging behaviour than to measure it only as implicit rating similarity.

6. Conclusion

This paper discusses how to recommend items to users based on collaborative tagging information. Instead of treating tagging behaviour as just implicit rating behaviour, the proposed tag-based collaborative filtering approach uses the three-dimensional relationship of tagging behaviour to profile users and to find the most like-minded neighbours or the most similar items. The experiments show promising results for the tag-based collaborative filtering approach in recommending personalized items. The experimental results also indicate that tag information can be used to improve the standard user-based and item-based collaborative filtering approaches.

References

[1] H. Halpin, V. Robu, and H. Shepherd. "The Complex Dynamics of Collaborative Tagging", Proceedings of the 16th international conference on World Wide Web, ACM, USA, 2007, pp. 211-220.
[2] P. Heymann, D. Ramage, and H. Garcia-Molina. "Social tag prediction", Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, ACM, USA, 2008, pp. 531-538.
[3] K.H.L. Tso-Sutter, L.B. Marinho, and L. Schmidt-Thieme. "Tag-aware Recommender Systems by Fusion of Collaborative Filtering Algorithms", Proceedings of the 2008 ACM symposium on Applied computing, ACM, USA, 2008, pp. 1995-1999.