P. Wen et al. (Eds.): RSKT 2009, LNCS 5589, pp. 666–673, 2009.
© Springer-Verlag Berlin Heidelberg 2009

Tag Based Collaborative Filtering for Recommender Systems

Huizhi Liang, Yue Xu, Yuefeng Li, and Richi Nayak

School of Information Technology, Queensland University of Technology, Brisbane, Australia
oklianghuizi@gmail.com, {yue.xu,y2.li,r.nayak}@qut.edu.au

Abstract. Collaborative tagging helps users organize, share and retrieve information easily and quickly. Because collaborative tagging information reveals users' important personal preferences, it can be used to recommend personalized items to users. This paper proposes a novel tag-based collaborative filtering approach for recommending personalized items to users of online communities that are equipped with tagging facilities. Based on the distinctive three-dimensional relationships among users, tags and items, a new similarity measure is proposed to form the neighborhood of users with similar tagging behavior instead of similar implicit ratings. Promising experimental results show that, by using the tagging information, the proposed approach outperforms the standard user-based and item-based collaborative filtering approaches.

Keywords: Collaborative filtering, collaborative tagging, recommender systems, user profiling.

1 Introduction

Nowadays collaborative tagging, or social annotation, is becoming popular on web sites and in online communities. By harnessing the collaborative work of thousands or millions of web users who add natural-language keywords to information resources, it becomes easy to retrieve, organize and share information quickly and efficiently. Because of its simplicity and effectiveness, collaborative tagging has been used in various web application areas, such as the social bookmarking site del.icio.us, the photo sharing website Flickr.com, the academic paper database CiteULike, and the e-commerce website Amazon.com.

Besides helping users organize their personal collections, a tag can also be regarded as an expression of a user's personal opinion, while tagging can be considered as implicit rating or voting on the tagged information resources or items [1]. Thus, tagging information reveals users' important personal interests and preferences, which can be used to greatly improve personalized search [2] and recommendation. Some work has been done on how to use collaborative tagging information to recommend personalized tags to users [3], but little work has been done on utilizing tagging information to help users find items of interest easily and quickly.
Thus, how to recommend personalized items to users based on tagging information has become an important research question, and research on it is just beginning. In this paper, we propose a tag-based collaborative filtering approach that makes personalized recommendations based on users' tagging behavior.

The paper is organized as follows. Section 2 discusses related work. Section 3 describes the proposed tag-based collaborative filtering approach in detail, including the user profiling approach, the distinctive three-dimensional relationships among users, items and tags, the similarity measure, and the user-based and item-based approaches for generating the top-N recommended item list. The experiments are described in Section 4, and the experimental results are discussed in Section 5. Finally, Section 6 concludes this work.

2 Related Work

Collaborative tagging is a typical Web 2.0 application that contains plenty of user interaction information. Collaborative tagging information can be used to build virtual social networks, find interest groups, and organize, share, gather and discover information resources. As collaborative tagging information is a kind of emergent online community information, the study of tagging behavior itself, its usage patterns and its applications remains open [4].

Collaborative filtering is a traditional and widely used approach to recommending items to users, based on the assumption that like-minded people will have similar tastes or behaviors. Although there is a lot of work on collaborative filtering recommender systems, to the best of our knowledge only Tso-Sutter's work [5] has discussed using tag information for item recommendation. In Tso-Sutter's work, the three-dimensional relationship among users, items and tags was converted into three two-dimensional relationships: user-item, user-tag and tag-item. Thus, the tag information was used as an extension of the user-item implicit rating matrix, and tagging behavior was profiled and measured as implicit rating behavior. Because it ignored some distinctive features of tagging behavior, that work failed to use tag information to make item recommendations accurately.

3 Tag Based Collaborative Filtering

3.1 User Profiling

User profiling models users' features or preferences. Profiling users with a user-item rating matrix or with keyword vectors is widely used in recommender systems. To profile users' tagging behavior correctly and accurately, we propose to model a user of a collaborative tagging community in three aspects, i.e., the tags used by the user, the items tagged by the user, and the relationship between the tags and the tagged items. To describe the proposed approach, we give the following definitions:

U: Set of users. $U = \{u_1, u_2, \ldots, u_n\}$ contains all the users of the collaborative tagging community.
P: Set of items. $P = \{p_1, p_2, \ldots, p_m\}$ contains all tagged items. An item is an object that is tagged by users; it can be any kind of object in the application area, such as books, movies, URLs, photos and academic papers.

T: Set of tags. $T = \{t_1, t_2, \ldots, t_l\}$ includes all the tags that have been used by users. A tag is a relevant keyword assigned to one or more items by a user, describing the items and enabling their classification.

$E(u_i, t_j, p_k)$: a function that specifies whether user $u_i$ used tag $t_j$ to tag item $p_k$; $E(u_i, t_j, p_k) = 1$ if so, and 0 otherwise.

The user profile is defined as follows:

Definition [User Profile]: For a user $u_i$, $i = 1..n$, let $T_{u_i}$ be the tag set of $u_i$, $T_{u_i} = \{t_j \mid t_j \in T, \exists p_k \in P, E(u_i, t_j, p_k) = 1\}$, $T_{u_i} \subseteq T$; let $P_{u_i}$ be the item set of $u_i$, $P_{u_i} = \{p_k \mid p_k \in P, \exists t_j \in T, E(u_i, t_j, p_k) = 1\}$, $P_{u_i} \subseteq P$; and let $TP_i$ be the relationship between $u_i$'s tag and item sets, $TP_i = \{\langle t_j, p_k \rangle \mid t_j \in T, p_k \in P, E(u_i, t_j, p_k) = 1\}$. Then $UF_i = (T_{u_i}, P_{u_i}, TP_i)$ is defined as the user profile of user $u_i$. The user profiles of all users are denoted as $UF = \{UF_i \mid i = 1..n\}$.

3.2 The Multiple Relationships

From the above user profile, we can see that the relationship describing an item $p_k$ being tagged with tag $t_j$ by user $u_i$ is three-dimensional, which is very different from the two-dimensional explicit rating behavior and other implicit rating behaviors that involve only users and items. Other three-dimensional and two-dimensional relationships can be derived from it. These multiple relationships are vital for collaborative filtering approaches, especially for neighborhood formation. To facilitate understanding, we discuss the multiple relationships among users, tags and items from the perspectives of users, items and tags, respectively:

• From the perspective of users, the relationship among users, tags and items is denoted as $R_{U,TP}$; it is the direct and basic three-dimensional relationship and describes the tagging behavior of each user. $R_{U,TP} = \{\langle u_i, TP_i \rangle \mid u_i \in U, i = 1..n\}$, where $TP_i$ is the relationship between $u_i$'s tag and item sets, as defined in Section 3.1. From it, two two-dimensional relationships, $R_{U,P}$ and $R_{U,T}$, can be derived:

$R_{U,P}$: the relationship between users and their item sets. This two-dimensional relationship is the basis of the traditional user-based collaborative filtering approach. $R_{U,P} = \{\langle u_i, P_{u_i} \rangle \mid u_i \in U, P_{u_i} \subseteq P, i = 1..n\}$, where $P_{u_i}$ is the item set of $u_i$, as defined in Section 3.1.

$R_{U,T}$: the relationship between users and their tag sets. $R_{U,T} = \{\langle u_i, T_{u_i} \rangle \mid u_i \in U, T_{u_i} \subseteq T, i = 1..n\}$, where $T_{u_i}$ is the tag set of $u_i$, as defined in Section 3.1.

• From the perspective of items, the relationship among users, tags and items is different and is defined as $R_{P,UT} = \{\langle p_k, UT_k \rangle \mid p_k \in P, k = 1..m\}$, where $UT_k$ is the user and tag set of item $p_k$, $UT_k = \{\langle u_i, t_j \rangle \mid u_i \in U, t_j \in T, E(u_i, t_j, p_k) = 1\}$. Similarly, two two-dimensional relationships, $R_{P,U}$ and $R_{P,T}$, can be derived:
$R_{P,U}$: the relationship between items and their user sets. Unlike $R_{U,P}$, which describes each user's item set, $R_{P,U}$ describes each item's user set. The traditional item-based collaborative filtering approach is based on this relationship. $R_{P,U} = \{\langle p_k, U_{p_k} \rangle \mid p_k \in P, U_{p_k} \subseteq U, k = 1..m\}$, where $U_{p_k}$ is the user set of item $p_k$, $U_{p_k} = \{u_i \mid u_i \in U, \exists t_j \in T, E(u_i, t_j, p_k) = 1\}$.

$R_{P,T}$: the relationship between items and their tag sets. $R_{P,T} = \{\langle p_k, T_{p_k} \rangle \mid p_k \in P, T_{p_k} \subseteq T, k = 1..m\}$, where $T_{p_k}$ is the tag set of item $p_k$, $T_{p_k} = \{t_j \mid t_j \in T, \exists u_i \in U, E(u_i, t_j, p_k) = 1\}$.

• From the perspective of tags, the relationship among users, tags and items is denoted as $R_{T,UP}$. Though it is not used directly for item recommendation, we give its definition below to complete the picture of the relationships among users, tags and items. $R_{T,UP} = \{\langle t_j, UP_j \rangle \mid t_j \in T, j = 1..l\}$, where $UP_j$ is the user and item set of tag $t_j$, $UP_j = \{\langle u_i, p_k \rangle \mid u_i \in U, p_k \in P, E(u_i, t_j, p_k) = 1\}$. The derived two-dimensional relationships $R_{T,U}$ and $R_{T,P}$ are defined as follows:

$R_{T,U}$: the relationship between tags and their user sets. $R_{T,U} = \{\langle t_j, U_{t_j} \rangle \mid t_j \in T, U_{t_j} \subseteq U, j = 1..l\}$, where $U_{t_j}$ is the user set of tag $t_j$, $U_{t_j} = \{u_i \mid u_i \in U, \exists p_k \in P, E(u_i, t_j, p_k) = 1\}$.

$R_{T,P}$: the relationship between tags and their item sets. In this relationship, each tag collects all the items tagged with it by various users, which reflects the outcome of the collaborative tagging work. $R_{T,P} = \{\langle t_j, P_{t_j} \rangle \mid t_j \in T, P_{t_j} \subseteq P, j = 1..l\}$, where $P_{t_j}$ is the item set of tag $t_j$, $P_{t_j} = \{p_k \mid p_k \in P, \exists u_i \in U, E(u_i, t_j, p_k) = 1\}$.

These multiple relationships can be used to recommend personalized items, virtual friends and tags to users. Within the scope of this paper, however, the following sections focus only on item recommendation.

3.3 Neighborhood Formation

Neighborhood formation generates a set of like-minded peers for a target user. Forming the neighborhood of a target user $u_i \in U$ with the standard "best-n-neighbors" technique involves computing the distances between $u_i$ and all other users and selecting the top N neighbors with the shortest distances to $u_i$. Based on user profiles, the similarity of users can be calculated through various proximity measures. Pearson correlation and cosine similarity are widely used to calculate similarity from users' explicit rating data. However, explicit rating data is not always available. Unlike explicit ratings, in which users are asked to state their perceptions of items explicitly on a numeric scale, implicit ratings such as transaction histories, browsing histories and product mentions are obtainable for most e-commerce sites and communities. For online communities with a tagging facility, binary implicit ratings can be obtained from users' tagging information: if a user has tagged an item, the user's implicit rating of that item is set to 1, otherwise 0.
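To make the user profile of Section 3.1 and the binary implicit rating concrete, here is a minimal Python sketch. It assumes the tagging data is available as a list of (user, tag, item) triples, each meaning $E(u_i, t_j, p_k) = 1$; the names `UserProfile`, `build_profiles` and `implicit_rating`, and the sample users `u1` and `u2`, are hypothetical and only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumption: raw data is a list of (user, tag, item) triples,
# each meaning E(u, t, p) = 1 (user u tagged item p with tag t).


@dataclass
class UserProfile:
    """UF_i = (T_ui, P_ui, TP_i) as defined in Section 3.1."""
    tags: set = field(default_factory=set)       # T_ui: tags used by the user
    items: set = field(default_factory=set)      # P_ui: items tagged by the user
    tag_items: set = field(default_factory=set)  # TP_i: <tag, item> pairs


def build_profiles(triples):
    """Build UF = {UF_i | i = 1..n} in a single pass over the triples."""
    profiles = defaultdict(UserProfile)
    for user, tag, item in triples:
        profile = profiles[user]
        profile.tags.add(tag)
        profile.items.add(item)
        profile.tag_items.add((tag, item))
    return dict(profiles)


def implicit_rating(profiles, user, item):
    """Binary implicit rating: 1 if the user tagged the item with any tag, else 0."""
    profile = profiles.get(user)
    return int(profile is not None and item in profile.items)


triples = [
    ("u1", "globalization", "The world is flat"),
    ("u1", "globalization", "The Long Tail"),
    ("u2", "outsource", "The world is flat"),
]
profiles = build_profiles(triples)
print(implicit_rating(profiles, "u2", "The Long Tail"))  # 0
```

A single pass over the triples fills all three components of every profile, so building $UF$ takes time linear in the number of tagging records.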
For implicit binary rating data, a simple but effective way to compute user similarity is to calculate the overlap of the items rated by the two users: the higher the overlap, the more similar the two users are. Here, the similarity of two users is calculated from their user profiles. In Tso-Sutter's work, as users were profiled only with their tag and item sets, the similarity measure for implicit rating behavior was used to form neighborhoods; that is, the overlap of tags and items was used to measure similarity [5]. However, it is not correct to measure the similarity of users' tagging behaviors in the same way as that of implicit rating behaviors. For example, for two users $u_i$ and $u_j$ with profiles $UF_i$ = ({globalization}, {The world is flat, The Long Tail}, {<globalization, The world is flat>, <globalization, The Long Tail>}) and $UF_j$ = ({outsource, globalization}, {The world is flat, How Soccer Explains the World}, {<…>, <…>}), the similarity measure should include not only the number of tags the users have used in common and the number of items they have tagged in common, but also the number of times they have used the same tag to tag the same item. If we regard tagging behavior merely as implicit rating behavior and ignore the similarity of the tag-item relationships, the wrong neighbors may be found. Only by calculating the similarity of tagging behaviors can like-minded users be found.

Thus, the similarity measure of two users consists of the following three parts:

(1) $UTsim(u_i, u_j)$: the similarity of the users' tags, measured by the proportion of common tags used by the two users:

$$UTsim(u_i, u_j) = \frac{|T_{u_i} \cap T_{u_j}|}{\max_{u_k \in U}\{|T_{u_k}|\}} \tag{1}$$

As defined in Section 3.1, $T_{u_i}$ is the tag set of $u_i$, $T_{u_i} = \{t_j \mid t_j \in T, \exists p_k \in P, E(u_i, t_j, p_k) = 1\}$.

(2) $UPsim(u_i, u_j)$: the similarity of the users' items, measured by the proportion of common items tagged by the two users:

$$UPsim(u_i, u_j) = \frac{|P_{u_i} \cap P_{u_j}|}{\max_{u_k \in U}\{|P_{u_k}|\}} \tag{2}$$

As defined in Section 3.1, $P_{u_i}$ is the item set of $u_i$, $P_{u_i} = \{p_k \mid p_k \in P, \exists t_j \in T, E(u_i, t_j, p_k) = 1\}$.

(3) $UTPsim(u_i, u_j)$: the similarity of the users' tag-item relationships, measured by the proportion of common tag-item pairs shared by the two users:

$$UTPsim(u_i, u_j) = \frac{|TP_i \cap TP_j|}{\max_{u_k \in U}\{|TP_k|\}} \tag{3}$$

As defined in Section 3.1, $TP_i$ is the relationship between $u_i$'s tag and item sets, $TP_i = \{\langle t_j, p_k \rangle \mid t_j \in T, p_k \in P, E(u_i, t_j, p_k) = 1\}$.

Thus, the similarity of two users is defined as:

$$sim_u(u_i, u_j) = w_{UT} \cdot UTsim(u_i, u_j) + w_{UP} \cdot UPsim(u_i, u_j) + w_{UTP} \cdot UTPsim(u_i, u_j) \tag{4}$$

where $w_{UT} + w_{UP} + w_{UTP} = 1$; $w_{UT}$, $w_{UP}$ and $w_{UTP}$ are the weights of the three similarity measures, respectively, and can be adjusted for different datasets. The similarity measure of users is thus based on the user-perspective relationships ($R_{U,TP}$, $R_{U,T}$ and $R_{U,P}$) defined in Section 3.2.
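The three overlap measures and their weighted combination in Eqs. (1)-(4) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the input is a list of (user, tag, item) triples, equal weights of 1/3 are an arbitrary default (the paper only requires that the weights sum to 1 and tunes them per dataset), and because this copy of the example does not preserve $u_j$'s tag-item pairs, the pairs assigned to `uj` below are a hypothetical choice. The function names are also hypothetical.

```python
from collections import defaultdict


def user_sets(triples):
    """Per user: T_u (tags), P_u (items) and TP_u (<tag, item> pairs)."""
    tags, items, pairs = defaultdict(set), defaultdict(set), defaultdict(set)
    for user, tag, item in triples:
        tags[user].add(tag)
        items[user].add(item)
        pairs[user].add((tag, item))
    return tags, items, pairs


def make_user_sim(triples, w_ut=1/3, w_up=1/3, w_utp=1/3):
    """Return sim_u(ui, uj) of Eq. (4); assumes at least one triple."""
    if abs(w_ut + w_up + w_utp - 1.0) > 1e-9:
        raise ValueError("w_UT + w_UP + w_UTP must equal 1")
    tags, items, pairs = user_sets(triples)
    # The denominators max_{u_k in U} |X_{u_k}| of Eqs. (1)-(3) do not
    # depend on the pair being compared, so compute them once.
    max_t = max(len(s) for s in tags.values())
    max_p = max(len(s) for s in items.values())
    max_tp = max(len(s) for s in pairs.values())
    empty = frozenset()

    def overlap(sets, a, b):
        return len(sets.get(a, empty) & sets.get(b, empty))

    def sim_u(ui, uj):
        ut_sim = overlap(tags, ui, uj) / max_t     # Eq. (1)
        up_sim = overlap(items, ui, uj) / max_p    # Eq. (2)
        utp_sim = overlap(pairs, ui, uj) / max_tp  # Eq. (3)
        return w_ut * ut_sim + w_up * up_sim + w_utp * utp_sim

    return sim_u


# u_i from the example; u_j's tag-item pairs are a hypothetical assignment.
triples = [
    ("ui", "globalization", "The world is flat"),
    ("ui", "globalization", "The Long Tail"),
    ("uj", "outsource", "The world is flat"),
    ("uj", "globalization", "How Soccer Explains the World"),
]
# Same tags and items for u_j, but the shared item carries the shared tag.
alt_triples = triples[:2] + [
    ("uj", "globalization", "The world is flat"),
    ("uj", "outsource", "How Soccer Explains the World"),
]
print(round(make_user_sim(triples)("ui", "uj"), 4))      # 0.3333
print(round(make_user_sim(alt_triples)("ui", "uj"), 4))  # 0.5
```

In the first case the two users share one tag and one item but no tag-item pair, so $UTPsim$ is 0; when the shared item carries the shared tag, $UTPsim$ becomes 0.5 and the combined score rises from about 0.333 to 0.5. Note that normalizing by the largest set in the population means a user's similarity to itself can be below 1.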
Similarly, the similarity between two items is based on the item-perspective relationships ($R_{P,UT}$, $R_{P,U}$ and $R_{P,T}$) and is defined by formula (5):

$$sim_p(p_i, p_j) = w_{PU} \cdot PUsim(p_i, p_j) + w_{PT} \cdot PTsim(p_i, p_j) + w_{PUT} \cdot PUTsim(p_i, p_j) \tag{5}$$

where $w_{PU}$, $w_{PT}$ and $w_{PUT}$ are the weights, $w_{PU} + w_{PT} + w_{PUT} = 1$, and $PUsim(p_i, p_j)$, $PTsim(p_i, p_j)$ and $PUTsim(p_i, p_j)$ are defined as follows:

(1) $PTsim(p_i, p_j)$: the similarity of two items based on the proportion of tags they have in common. It is computed from $R_{P,T}$, i.e., the tag relationship viewed from the perspective of items:

$$PTsim(p_i, p_j) = \frac{|T_{p_i} \cap T_{p_j}|}{\max_{p_k \in P}\{|T_{p_k}|\}} \tag{6}$$

As defined in Section 3.2, $T_{p_k}$ is the tag set of item $p_k$, $T_{p_k} = \{t_j \mid t_j \in T, \exists u_i \in U, E(u_i, t_j, p_k) = 1\}$.

(2) $PUsim(p_i, p_j)$: the similarity of two items based on the proportion of users who tagged both of them. It is computed from $R_{P,U}$, i.e., the user relationship viewed from the perspective of items:

$$PUsim(p_i, p_j) = \frac{|U_{p_i} \cap U_{p_j}|}{\max_{p_k \in P}\{|U_{p_k}|\}} \tag{7}$$

As defined in Section 3.2, $U_{p_k}$ is the user set of item $p_k$, $U_{p_k} = \{u_i \mid u_i \in U, \exists t_j \in T, E(u_i, t_j, p_k) = 1\}$.

(3) $PUTsim(p_i, p_j)$: the similarity of two items based on the proportion of common user-tag pairs, computed from $R_{P,UT}$:

$$PUTsim(p_i, p_j) = \frac{|UT_i \cap UT_j|}{\max_{p_k \in P}\{|UT_k|\}} \tag{8}$$

As defined in Section 3.2, $UT_k$ is the user and tag set of item $p_k$, $UT_k = \{\langle u_i, t_j \rangle \mid u_i \in U, t_j \in T, E(u_i, t_j, p_k) = 1\}$.

Though it is possible to calculate the similarity of two tags, this is not discussed in this paper.

3.4 Recommendation Generation

For a target user $u_i$, using the similarity measures discussed in Section 3.3, we can generate the user's neighbourhood, which contains users who have similar information needs or item preferences to $u_i$ according to their tagging behaviour. We propose two methods to make item recommendations to the target user $u_i$, namely a user-based approach and an item-based approach, based on the neighbour users' item lists or on the similarity of items, respectively. Let $C(u_i)$ be the neighbourhood of $u_i$.

For the user-based approach, the candidate items for $u_i$ are taken from the items tagged by the users in $C(u_i)$. For each candidate item $p_k$, a prediction score, denoted $A_u(u_i, p_k)$, is calculated with Equation (9) below from the similarity between $u_i$ and its neighbour users and from the neighbour users' implicit ratings of $p_k$, denoted $E(u_j, p_k)$. The top N items ranked by prediction score are recommended to $u_i$.
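The item-side measures of Eqs. (5)-(8) mirror the user-side ones with the roles of users and items swapped. The Python sketch below, again assuming (user, tag, item) triples and equal default weights, builds $sim_p$ and then ranks items for the item-based approach outlined above. For that ranking it assumes, following formula (10), that a candidate item's score is the sum of its similarities to the items the target user has already tagged, and that only items the user has not tagged are candidates; both points, the function names and the toy data are assumptions of this sketch.

```python
import heapq
from collections import defaultdict


def item_sets(triples):
    """Per item: U_p (users), T_p (tags) and UT_p (<user, tag> pairs)."""
    users, tags, pairs = defaultdict(set), defaultdict(set), defaultdict(set)
    for user, tag, item in triples:
        users[item].add(user)
        tags[item].add(tag)
        pairs[item].add((user, tag))
    return users, tags, pairs


def make_item_sim(triples, w_pu=1/3, w_pt=1/3, w_put=1/3):
    """Return sim_p(pi, pj) of Eq. (5); assumes at least one triple."""
    if abs(w_pu + w_pt + w_put - 1.0) > 1e-9:
        raise ValueError("w_PU + w_PT + w_PUT must equal 1")
    users, tags, pairs = item_sets(triples)
    max_u = max(len(s) for s in users.values())
    max_t = max(len(s) for s in tags.values())
    max_ut = max(len(s) for s in pairs.values())
    empty = frozenset()

    def overlap(sets, a, b):
        return len(sets.get(a, empty) & sets.get(b, empty))

    def sim_p(pi, pj):
        pt_sim = overlap(tags, pi, pj) / max_t     # Eq. (6)
        pu_sim = overlap(users, pi, pj) / max_u    # Eq. (7)
        put_sim = overlap(pairs, pi, pj) / max_ut  # Eq. (8)
        return w_pu * pu_sim + w_pt * pt_sim + w_put * put_sim

    return sim_p


def item_based_top_n(user_items, all_items, sim_p, top_n):
    """Score each untagged item by its summed similarity to the user's items."""
    scores = {
        pk: sum(sim_p(pk, pj) for pj in user_items)
        for pk in all_items
        if pk not in user_items  # assumption: only untagged items are candidates
    }
    return heapq.nlargest(top_n, scores, key=scores.get)


triples = [
    ("u1", "economics", "book_a"), ("u1", "economics", "book_b"),
    ("u2", "economics", "book_a"), ("u2", "history", "book_b"),
    ("u3", "history", "book_c"),
]
sim_p = make_item_sim(triples)
print(round(sim_p("book_a", "book_b"), 4))  # 0.6667
print(item_based_top_n({"book_c"}, ["book_a", "book_b", "book_c"], sim_p, 1))  # ['book_b']
```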
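The neighbourhood $C(u_i)$ and the user-based score described above can be sketched in Python as follows. The function names and the small similarity table are hypothetical placeholders; in practice `sim_u` would be the weighted measure of Eq. (4). The sketch also assumes that items the target user has already tagged are excluded from the candidates, which the text does not state explicitly.

```python
import heapq
from collections import defaultdict


def neighbourhood(target, users, sim_u, n):
    """C(u_i): the n users most similar to the target ("best-n-neighbours")."""
    candidates = [u for u in users if u != target]
    return heapq.nlargest(n, candidates, key=lambda u: sim_u(target, u))


def user_based_top_n(target, user_items, sim_u, n_neighbours, top_n):
    """Rank items by A_u(u_i, p_k) = sum over u_j in C(u_i) of sim_u(u_i, u_j) * E(u_j, p_k)."""
    own = user_items.get(target, set())
    scores = defaultdict(float)
    for uj in neighbourhood(target, user_items, sim_u, n_neighbours):
        weight = sim_u(target, uj)
        # E(u_j, p_k) = 1 exactly for the items u_j has tagged,
        # so the sum only needs to visit those items.
        for pk in user_items[uj]:
            if pk not in own:  # assumption: never re-recommend tagged items
                scores[pk] += weight
    return heapq.nlargest(top_n, scores, key=scores.get)


# Hypothetical toy data; the table stands in for Eq. (4).
user_items = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"d"}, "u4": {"b", "c", "e"}}
table = {("u1", "u2"): 0.5, ("u1", "u3"): 0.1, ("u1", "u4"): 0.4}


def sim_u(a, b):
    return table.get((a, b), table.get((b, a), 0.0))


print(neighbourhood("u1", user_items, sim_u, 2))        # ['u2', 'u4']
print(user_based_top_n("u1", user_items, sim_u, 2, 2))  # ['c', 'e']
```

Because $E(u_j, p_k)$ is binary, the weighted sum reduces to adding each neighbour's similarity to every item that neighbour tagged, so the loop touches only the neighbours' items rather than the whole catalogue.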
$$A_u(u_i,p_k) = \frac{\sum_{u_j\in C(u_i)} sim_u(u_i,u_j)\cdot E(u_j,p_k)}{|C(u_i)|} \qquad (9)$$

For the item-based approach, the prediction score is calculated by formula (10), where $P_{u_i}$ denotes the set of items tagged by $u_i$:

$$A_p(u_i,p_k) = \sum_{p_j\in P_{u_i}} sim_p(p_k,p_j) \qquad (10)$$

4 Experiments

We have conducted experiments to evaluate the methods proposed in Section 3. The dataset for the experiments was obtained from Amazon.com. To avoid a severe sparsity problem, we selected users who tagged at least 5 items, tags that are used by at least 5 users, and items that are tagged at least 5 times. The final dataset comprises 3179 users, 8083 tags and 11942 books. The whole dataset is split into a training dataset and a test dataset, each containing 50% of the data. For each user in the test dataset, a prediction score is calculated for each item tagged by this user (i.e., the items which have implicit rating 1), and the top N items are recommended to the user. Precision and recall are used to evaluate the accuracy of recommendations: if an item in the recommendation list has implicit rating 1 in the test dataset, it is counted as a hit.

To evaluate the effectiveness of the proposed tag-based collaborative filtering approach, we compared the precision and recall of its top 5 recommended items with those of the standard collaborative filtering (CF) approaches, which use only the item information, and with Tso-Sutter's approach [5], which extends the user rating matrix with the tag information. In fact, the proposed approach covers these two approaches as special cases when some of the similarity measure weights are set to zero. The comparison of precision and recall for the user-based approaches is illustrated in Figure 1, and the item-based comparison is shown in Figure 2.

[Figures 1 and 2: panels "User-based approach" and "Item-based approach", each plotting Top 5 Precision and Recall for Traditional CF, Tag-aware method and Tag-based CF.]

Fig. 1. Results of comparing the proposed tag-based collaborative filtering employing the user-based approach with the user-based baseline model and the user-based Tag-aware approach proposed by Tso-Sutter.

Fig. 2. Results of comparing the proposed tag-based collaborative filtering employing the item-based approach with the item-based baseline model and the item-based Tag-aware approach proposed by Tso-Sutter.
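The data preparation and evaluation steps described in Section 4 can be sketched as follows. The thresholds mirror the text (at least 5 items per user, 5 users per tag, 5 taggings per item). Repeating the filter until nothing more is removed, skipping users with no test items, and averaging precision and recall over users are assumptions, since the text states none of them. The function names are hypothetical.

```python
from collections import defaultdict


def filter_triples(triples, min_items_per_user=5, min_users_per_tag=5, min_taggings_per_item=5):
    """Keep (user, tag, item) triples whose user tagged enough distinct items, whose tag
    is used by enough distinct users, and whose item was tagged enough times.

    Assumption: the filter is repeated until nothing more is removed, because dropping
    triples can push other counts below the thresholds.
    """
    current = set(triples)
    while True:
        items_per_user = defaultdict(set)
        users_per_tag = defaultdict(set)
        taggings_per_item = defaultdict(int)
        for u, t, p in current:
            items_per_user[u].add(p)
            users_per_tag[t].add(u)
            taggings_per_item[p] += 1
        kept = {(u, t, p) for (u, t, p) in current
                if len(items_per_user[u]) >= min_items_per_user
                and len(users_per_tag[t]) >= min_users_per_tag
                and taggings_per_item[p] >= min_taggings_per_item}
        if kept == current:
            return kept
        current = kept


def precision_recall(recommended, relevant):
    """Mean precision and recall over users.

    recommended: {user: top-N item list}; relevant: {user: items with implicit
    rating 1 in the test set}. A recommended item in the relevant set is a hit.
    Users with an empty list or no test items are skipped (assumption).
    """
    precisions, recalls = [], []
    for user, items in recommended.items():
        truth = relevant.get(user, set())
        if not items or not truth:
            continue
        hits = sum(1 for p in items if p in truth)
        precisions.append(hits / len(items))
        recalls.append(hits / len(truth))
    if not precisions:
        return 0.0, 0.0
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


# Hypothetical top-5 lists and test-set items for two users.
recommended = {"u1": ["p1", "p2", "p3", "p4", "p5"], "u2": ["p6", "p7", "p8", "p9", "p10"]}
relevant = {"u1": {"p1", "p3", "p9"}, "u2": {"p7"}}
p, r = precision_recall(recommended, relevant)
print(round(p, 3), round(r, 3))  # 0.3 0.833
```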
5 Discussion

The experimental results in Figure 1 and Figure 2 show that the precision and recall of the proposed approach are higher than those of the traditional user- and item-based models and of Tso-Sutter's approaches. Although Tso-Sutter et al. [5] claimed that tag information is useful only for collaborative filtering that fuses the user- and item-based models, and is seen as noise by standard user-based or item-based CF alone, our experimental results show that tag information can be used to improve standard user-based and item-based collaborative filtering.

In addition, the results show that traditional collaborative filtering, which recommends based on the similarity of rating behavior, does not work well on collaborative tagging information. The results suggest that it is more accurate to profile users with tags, items and the relationships between tags and items than to profile users with extended implicit ratings. Furthermore, the results suggest that it is better to measure similarity by the similarity of tagging behaviour than by implicit rating similarity alone.

6 Conclusion

This paper discusses how to recommend items to users based on collaborative tagging information. Instead of treating tagging behavior as merely implicit rating behavior, the proposed tag-based collaborative filtering approach uses the three-dimensional relationship of tagging behavior to profile users and to generate like-minded neighbours or similar items. The experiments show promising results for employing the tag-based collaborative filtering approach to recommend personalized items. The experimental results also show that tag information can be used to improve the standard user-based and item-based collaborative filtering approaches.

References

1. Halpin, H., Robu, V., Shepherd, H.: The Complex Dynamics of Collaborative Tagging. In: The 16th International Conference on World Wide Web, pp. 211–220. ACM, New York (2007)
2. Bao, S., Wu, X., Fei, B., Xue, S.Z., Yu, Y.: Optimizing Web Search Using Social Annotations. In: The 16th International Conference on World Wide Web, pp. 501–510. ACM, New York (2007)
3. Marinho, L.B., Schmidt-Thieme, L.: Collaborative tag recommendations: Data Analysis, Machine Learning and Applications. In: The 31st Annual Conference of the Gesellschaft für Klassifikation, pp. 533–540. Springer, Heidelberg (2007)
4. Golder, S.A.: Usage patterns of collaborative tagging systems. Journal of Information Science 32(2), 198–208 (2006)
5. Tso-Sutter, K.H.L., Marinho, L.B., Schmidt-Thieme, L.: Tag-aware Recommender Systems by Fusion of Collaborative Filtering Algorithms. In: The 2008 ACM Symposium on Applied Computing, pp. 1995–1999. ACM, New York (2008)