19th Workshop on Information Technologies and Systems

EXPLORING INFORMATION HIDDEN IN TAGS: A SUBJECT-BASED ITEM RECOMMENDATION APPROACH

Jing Peng (1), Daniel Zeng (2, 1)
(1) Institute of Automation, Chinese Academy of Sciences
(2) Department of Management Information Systems, The University of Arizona
jing.peng@ia.ac.cn zeng@email.arizona.edu

Abstract

Collaborative tagging sites allow users to bookmark and annotate their favorite Web content with tags. These tags provide a novel source of information for collaborative filtering (CF). Research on how to improve item recommendation quality by leveraging tags is emerging, yet the information hidden in tags is far from being fully exploited. In this paper, we aim at finding informative usage patterns from tags by consistent clustering on tags using nonnegative matrix factorization. The clustered subjects, represented by weighted tag vectors, can then be used to build a subject-centered user information seeking model for item recommendation. Experiments on two real-world datasets show that our subject-based algorithms substantially outperform the traditional CF methods as well as tag-enhanced recommendation approaches reported in the literature.

Keywords: Collaborative filtering, collaborative tagging, nonnegative matrix factorization, tag-enhanced

1. Introduction

Collaborative tagging or social bookmarking sites, such as Delicious (del.icio.us) and CiteULike (www.citeulike.org), allow users to bookmark their favorite Web content (or items) online and provide tag-based annotations to facilitate future retrieval. Tagging data present interesting opportunities as well as challenges to CF, as tags are available as a novel source of data in addition to the typical user-item interaction information.
Recently, there has been a growing number of efforts aiming to improve Web content or item recommendation quality by leveraging tags. Zhao et al. (2008) proposed to compute the similarity of two users based on the semantic distance of their tag sets on common items they have bookmarked. Tso-Sutter et al. (2008) presented an interesting approach to integrate tags into traditional CF algorithms. They extended the item vectors for users and the user vectors for items with tags, and then constructed the user/item neighborhoods based on the extended user/item profiles. Instead of exploring tags for similarity computation, the topic-based method was proposed to take advantage of tags in a probabilistic framework (Peng et al. 2009), viewing each tag as an indicator of a topic and then estimating the probability of a user saving an item by summing up the transition probabilities through all tags.

However, none of the existing research has explicitly explored why and how a user chooses to bookmark an item and assign a certain set of tags. We hypothesize that exploiting such information embedded in tagging activities could lead to better recommendation performance. A record in tagging data is a tuple consisting of three fields in the form of <user, item, tag>. Some preliminary work (Halpin et al. 2007; Lambiotte et al. 2006) has started to examine a tripartite graph structure of collaborative tagging, as shown in Figure 1(a). The topic-based method (Peng et al. 2009) assumes this structure implicitly. Nevertheless, this tripartite structure suffers from the following problems:

(i) A tag usually covers a wide range of topics rather than a specific subject.
(ii) Users may be linked to items that they are not interested in at all due to polysemy of tags.
(iii) Users may miss items they are actually interested in because synonyms are used.
(iv) There exist a large number of tags (e.g., “toread,” “***”) carrying no value to users other than their originators.
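
As a rough illustration of problems (ii) and (iii), the sketch below indexes a handful of <user, item, tag> records into the user-tag and tag-item links of the tripartite structure and lists the items a user can reach through shared tags. The records, user names, item names, and the helper functions `build_tripartite` and `reachable_items` are all hypothetical and chosen only for this example; the sketch is not the topic-based method or the subject-based approach of the paper.

```python
from collections import defaultdict

# Hypothetical toy records in the <user, item, tag> form (not from the paper's datasets).
records = [
    ("alice", "python-tutorial", "python"),
    ("bob", "snake-care-guide", "python"),       # same tag, different meaning (polysemy)
    ("carol", "nyc-travel-tips", "nyc"),
    ("dave", "manhattan-food-map", "newyork"),   # different tag, same meaning (synonym)
    ("alice", "long-survey-paper", "toread"),    # tag useful only to its originator
]


def build_tripartite(records):
    """Index the user-tag and tag-item links, plus each user's saved items."""
    user_tags = defaultdict(set)
    tag_items = defaultdict(set)
    user_items = defaultdict(set)
    for user, item, tag in records:
        user_tags[user].add(tag)
        tag_items[tag].add(item)
        user_items[user].add(item)
    return user_tags, tag_items, user_items


def reachable_items(user, user_tags, tag_items, user_items):
    """Items linked to `user` through any of the user's tags, minus items already saved."""
    candidates = set()
    for tag in user_tags[user]:
        candidates |= tag_items[tag]
    return candidates - user_items[user]


user_tags, tag_items, user_items = build_tripartite(records)

# Polysemy: alice is linked to a snake-care item through the tag "python".
alice_candidates = reachable_items("alice", user_tags, tag_items, user_items)

# Synonyms: carol never reaches the "newyork" item because she used "nyc".
carol_candidates = reachable_items("carol", user_tags, tag_items, user_items)

print(alice_candidates)
print(carol_candidates)
```

In this toy data, linking users to items through raw tags both adds an irrelevant candidate and misses a relevant one, which is the kind of behavior that motivates grouping tags into subjects before recommending items.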