Personalized Tag Recommendations via Tagging and Content-based Similarity Metrics

Andrew Byde
HP Labs
Filton Road, Stoke Gifford
Bristol, UK
andrew.byde@hp.com

Hui Wan
State University New York at Stony Brook
Stony Brook, New York 11790
hwan@cs.sunysb.edu

Steve Cayzer
HP Labs
Filton Road, Stoke Gifford
Bristol, UK
steve.cayzer@hp.com

Abstract
This short paper describes a novel technique for generating personalized tag recommendations for users of social bookmarking sites such as del.icio.us. Existing techniques recommend tags on the basis of their popularity among the group of all users; on the basis of recent use; or on the basis of simple heuristics to extract keywords from the url being tagged. Our method is designed to complement these approaches, and is based on recommending tags from urls that are similar to the one in question, according to two distinct similarity metrics, whose principal utility covers complementary cases.

Keywords
Tagging, Bookmarking, Classification

1. Introduction
This paper addresses tag recommendation in social bookmarking sites. We address two problems: the paucity of information for tag recommendation when too few other users have tagged a url, and the personalization of tag recommendations.

The first of these issues is especially important in the emerging field of enterprise-scale bookmarking and social networking sites: the manifest knowledge management value that such sites provide has not gone unnoticed in the world of business, but a key problem there is the lack of scale. On the web, most pages I might choose to tag will already have been tagged by someone, whose recommendation can assist my choice and aid term-convergence. This is not the case within an enterprise, where there are not enough users for the system to rely on recommendations from peers. The second issue operates at any scale, and comes down to the observation that the laudable bias towards term-convergence provided by using other users’ existing tags as recommendations discourages the easy development of personalized semantics.

In this paper we develop tag recommendations based on two different page similarity metrics (a “tagging”-based and a “content”-based method; see Section 2). Our method recommends terms that the user has already used, selected according to analysis of the url in question. We envisage the terms recommended by these methods being presented in parallel with the “common” (frequent) terms used by other users, and claim the following benefits of using them:

  1. The content-based method is capable of recommending terms even for urls that have not been previously tagged by anyone. This is of clear benefit for document collections that are at present sparsely tagged, as is typical within the enterprise.

  2. For a url with a very large number of tags, a user’s preferred tags will likely be diluted by other tags relevant to areas of interest, specific vocabularies, and languages or character sets other than those the user is interested in. Our method recommends from within the user’s own field of interest, and thus is more pertinent and useful.

2. Method
The problem we face is, given a collection of urls $u$, each tagged with a set of tags $T_p(u)$ by users $p$, to provide a particular user with a list of $N$ recommended terms for a particular url. We will evaluate such a recommendation method with respect to del.icio.us data scraped from the web. For each of a collection of users, we take each url that they have tagged in turn. We fetch the “common” tags for that url, and crop the list to the top $N$.

Our method for the task of recommending tags to user $p$ for url $u$ is as follows. For each url $u'$ that the user has already tagged, we calculate a similarity $\mathrm{sim}(u, u') \in [0, 1]$ to the task url, to be described shortly. Given these similarities for each url, we sum them to give a user-specific weight for each tag $t$:

$$w_{p,u}(t) = \sum_{i \,:\, t \in T_p(u_i)} \mathrm{sim}(u, u_i). \tag{1}$$

The weight depends on the user in that we only sum similarities to $u$ over other urls $u_i$ that the user has tagged. The weights provide a ranking of tags, and we select the top $N$ ranked tags as the recommendation.

Note that this method scales with the number of urls tagged by the user, not the total number of urls tagged by every user. As we shall see, the median number of urls tagged is 150, and only a tiny minority of users tag more than 1000, meaning that our method is scalable to cases of practical interest.

2.1 Similarity Metrics
Our similarity metrics are both variants on the cosine similarity familiar from text mining and information retrieval [1]:

$$\mathrm{sim}(u, u') = \frac{u \cdot u'}{\sqrt{(u \cdot u)(u' \cdot u')}}, \tag{2}$$
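
As an illustration only, the tag weighting of equation (1), combined with the cosine similarity of equation (2), might be sketched in Python as follows. The sparse dictionary representation of a url, the function names `cosine_similarity` and `recommend_tags`, and the toy data are assumptions of this sketch and not part of the method as described; in particular, the sketch does not fix how the url vectors are built for the tagging-based or the content-based metric, and it assumes non-negative vector entries so that similarities lie in [0, 1].

```python
from collections import defaultdict
from math import sqrt


def cosine_similarity(u, v):
    """Equation (2) for two sparse vectors given as {feature: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    # An all-zero vector has no direction; treat its similarity as 0.
    return dot / norm if norm else 0.0


def recommend_tags(target_vec, user_bookmarks, n):
    """Equation (1): weight each tag by summed similarity, return the top n tags.

    user_bookmarks is a list of (vector, tags) pairs, one per url that this
    user has already tagged (hypothetical layout chosen for the sketch).
    """
    weights = defaultdict(float)
    for vec, tags in user_bookmarks:
        s = cosine_similarity(target_vec, vec)
        for t in tags:
            weights[t] += s
    # Rank by descending weight; break ties alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:n]]


# Toy data: three urls the user has tagged, and one new url to tag.
bookmarks = [
    ({"python": 1.0, "tutorial": 1.0}, {"python", "howto"}),
    ({"python": 1.0}, {"python", "reference"}),
    ({"cooking": 1.0}, {"recipes"}),
]
target = {"python": 1.0, "tutorial": 1.0}
top_two = recommend_tags(target, bookmarks, 2)
```

The cost of `recommend_tags` grows with the number of urls the user has tagged, which matches the scaling remark made for the method: only that user's own bookmarks are compared with the url in question.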