ARTICLE IN PRESS YJCSS: 2537 Journal of Computer and System Sciences命(命)·命一·命 Contents lists available at sciverse Science Direct Journal of Computer and System Sciences ENCES ELSEVIER www.elsevier.com/locate/jcss Resource recommendation in social annotation systems A linear-weighted hybrid approach Jonathan Gemmell Thomas Schimoler, Bamshad Mobasher, Robin Burke Center for Web Intelligence, School of Computing. DePaul University, 243 South Wabash Avenue, Chicago, IL 60604, United States ARTICLE INFO A BSTRACT cial annotation systems enable the organization of online resources with user-defined 29 April 2011 in revised form 22 July 20 users can discover resources, organize and share their finds, and connect to other users Available online xxxx with similar interests. howey size and complexity of these systems can lead to information overload and reduced utility for users. For these reasons, researchers have ought to apply the techniques of recommender systems to deliver personalized views of social annotation systems. To date, most efforts have concentrated on the problem of tag recommendation- personalized suggestions for possible annotations. Resource Hybrid recommenders recommendation has not received the same systematic evaluation, in part because the task is inherently more complex. In this article, we provide a general formulation for the problem of resource recommendation in social annotation systems that captures these variants, and we evaluate two cases: basic resource recommendation and tag- specific resource recommendation. We also propose a linear-weighted hybrid framework for resource recommendation Using six real-world datasets, we show that its integrative pproach is essential for this recommendation task and provides the most adaptability given the varying data characteristics in different social annotation systems. We find that our algorithm is more effective than other more mathematically-complex techniques and has the additional advantages of flexibility and extensibility e 2011 Elsevier Inc. All rights reserved. 1. Introdu The surge in popularity of social media systems shows no sign of abating. These systems leverage vast amounts of user-generated content, enhancing the users ability to organize information, explore resources and build communities of like-minded individuals. One class of these applications is the social annotation system in which user-generated content takes the form of tags, arbitrary labels applied by users to online resources Social annotation systems often focus on a particular type of resource. Delicious users annotate Web pages. LastFM2 permits its users to upload and share their musical tastes. Citeulike users organize scholarly publications. In addition, tagging functions have become a common feature in many established Web sites, such as Amazon 4 You Tube and others mail addresses: gemmell@cs. depauLedu ( Gemmell), tschimolerecs depauLedu (T. Schimoler mobasherecs depauLedu(B Mobasher). burke@cs. depaul.edu(R Burke ) 2www.last.fm. www.youtube.com. 0022-0000/S- see front matter 2011 Elsevier Inc. All rights reserved. doi:10016jcss201110.006 Comput. System Sci. (2011), doi..J.Gem Please cite this article in press urce recommendation in social annotation systems: A linear-weighted hybrid approach, J 1016jcss,201110.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.1 (1-15) Journal of Computer and System Sciences ••• (••••) •••–••• Contents lists available at SciVerse ScienceDirect Journal of Computer and System Sciences www.elsevier.com/locate/jcss Resource recommendation in social annotation systems: A linear-weighted hybrid approach Jonathan Gemmell ∗, Thomas Schimoler, Bamshad Mobasher, Robin Burke Center for Web Intelligence, School of Computing, DePaul University, 243 South Wabash Avenue, Chicago, IL 60604, United States article info abstract Article history: Received 29 April 2011 Received in revised form 22 July 2011 Accepted 17 October 2011 Available online xxxx Keywords: Resource recommendation Social annotation system Hybrid recommenders Social annotation systems enable the organization of online resources with user-defined keywords. Collectively these annotations provide a rich information space in which users can discover resources, organize and share their finds, and connect to other users with similar interests. However, the size and complexity of these systems can lead to information overload and reduced utility for users. For these reasons, researchers have sought to apply the techniques of recommender systems to deliver personalized views of social annotation systems. To date, most efforts have concentrated on the problem of tag recommendation – personalized suggestions for possible annotations. Resource recommendation has not received the same systematic evaluation, in part because the task is inherently more complex. In this article, we provide a general formulation for the problem of resource recommendation in social annotation systems that captures these variants, and we evaluate two cases: basic resource recommendation and tagspecific resource recommendation. We also propose a linear-weighted hybrid framework for resource recommendation. Using six real-world datasets, we show that its integrative approach is essential for this recommendation task and provides the most adaptability given the varying data characteristics in different social annotation systems. We find that our algorithm is more effective than other more mathematically-complex techniques and has the additional advantages of flexibility and extensibility. © 2011 Elsevier Inc. All rights reserved. 1. Introduction The surge in popularity of social media systems shows no sign of abating. These systems leverage vast amounts of user-generated content, enhancing the user’s ability to organize information, explore resources and build communities of like-minded individuals. One class of these applications is the social annotation system in which user-generated content takes the form of tags, arbitrary labels applied by users to online resources. Social annotation systems often focus on a particular type of resource. Delicious1 users annotate Web pages. LastFM2 permits its users to upload and share their musical tastes. Citeulike3 users organize scholarly publications. In addition, tagging functions have become a common feature in many established Web sites, such as Amazon,4 YouTube5 and others. * Corresponding author. E-mail addresses: jgemmell@cs.depaul.edu (J. Gemmell), tschimoler@cs.depaul.edu (T. Schimoler), mobasher@cs.depaul.edu (B. Mobasher), rburke@cs.depaul.edu (R. Burke). 1 delicious.com. 2 www.last.fm. 3 www.citeulike.org. 4 www.amazon.com. 5 www.youtube.com. 0022-0000/$ – see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jcss.2011.10.006
ARTICLE IN PRESS YJCSS: 253 J Gemmell et aL/ Journal of Computer and System Sciences··()命一· Resources Ta Fig. 1. Tag recommendation. resource3 esources Tags Fig. 2. Resource recommendation. The term folksonomy is often used to describe social annotation systems, a play on folk and taxonomy 1. These systems are popular in part because they allow users to tag resources with any tag they wish, free from any preconceived conceptual hierarchy. The aggregation of resources and tags used to describe them produces a rich information space in which users interact 3]. Users may also appreciate the ability to prepare resources for their own future access and the possibility of expressing opinions about resources [4]. These are not features present in traditional manual indexing settings The freedom and richness that social annotation systems provide do not come without a cost. Because the number of users, resources or tags in these systems is often measured in the millions, the sheer volume of data can quickly burden he user with information overload. The unrestricted nature of the tagging function is liberating, but also means that the resulting tag data will be noisy Ambiguous tags abound: one user may apply "jaguar"only to cars, another only to large felines 3. An early survey of social annotation systems identified another obstacle; while the uncontrolled vocabulary common in most systems reduces the entry cost and invites more participants, precision of stricter taxonomies [5. There is no mechanism to enforce specific semantics for tags. Redundant tag ng synonyms and mis-spellings cannot be prevented and make it more difficult for a user to choose tags on Ambiguous and redundant tags have been shown to impact the quality of recommendations [6] For these reasons, recommender systems have much to offer social annotation systems. As in e-commerce settings, recommendation function in social annotation systems is considerably more complex than in the e-commerce applications to which it has typically been applied. Research in recommendation has, for the most part, concentrated on systems in hich users' preferences with respect to items have been expressed through simple scalar values- ratings. Tags represent user interest and preference in more detail, but also make comparisons between users and between items more difficult. In addition to the increased complexity of the data, social annotation systems offer a variety of avenues for recommen- ation not found in catalog-type systems. The problem of tag recommendation(shown schematically in Fig. 1)has received a great deal of attention. The user already has a document or resource in hand, and the recommendation serves to assist in the annotation operation, helping her find tags appropriate to the resource and relevant to her own interests. Graph- based models [7], tensor factorization [8-10], and simpler linear hybrids [11-13] have all been employed to bring the three dimensions of the data to bear on the problem of finding tags Resource recommendation is less well studied, although perhaps it is more fundamental to the task of navigating through social annotation system, as opposed to the task of building one. The most basic type of resource recommendation is shown in Fig. 2. A user requests a personalized set of new and interesting resources to be recommended to him out of the universe of items known to the system. Previous work in this area has explored various methods such as tensor factoriza- tion [10], clustering[14, 15. and hybrid models [16]. However, resource recommendation within social annotation systems has yet to be studied in a systematic manner; previous work often focuses on a special case of resource recor Tasks vary according to the types of constraints associated with users' actions as they interact with the system. For example, a user may request resources similar to those found is his profile, annotated with a selected tag or related to a pa resource. Each of the cases requires a different strategy 6 While the term is relatively new. it has been argued that social annotation actually represents a renaissance of old-style manual indexing, largely Please cite this article in press as: J. Gemmell et aL., Resource recommendation in social annotation systems: A linear-weighted hybrid approach, J. Comput. System Sci. (2011). doi: 10.1016/j jcSs 2011. 10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.2 (1-15) 2 J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• Fig. 1. Tag recommendation. Fig. 2. Resource recommendation. The term folksonomy is often used to describe social annotation systems, a play on folk and taxonomy [1].6 These systems are popular in part because they allow users to tag resources with any tag they wish, free from any preconceived conceptual hierarchy. The aggregation of resources and tags used to describe them produces a rich information space in which users can interact [3]. Users may also appreciate the ability to prepare resources for their own future access and the possibility of expressing opinions about resources [4]. These are not features present in traditional manual indexing settings. The freedom and richness that social annotation systems provide do not come without a cost. Because the number of users, resources or tags in these systems is often measured in the millions, the sheer volume of data can quickly burden the user with information overload. The unrestricted nature of the tagging function is liberating, but also means that the resulting tag data will be noisy. Ambiguous tags abound: one user may apply “jaguar” only to cars, another only to large felines [3]. An early survey of social annotation systems identified another obstacle; while the uncontrolled vocabulary common in most systems reduces the entry cost and invites more participants, it also lacks the precision of stricter taxonomies [5]. There is no mechanism to enforce specific semantics for tags. Redundant tags including synonyms and mis-spellings cannot be prevented and make it more difficult for a user to choose tags on which to search. Ambiguous and redundant tags have been shown to impact the quality of recommendations [6]. For these reasons, recommender systems have much to offer social annotation systems. As in e-commerce settings, they can alleviate information overload and combat noise by personalizing the user’s view of the system. However, the recommendation function in social annotation systems is considerably more complex than in the e-commerce applications to which it has typically been applied. Research in recommendation has, for the most part, concentrated on systems in which users’ preferences with respect to items have been expressed through simple scalar values – ratings. Tags represent user interest and preference in more detail, but also make comparisons between users and between items more difficult. In addition to the increased complexity of the data, social annotation systems offer a variety of avenues for recommendation not found in catalog-type systems. The problem of tag recommendation (shown schematically in Fig. 1) has received a great deal of attention. The user already has a document or resource in hand, and the recommendation serves to assist in the annotation operation, helping her find tags appropriate to the resource and relevant to her own interests. Graphbased models [7], tensor factorization [8–10], and simpler linear hybrids [11–13] have all been employed to bring the three dimensions of the data to bear on the problem of finding tags. Resource recommendation is less well studied, although perhaps it is more fundamental to the task of navigating through a social annotation system, as opposed to the task of building one. The most basic type of resource recommendation is shown in Fig. 2. A user requests a personalized set of new and interesting resources to be recommended to him out of the universe of items known to the system. Previous work in this area has explored various methods such as tensor factorization [10], clustering [14,15], and hybrid models [16]. However, resource recommendation within social annotation systems has yet to be studied in a systematic manner; previous work often focuses on a special case of resource recommendation. Tasks vary according to the types of constraints associated with users’ actions as they interact with the system. For example, a user may request resources similar to those found is his profile, annotated with a selected tag or related to a particular resource. Each of the cases requires a different strategy. 6 While the term is relatively new, it has been argued that social annotation actually represents a renaissance of old-style manual indexing, largely supplanted by text-based information retrieval [2]
ARTICLE IN PRESS YJCSS: 2537 /. Gemmell et aL/ Journal of Computer and System Sciences·命·(申·)·一· tt2,…t r|{t,ty,…t A Fig 3. Annotations This paper has three main aims. First, we will describe, in a formal manner, the task of resource recommendation in social annotation systems, examining current approaches and contrasting it with work on tag recommendation and other similar tasks. We then introduce a linear-weighted hybrid recommendation algorithm and show how this technique serves to combine multiple complementary components into a single integrated model that provides the most flexibility considering he unique characteristics across different social annotation systems. Finally, we show the performance of the algorithm on ix large real-world datasets, and on two different variants of the resource recommendation task. Our results show that the hybrid performs better than the individual components alone, and better than the leading factorization algorithm used in g recommendation. Moreover the hybrid has advantages over other integrative models in that it is simple, flexible and The rest of the paper is organized as follows In Section 2 we formally define the notion of lated work on recommendation techniques in social annotation systems is presented in Section 3. In Section 4, we describe our linear-weighted hybrid and the components from which it is formed, as well as the tensor factorization technique that we are using for comparison. Our experimental results and evaluation follow in Section 5. Finally, we conclude the paper with a discussion of our results 2. Background and definitions The foundation of a social annotation system is the annotation: the record of a user labeling a resource with one ore tags. A collection of annotations results in a complex network of interrelated users, resources and tags [3]. a social annotation system can be described as a four-tuple: D=(U, R, T, A), where, U is a set of users: R is a set of resources: T is a set of tags: and A is a set of annotations. Each annotation a is a tuple (u, r, Tur), where u is a user, r is a resource, and Tur is the set of tags that u has applied to r. See Fig 3. It is sometimes useful to view a social annotation system as a three-dimensional matrix, URT, in which an entry URT(u, r, t) is 1 if u has tagged r with t We define resource recommendation as the production of an ordered list of resources likely to be of interest to a particular user. To make our definition as general as possible, we include the possibility that a user may wish to impose ome requirements on such recommendations. As noted above, these requirements often naturally arise from the mode of user interaction with the system. For example, the basic personalized recommendation task usually involves a learned user profile and no additional requirements. On the other hand, in the context of navigating the system or query-based retrieval, requirements such as the tags or resources selected by the user during the current session would impose additional constraints on the set of recommended resources. For our purposes, we will assume that the ordering of recommended esources is generated by sorting the set of resources, after the system has estimated their desirability by producing real-numbered priority value for each With these considerations in mind, we can view any resource recommendation algorithm as a function φ:U×QxR→R which operates on a user u E U, a set of requirements qE Q=P(U R), and a resource re R, and produces a real- valued result p, which is the priority value of r for u: (u, g, r)=p As noted in [17, user requirements in recommendation can take a variety of forms. In general, we assume that the set equirements q can be comprised of tags, resources, or even users, though in the most typical situations, q is a singleton g or resource selected during navigation. The most basic case of resource recommendation, the one shown in Fig. 2, is the case where no requirements are imposed and the recommender must find resources based only on the identity of the user: p(u, 0, r) In this article, we will refer to this as basic resource recommendation An important special case of resource recommendation is one in which the user supplies a requirement in the form of a tag. This scenario has perhaps received the most research attention, as it arises naturally in the navigation of social annotation systems, as a user selects a tag to see what resources are related to it. In this type of browsing, users go back and forth between tags and the resources to which they have been applied, exploring both spaces in tandem ecency with no attempt at personalization. However, most authors assert personalization as a requirement for a recommender system, so nality is not, strictly speaking. a form of recommendation. Please cite this article in press as: Gemmell et al urce recommendation in social annotation systems: A linear-weighted hybrid approach, J Comput. System Sci. (2011). doi: 10.1016/ jcss. 2011.10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.3 (1-15) J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• 3 Fig. 3. Annotations. This paper has three main aims. First, we will describe, in a formal manner, the task of resource recommendation in social annotation systems, examining current approaches and contrasting it with work on tag recommendation and other similar tasks. We then introduce a linear-weighted hybrid recommendation algorithm and show how this technique serves to combine multiple complementary components into a single integrated model that provides the most flexibility considering the unique characteristics across different social annotation systems. Finally, we show the performance of the algorithm on six large real-world datasets, and on two different variants of the resource recommendation task. Our results show that the hybrid performs better than the individual components alone, and better than the leading factorization algorithm used in tag recommendation. Moreover the hybrid has advantages over other integrative models in that it is simple, flexible and extensible. The rest of the paper is organized as follows. In Section 2 we formally define the notion of resource recommendation. Related work on recommendation techniques in social annotation systems is presented in Section 3. In Section 4, we describe our linear-weighted hybrid and the components from which it is formed, as well as the tensor factorization technique that we are using for comparison. Our experimental results and evaluation follow in Section 5. Finally, we conclude the paper with a discussion of our results. 2. Background and definitions The foundation of a social annotation system is the annotation: the record of a user labeling a resource with one or more tags. A collection of annotations results in a complex network of interrelated users, resources and tags [3]. A social annotation system can be described as a four-tuple: D = U, R, T , A, where, U is a set of users; R is a set of resources; T is a set of tags; and A is a set of annotations. Each annotation a is a tuple u,r, Tur, where u is a user, r is a resource, and Tur is the set of tags that u has applied to r. See Fig. 3. It is sometimes useful to view a social annotation system as a three-dimensional matrix, URT, in which an entry URT(u,r,t) is 1 if u has tagged r with t. We define resource recommendation as the production of an ordered list of resources likely to be of interest to a particular user. To make our definition as general as possible, we include the possibility that a user may wish to impose some requirements on such recommendations. As noted above, these requirements often naturally arise from the mode of user interaction with the system. For example, the basic personalized recommendation task usually involves a learned user profile and no additional requirements. On the other hand, in the context of navigating the system or query-based retrieval, requirements such as the tags or resources selected by the user during the current session would impose additional constraints on the set of recommended resources. For our purposes, we will assume that the ordering of recommended resources is generated by sorting the set of resources, after the system has estimated their desirability by producing a real-numbered priority value for each. With these considerations in mind, we can view any resource recommendation algorithm as a function φ : U × Q × R → R which operates on a user u ∈ U, a set of requirements q ∈ Q = P(U ∪ T ∪ R), and a resource r ∈ R, and produces a realvalued result p, which is the priority value of r for u: φ(u, q,r) = p. As noted in [17], user requirements in recommendation can take a variety of forms. In general, we assume that the set of requirements q can be comprised of tags, resources, or even users, though in the most typical situations, q is a singleton tag or resource selected during navigation. The most basic case of resource recommendation, the one shown in Fig. 2, is the case where no requirements are imposed and the recommender must find resources based only on the identity of the user: φ(u, Ø,r). In this article, we will refer to this as basic resource recommendation. 7 An important special case of resource recommendation is one in which the user supplies a requirement in the form of a tag. This scenario has perhaps received the most research attention, as it arises naturally in the navigation of social annotation systems, as a user selects a tag to see what resources are related to it. In this type of browsing, users go back and forth between tags and the resources to which they have been applied, exploring both spaces in tandem. 7 Non-personalized “recommendation” functions also exist, for example, Delicious’s “Fresh Bookmarks” on its homepage, which are sorted only by popularity and recency with no attempt at personalization. However, most authors assert personalization as a requirement for a recommender system, so such a functionality is not, strictly speaking, a form of recommendation.
ARTICLE IN PRESS YJCSS: 253 J Gemmell et al/ Journal of Computer and System Sciences However, the user can easily run into the problems of ambiguity and redundancy identified above. For example, the car fancier does not want to see pages about wildlife reserves when she clicks on the tag "jaguar " If we treat the response to tag selection as a recommendation task, then the system can make use of the user's profile to identify resources relevant to the tag as the user sees it, and ignore other interpretations. a tag-specific resource recommendation algorithm has the form p(u, It, r), where q=It) is a set containing a single tag. Resource recommendation in social annotation systems has yet to be studied in a systematic manner. Some authors assume the basic form of resource recommendation. Others assume the tag-specific variant or other forms Often algorithms designed for one constraint are not compatible to others. Adding to this confusion is the fact that algorithms which perform well for other tasks perform poorly when applied to resource recommendation. For example, several state-of-the-art tag recommendation algorithms perform poorly when applied to variations of the resource recommendation problem Efforts in designing recommenders for social annotation systems often focus around adapting more conventional forms of recommendation like those found in e-commerce systems in which the users preferences are gathered in the form of ratings. We can think of a rating-based recommender system as operating on a two-dimensional matrix, defined by users as one dimension and items as the other. Each entry in the matrix is a rating that a user has supplied for some item. a recommender system that can accurately extrapolate missing values in the matrix can generate recommended items for users to examine. In a social annotation system, there is a third dimension that of the tag. The data becomes three- dimensional, and the task of resource recommendation is that of trying to estimate what resources the user might wish to Research in ratings-based recommendation is well established and has resulted in some well-known algorithms Instance- ased collaborative filtering makes predictions based on the similarity between users [18-20] or the similarity between resources [21, 22]. Model-based collaborative filtering instead constructs a model from the data which is then queried di- rectly for recommendations. Well-known examples include graph models, matrix factorization and probabilistic models. Hybrid models have been used to combine these recommenders in various forms[23 exploiting the strength of many recommenders rather than relying on one Instance-based collaborative filtering has been modified to resource recommendation in social annotation systems extending the ratings matrix to include tag information [24, although most efforts do not assume access to ratings data. User-based collaborative filtering has also been adapted for recommending tags. Users can be modeled as a vector of re sources, a vector of tags, a combinations of the two, or feature vectors such as those calculated through singular value decomposition [25]. Item-based collaborative filtering has also been adapted for tag-recommendation [12, 13 In this work we extend these instance-based methods to resource recommendation. An advantage of these approaches is their flexibility hey are applicable to both the basic and tag-specific case of resource recommendation. Starting from the well-known PageRank algorithm [26]. researchers have derived Adapted PageRank and FolkRank [27.71 for tag recommendation. These algorithms demonstrated the importance of an integrative framework in social annotation systems: users, resources and tags were treated as nodes and were connected based on their occurrence together in generated annotations. The computational requirements of this approach however is daunting, requiring a recalculation of he PageRank vector for each query Matrix factorization approaches that have been found successful in e-commerce recommendation depend on the two- dimensional structure of the ratings matrix, in which users and resources form the axes and the values of the matrix are known ratings. These results cannot be straightforwardly extended to the three-dimensional structure of a social annotation system in which we have users, resources and tags as the three dimensions and binary values in each cell Researchers hav begun exploring tensor factorization to reduce the dimensionality of the social data. Tucker decomposition is one approach, factoring the three-dimensional tagging data into three feature spaces and a core residual tensor 10 Given a user-resource-tag triple, the model generates a likelihood score. Consequently it is possible to build a recommender that predicts one dimension of the data(i.e resources) given elements from the other two dimension (ie a user and a tag). a drawback of this model is its inflexibility; it cannot make recommendations given other constraints. Also, the model-building phase is highly computationally-intensive a pair-wise interaction tensor factorization model has also been proposed. It offers far more reasonable run times in both he construction of the model and the generation of recommendations [8, 9]. This algorithm takes an optimization approach, trying to compute the best ranking of tags given the known user-resource pairs in the data. The model is then used to recommend tags for new user-resource pairs. In this work, we invert this method constructing an ordered list of resources tag recommendation. However, due to the nature of the algorithm (it requires a user and a tag) it is not applicable to all ce recommendation tasks Probability models have been applied to social annotation systems for both tag recommendation 28 and ecommendation 29. Users, resources and tags were tied together via a single concept space. Probabilistic latent analysis has been used for resource discovery using a separate concept space for the user-tag relation and the tag relation [30]. Probabilistic models that also incorporate resource-resource affinities have been presented [31 Please cite this article in press as: Gemmell et aL, Resource recommendation in social annotation systems: A linear-weighted hybrid approach, J. Comput. System Sci. (2011). doi: 10.1016/j jcSs 2011. 10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.4 (1-15) 4 J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• However, the user can easily run into the problems of ambiguity and redundancy identified above. For example, the car fancier does not want to see pages about wildlife reserves when she clicks on the tag “jaguar.” If we treat the response to tag selection as a recommendation task, then the system can make use of the user’s profile to identify resources relevant to the tag as the user sees it, and ignore other interpretations. A tag-specific resource recommendation algorithm has the form φ(u,{t},r), where q = {t} is a set containing a single tag. 3. Related work Resource recommendation in social annotation systems has yet to be studied in a systematic manner. Some authors assume the basic form of resource recommendation. Others assume the tag-specific variant or other forms. Often algorithms designed for one constraint are not compatible to others. Adding to this confusion is the fact that algorithms which perform well for other tasks perform poorly when applied to resource recommendation. For example, several state-of-the-art tag recommendation algorithms perform poorly when applied to variations of the resource recommendation problem. Efforts in designing recommenders for social annotation systems often focus around adapting more conventional forms of recommendation like those found in e-commerce systems in which the user’s preferences are gathered in the form of ratings. We can think of a rating-based recommender system as operating on a two-dimensional matrix, defined by users as one dimension and items as the other. Each entry in the matrix is a rating that a user has supplied for some item. A recommender system that can accurately extrapolate missing values in the matrix can generate recommended items for users to examine. In a social annotation system, there is a third dimension, that of the tag. The data becomes threedimensional, and the task of resource recommendation is that of trying to estimate what resources the user might wish to tag. Research in ratings-based recommendation is well established and has resulted in some well-known algorithms. Instancebased collaborative filtering makes predictions based on the similarity between users [18–20] or the similarity between resources [21,22]. Model-based collaborative filtering instead constructs a model from the data which is then queried directly for recommendations. Well-known examples include graph models, matrix factorization and probabilistic models. Hybrid models have been used to combine these recommenders in various forms [23] exploiting the strength of many recommenders rather than relying on one. Instance-based collaborative filtering has been modified to resource recommendation in social annotation systems by extending the ratings matrix to include tag information [24], although most efforts do not assume access to ratings data. User-based collaborative filtering has also been adapted for recommending tags. Users can be modeled as a vector of resources, a vector of tags, a combinations of the two, or feature vectors such as those calculated through singular value decomposition [25]. Item-based collaborative filtering has also been adapted for tag-recommendation [12,13]. In this work we extend these instance-based methods to resource recommendation. An advantage of these approaches is their flexibility; they are applicable to both the basic and tag-specific case of resource recommendation. Starting from the well-known PageRank algorithm [26], researchers have derived Adapted PageRank and FolkRank [27,7] for tag recommendation. These algorithms demonstrated the importance of an integrative framework in social annotation systems: users, resources and tags were treated as nodes and were connected based on their occurrence together in usergenerated annotations. The computational requirements of this approach however is daunting, requiring a recalculation of the PageRank vector for each query. Matrix factorization approaches that have been found successful in e-commerce recommendation depend on the twodimensional structure of the ratings matrix, in which users and resources form the axes and the values of the matrix are known ratings. These results cannot be straightforwardly extended to the three-dimensional structure of a social annotation system in which we have users, resources and tags as the three dimensions and binary values in each cell. Researchers have begun exploring tensor factorization to reduce the dimensionality of the social data. Tucker decomposition is one approach, factoring the three-dimensional tagging data into three feature spaces and a core residual tensor [10]. Given a user–resource–tag triple, the model generates a likelihood score. Consequently it is possible to build a recommender that predicts one dimension of the data (i.e. resources) given elements from the other two dimensions (i.e. a user and a tag). A drawback of this model is its inflexibility; it cannot make recommendations given other constraints. Also, the model-building phase is highly computationally-intensive. A pair-wise interaction tensor factorization model has also been proposed. It offers far more reasonable run times in both the construction of the model and the generation of recommendations [8,9]. This algorithm takes an optimization approach, trying to compute the best ranking of tags given the known user–resource pairs in the data. The model is then used to recommend tags for new user–resource pairs. In this work, we invert this method constructing an ordered list of resources for a given user–tag pair. Given its effectiveness, this technique is considered one of the state-of-the-art approaches for tag recommendation. However, due to the nature of the algorithm (it requires a user and a tag) it is not applicable to all resource recommendation tasks. Probability models have been applied to social annotation systems for both tag recommendation [28] and resource recommendation [29]. Users, resources and tags were tied together via a single concept space. Probabilistic latent semantic analysis has been used for resource discovery using a separate concept space for the user–tag relation and the resource– tag relation [30]. Probabilistic models that also incorporate resource–resource affinities have been presented [31] and the
ARTICLE IN PRESS YJCSS: 2537 /. Gemmell et aL/ Journal of Computer and System Sciences·命·(申·)·一· connection between users have been defined in a probabilistic manner in order to predict which communities they might join [32]. lusters of tags can represent topic areas [33. These clusters have been used as intermediaries between users ources allowing the recommendation of resources 34, 14, 15]. Such recommenders can accommodate both the ba g-specific constraints of resource recommendation. It is also possible to cluster resources generating topic-specific parti- tions [35 perhaps identifying resources based on other constraints. Clustering tags is also useful for overcoming the problem of redundancy [6. Another effort in combating redundancy as well as ambiguity constructed user-centric tag models [36 Hybrid models have been used to generate integrative models by combining several component recommenders like hose above into a larger framework. Pairs of recommenders have been combined to produce improve results in Citeu- like [37] and Bibsonomy [ 38]. Another approach demonstrated that a graph-based model may be improved by incorporating item-based collaborative filtering [12]. Hybrid models composed of both user-based and item-based collaborative filtering algorithms were shown to outperformed the state-of-the-art pair-wise interaction tensor factorization model in tag recom- mendation[13 Another work predicts user ratings in MovieLens, one of the few systems that contains both ratings and gs 39]. We build on these efforts proposing a flexible and easily extensible hybrid model for resource recommendation. Our definition of resource recommendation above centers on the function which assigns a real-valued score to each resource describing the relevance of the resource to the user(and, if supplied the requirements ) In this section, we describe he linear-weighted hybrid algorithm that we propose for this task, the components from which the hybrid is constructed and we will also describe the implementation of our comparative benchmark, an integrative approach based on tensor factorization A linear-weighted hybrid is composed of recor n components kI through Kk, whose output is combi omputing a weighted sum [23]. without loss of ge tion of the function i(u, g, r), producing output in that the a-values sum to 1. The hybrid is therefore 中(u,q,n)=∑a(u,q,n) (1) A linear-weighted hybrid of this style has a number of advantages for recommendation in social annotation domains. One is that it is inherently an extensible framework Components are free to specialize in particular dimensions of the data. If a particular application has access to additional types of data(for example, text in a document recommender)appropriate recommendation components can be added to the mix. The linear-weighted hybrid offers a way to construct integrative algorithms that take all dimensions of a social annotation system into account without requiring mathematically-complex and computationally-intensive dimensionality-reduction techniques, which are less extensible and flexible. Given a set of recommendation components, it remains to ascertain the correct ai for each component in order to maximize the effectiveness of the hybrid. We use a hill climbing technique which is both simple and efficient. A subset of the data is selected as a holdout set for learning the algorithm parameters, including the a values (See Methodology description in Section 5.2 below. The a vector is initialized with random positive numbers constrained such that the sun of the vector equals 1. The recommender then operates over the holdout set, using the remaining data as training data. the precision of recommendations is calculated as described in Section 5.2. The vector is then randomly modified and tested gain. If the accuracy is improved, the change is accepted; otherwise it is rejected. Occasionally a change to the a vector is accepted even when it does not improve the results in order to more fully explore the a space. Modifications continue until the vector stabilizes. Then the a vector is randomly reset and learning proceeds Now we turn to the nents that make up our hybrid. Many of these components rely on two-dimensional projec he URT matrix [40]. Such projections reduce the dimensionality of the data, but sacrifice some of its informational at. For example, the relation between resources and tags can be defined as rt(r, t), the number of users that have URT(u, r, t) This notion strongly resembles the "bag-of-words"vector space model [41 ] Note however that such a projection models the relationship between resources and tags in a non-personalized way. Similarly, we can produce a projection UT in which a user is modeled as a vector over the set of tags, where each weight, ur(u, t), measures how often a user applied a particular tag across all resources. In all, there are six possible two-dimensional projections: UR, UT, RU, RT, TU, TR For source recommendation, we have found four to be useful: UR, UT. RU, and RT. In the case of ur, we have not found it useful to weight resources by the number of tags a user applies, as this is not always a reliable measure of user interest. Rather we define UR to be binary, indicating whether or not the user has annotated the resource. Comput. System Sci. (2011) doi:).J.Gemmell et al Please cite this article in press urce recommendation in social annotation systems: A linear-weighted hybrid approach, J 1016jcss,201110.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.5 (1-15) J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• 5 connection between users have been defined in a probabilistic manner in order to predict which communities they might join [32]. Clusters of tags can represent topic areas [33]. These clusters have been used as intermediaries between users and resources allowing the recommendation of resources [34,14,15]. Such recommenders can accommodate both the basic and tag-specific constraints of resource recommendation. It is also possible to cluster resources generating topic-specific partitions [35] perhaps identifying resources based on other constraints. Clustering tags is also useful for overcoming the problem of redundancy [6]. Another effort in combating redundancy as well as ambiguity constructed user-centric tag models [36]. Hybrid models have been used to generate integrative models by combining several component recommenders like those above into a larger framework. Pairs of recommenders have been combined to produce improve results in Citeulike [37] and Bibsonomy [38]. Another approach demonstrated that a graph-based model may be improved by incorporating item-based collaborative filtering [12]. Hybrid models composed of both user-based and item-based collaborative filtering algorithms were shown to outperformed the state-of-the-art pair-wise interaction tensor factorization model in tag recommendation [13]. Another work predicts user ratings in MovieLens, one of the few systems that contains both ratings and tags [39]. We build on these efforts proposing a flexible and easily extensible hybrid model for resource recommendation. 4. Resource recommendation algorithms Our definition of resource recommendation above centers on the function φ, which assigns a real-valued score to each resource describing the relevance of the resource to the user (and, if supplied, the requirements). In this section, we describe the linear-weighted hybrid algorithm that we propose for this task, the components from which the hybrid is constructed, and we will also describe the implementation of our comparative benchmark, an integrative approach based on tensor factorization. 4.1. Linear-weighted hybrid A linear-weighted hybrid is composed of recommendation components κ1 through κk, whose output is combined by computing a weighted sum [23]. Without loss of generality, we will assume that each component κi has its own computation of the function φi(u, q,r), producing output in the range [0..1], and a weight αi in the same range. We further require that the α-values sum to 1. The hybrid is therefore defined as φ(u, q,r) = k i=1 αiφi(u, q,r) (1) A linear-weighted hybrid of this style has a number of advantages for recommendation in social annotation domains. One is that it is inherently an extensible framework. Components are free to specialize in particular dimensions of the data. If a particular application has access to additional types of data (for example, text in a document recommender) appropriate recommendation components can be added to the mix. The linear-weighted hybrid offers a way to construct integrative algorithms that take all dimensions of a social annotation system into account without requiring mathematically-complex and computationally-intensive dimensionality-reduction techniques, which are less extensible and flexible. Given a set of recommendation components, it remains to ascertain the correct αi for each component in order to maximize the effectiveness of the hybrid. We use a hill climbing technique, which is both simple and efficient. A subset of the data is selected as a holdout set for learning the algorithm parameters, including the α values. (See Methodology description in Section 5.2 below.) The α vector is initialized with random positive numbers constrained such that the sum of the vector equals 1. The recommender then operates over the holdout set, using the remaining data as training data. The precision of recommendations is calculated as described in Section 5.2. The vector is then randomly modified and tested again. If the accuracy is improved, the change is accepted; otherwise it is rejected. Occasionally a change to the α vector is accepted even when it does not improve the results in order to more fully explore the α space. Modifications continue until the vector stabilizes. Then the α vector is randomly reset and learning proceeds again. Now we turn to the components that make up our hybrid. Many of these components rely on two-dimensional projections of the URT matrix [40]. Such projections reduce the dimensionality of the data, but sacrifice some of its informational content. For example, the relation between resources and tags can be defined as RT(r,t), the number of users that have applied t to r. RT(r,t) = ∀u∈U URT(u,r,t) (2) This notion strongly resembles the “bag-of-words” vector space model [41]. Note however that such a projection models the relationship between resources and tags in a non-personalized way. Similarly, we can produce a projection UT in which a user is modeled as a vector over the set of tags, where each weight, UT(u,t), measures how often a user applied a particular tag across all resources. In all, there are six possible two-dimensional projections: UR, UT, RU, RT, TU, TR. For resource recommendation, we have found four to be useful: UR, UT, RU, and RT. In the case of UR, we have not found it useful to weight resources by the number of tags a user applies, as this is not always a reliable measure of user interest. Rather we define UR to be binary, indicating whether or not the user has annotated the resource.
ARTICLE IN PRESS YJCSS: 253 J Gemmell et al/ Journal of Computer and System Sciences 4.1.1. User-based collaborative filtering The first two recommendation com ts are based on the well-known knn user-based collaborative filter [18-20 The central premise of this approach is that users who have agreed in the past are likely to agree in th In the case of the basic resource recommendation, to calculate the value for (u, 0, r), we start by ident users similar to u using cosine similarity. This is the set of k neighbors Nu. Then, we use these neighbors scores for r to extrapolate a personalized score for u as follows p(u,0,r)=a(u,v)(v,r) where o(u, v) is the cosine similarity between the users u and v and e (v, r) is 1 if v has annotated r and 0 otherwise When we use the UR projection, users are modeled in terms of the resources that they have tagged. We will refer to this algorithm as KNNur On the other hand, when users are modeled as tags we call the algorithm KNNut For tag-specific resource recommendation, we make a similar computation, with the additional constraint that neighbors must have used the tag t specified as part of the recommendation requirements. We denote this neighborhood by Ni ly consider those resources which have been annotated by the neighbors with the tag (u,(,r)=∑(u,v)x(,tr) (4) v∈N where x(v, t, r) is 1 if user v labeled resource r with tag t; otherwise, 0. 4.1.2. Item-based collaborative filtering Item-based collaborative filtering [21, 22 relies on discovering similarities among resources rather than among users Accordingly, the relevant projections are RT and RU, giving rise to two recommendation components KNNru and KNNrt 8. In basic resource recommendation, given r we define Nr as the k resources nearest to r, drawn from the user profile and then define the relevance of r for the user as 中(u,0,)=∑(,(u,s) Again we rely on cosine similarity. If a user as annotated resources similar to r then (u, o r) will be high The computation becomes slightly simpler for tag-specific resource recommendation. A resource is considered only if me user has tagged it with t Neighbor resources are chosen only if they have also been tagged with t: φ(u,{tl,r)=∑a(r,s) similarity iven that we may define both users and resources as a vector over the tag space we may directly measure the similarity between the two elements. We call this model TSut and, adapting cosine similarity, define its measure for basic resource ∑erRr(r,t)×UT(u,t) φ(u,D,r)= Rt(r, t)2x This method works under the assumption that the frequency with which a user employs a tag measures his interest in he topic described by that tag. We assume that the frequency of the tags applied to the resource adequately describe the resource. If these two models are similar, we can infer a relationship between the user and resource. For tag-specific recommendation, the formula simplifies to a single term in the numerator as we only consider the given tag when comparing the user and the resource. We call this approach TSrt 4.1.4. Popularity model inally, the simplest component recommender is one which merely recommends the most popular resources. We call his approach Pop and define its measure for basic resource recommendation as p(u,0,n)=∑(v,n) trictly speaking, Pop is not a"recommendation algorithm"as it is not personalized and it merely serves to give weight to the items that are already most highly represented in the system. However, across all of our evaluations, the hybrid performed better when this component was included. Please cite this article in press as: J. Gemmell et aL., Resource recommendation in social annotation systems: A linear-weighted hybrid approach, J. Comput. System Sci. (2011). doi: 10.1016/j jcSs 2011. 10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.6 (1-15) 6 J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• 4.1.1. User-based collaborative filtering The first two recommendation components are based on the well-known KNN user-based collaborative filtering algorithm [18–20]. The central premise of this approach is that users who have agreed in the past are likely to agree in the future. In the case of the basic resource recommendation, to calculate the value for φ(u, Ø,r), we start by identifying peer users similar to u using cosine similarity. This is the set of k neighbors Nu. Then, we use these neighbor’s scores for r to extrapolate a personalized score for u as follows. φ(u, Ø,r) = v∈Nu σ(u, v)θ (v,r) (3) where σ(u, v) is the cosine similarity between the users u and v and θ (v,r) is 1 if v has annotated r and 0 otherwise. When we use the UR projection, users are modeled in terms of the resources that they have tagged. We will refer to this algorithm as KNNur. On the other hand, when users are modeled as tags we call the algorithm KNNut. For tag-specific resource recommendation, we make a similar computation, with the additional constraint that neighbors must have used the tag t specified as part of the recommendation requirements. We denote this neighborhood by Nt u. Furthermore we only consider those resources which have been annotated by the neighbors with the tag t: φ u,{t},r = v∈Nt u σ(u, v)χ(v,t,r) (4) where χ(v,t,r) is 1 if user v labeled resource r with tag t; otherwise, 0. 4.1.2. Item-based collaborative filtering Item-based collaborative filtering [21,22] relies on discovering similarities among resources rather than among users. Accordingly, the relevant projections are RT and RU, giving rise to two recommendation components KNNru and KNNrt. In basic resource recommendation, given r we define Nr as the k resources nearest to r, drawn from the user profile and then define the relevance of r for the user as φ(u, Ø,r) = s∈Nr σ(r, s)θ (u, s) (5) Again we rely on cosine similarity. If a user as annotated resources similar to r then φ(u, Ø,r) will be high. The computation becomes slightly simpler for tag-specific resource recommendation. A resource is considered only if some user has tagged it with t. Neighbor resources are chosen only if they have also been tagged with t: φ u,{t},r = s∈Nt r σ(r, s) (6) 4.1.3. Tag model similarity Given that we may define both users and resources as a vector over the tag space, we may directly measure the similarity between the two elements. We call this model TSut and, adapting cosine similarity, define its measure for basic resource recommendation as φ(u, Ø,r) = t∈T RT(r,t) × UT(u,t) t∈T RT(r,t)2 × t∈T UT(u,t)2 (7) This method works under the assumption that the frequency with which a user employs a tag measures his interest in the topic described by that tag. We assume that the frequency of the tags applied to the resource adequately describe the resource. If these two models are similar, we can infer a relationship between the user and resource. For tag-specific recommendation, the formula simplifies to a single term in the numerator as we only consider the given tag when comparing the user and the resource. We call this approach TStt. 4.1.4. Popularity model Finally, the simplest component recommender is one which merely recommends the most popular resources. We call this approach Pop and define its measure for basic resource recommendation as φ(u, Ø,r) = v∈U θ (v,r) (8) Strictly speaking, Pop is not a “recommendation algorithm” as it is not personalized and it merely serves to give weight to the items that are already most highly represented in the system. However, across all of our evaluations, the hybrid performed better when this component was included.
ARTICLE IN PRESS YJCSS: 2537 /. Gemmell et aL/ Journal of Computer and System Sciences·命·(申·)·一· For tag-specific resource recommendation, the score is modified to incorporate the tag (ut,)=∑x(vt,n) here x(v, t, r) is 1 if v has annotated r with t and 0 otherwise. We call this approach PopTag. 4.2. Pair-wise interaction tensor factorization The linear-weighted hybrid represents one avenue for creating a recommender that incorporates all of the dimensions in a social annotation system. Another approach is dimensionality reduction as exemplified by tensor factorization. basis for comparison with our algorithm, we chose the pair-wise interaction tensor factorization [9 which was developed for the task of tag recommendation, and is considered among the start-of-the-art tag recommenders. Our adaptation of the PItF model to resource recommendation simply exchanges the roles of resources and tags with respect to each other. This model-based approach generates a set of factor matrices which resembles a special case of the tucker decomposition of a tensor. The tensor itself is not directly induced by the data( this could be achieved by regarding each(u, r, t) triple as a binary cell of a tensor), but rather reflects a ranking over the resources for each user-tag pair. Thus, it is important to note that this model, used directly, can only provide a point of comparison in the special case of tag-specific resource ecommendation but not for the basic resource recommendation task. The model is built by first considering observations in the data of the form(u, r+, r-, t), where(u, r+, t) is a triple which is found in the data(a positive example of resource selection) and(u, r-, t) is a triple not found in the data(a negative example of resource selection). An iterative gradient-descent algorithm is employed to optimize a ranking function(based Bayesian conditionals) that prefers positive examples in the data over negative ones. Each of four related matrices is updated until convergence is found. The matrices represent the factor-reduced components of the specialized tensor factorization M=UKRI +TkRi, where Uk is the user factor matrix, Tk is the tag factor matrix, re is the resource factor natrix with respect to users and Rk is the resource factor matrix with respect to tags, k is the selected number of factors, nd M is the personalized resource-ranking tensor Generating a resource recommendation for a given user u and tag t is simply a matter of referring to the appropriate user-tag column of the ranking tensor M. The relevance score of a resource given a user-tag pair is calculated p(u,,)=∑Ulul+TiRr 10) In this section we describe the methods used to gather and pre-process our six datasets and our evaluation metrics and ethodology. Then we examine the results for each recommendation task, and finally draw some general conclusions. 51. Datasets Our experiments were conducted using data from six large real-world social annotation systems. On all datasets we generate p-cores [27]. Users, resources and tags were removed in order to produce a residual dataset that guarantees each user, resource and tag occur in at least p annotations. reral reasons exist to construct p-cores. By eliminating infrequent items, the size of the data is dramatically reduced, allowing the application of recommendation techniques that would otherwise be computationally impractical. By removing ccurring users, resource or tags, noise in the data can be dramatically reduced. Because of their scarcity, these items ones most likely to confound recommenders. So, in this work, we are ignoring these rare items, the contents of the d"long tail"of the data distribution. Recommendation scenarios that involve new users, rarely-rated items and or idiosyncratic tags are without doubt important, but they lie outside the scope of this paper. When possible we constructed 20-cores from the datasets. If the dataset was not large enough to render a 20-core, we instead constructed a 5-core, so hat we would have enough data on each user to create five partitions Bibsonomy enables its users to annotate both URl bookmarks and journal articles. The dataset was gathered on 1 January 2009 encompassing the entire system. This data set has been made available online by the system administrators [42], who have pre-processed the data to remove anomalies. a 5-core was taken. It contains 13, 909 annotations with 357 users, 1738 resources and 1573 tags Citeulike is a popular online tool used by researchers to manage and catalog journal articles. The site owners make their dataset freely available to download We use a snapshot taken as of 17 February 2009. Once a 5-core was computed, the emaining dataset contains 2051 users, 5376 resources, 3343 tags and 105, 873 annotations. MovieLens is a data set gathered from the corresponding MovieLens Web site and is administered by the grouplens research lab at the University of Minnesota. It contains users, rating of movies, and tags. A 5-core was generated from the data resulting in 35, 366 annotations with 819 users, 2445 resources and 2309 tags Comput. System Sci.(2011), doj?:.Gemmell et al Please cite this article in press urce recommendation in social annotation systems: A linear-weighted hybrid approach, J 1016jcss,201110.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.7 (1-15) J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• 7 For tag-specific resource recommendation, the score is modified to incorporate the tag: φ u,{t},r = v∈U χ(v,t,r) (9) where χ(v,t,r) is 1 if v has annotated r with t and 0 otherwise. We call this approach PopTag. 4.2. Pair-wise interaction tensor factorization The linear-weighted hybrid represents one avenue for creating a recommender that incorporates all of the dimensions in a social annotation system. Another approach is dimensionality reduction as exemplified by tensor factorization. As a basis for comparison with our algorithm, we chose the pair-wise interaction tensor factorization [9], which was developed for the task of tag recommendation, and is considered among the start-of-the-art tag recommenders. Our adaptation of the PITF model to resource recommendation simply exchanges the roles of resources and tags with respect to each other. This model-based approach generates a set of factor matrices which resembles a special case of the Tucker decomposition of a tensor. The tensor itself is not directly induced by the data (this could be achieved by regarding each (u,r,t) triple as a binary cell of a tensor), but rather reflects a ranking over the resources for each user–tag pair. Thus, it is important to note that this model, used directly, can only provide a point of comparison in the special case of tag-specific resource recommendation, but not for the basic resource recommendation task. The model is built by first considering observations in the data of the form (u,r+,r−,t), where (u,r+,t) is a triple which is found in the data (a positive example of resource selection) and (u,r−,t) is a triple not found in the data (a negative example of resource selection). An iterative gradient-descent algorithm is employed to optimize a ranking function (based on Bayesian conditionals) that prefers positive examples in the data over negative ones. Each of four related matrices is updated until convergence is found. The matrices represent the factor-reduced components of the specialized tensor factorization M = Uk RU k + Tk RT k , where Uk is the user factor matrix, Tk is the tag factor matrix, RU k is the resource factor matrix with respect to users and RT k is the resource factor matrix with respect to tags, k is the selected number of factors, and M is the personalized resource-ranking tensor. Generating a resource recommendation for a given user u and tag t is simply a matter of referring to the appropriate user–tag column of the ranking tensor M. The relevance score of a resource given a user–tag pair is calculated as φ u,{t},r = k i=1 Uk[u][i]RU k [r][i] + Tk[t][i]RT k [r][i] (10) 5. Experimental evaluation In this section we describe the methods used to gather and pre-process our six datasets and our evaluation metrics and methodology. Then we examine the results for each recommendation task, and finally draw some general conclusions. 5.1. Datasets Our experiments were conducted using data from six large real-world social annotation systems. On all datasets we generate p-cores [27]. Users, resources and tags were removed in order to produce a residual dataset that guarantees each user, resource and tag occur in at least p annotations. Several reasons exist to construct p-cores. By eliminating infrequent items, the size of the data is dramatically reduced, allowing the application of recommendation techniques that would otherwise be computationally impractical. By removing rarely-occurring users, resource or tags, noise in the data can be dramatically reduced. Because of their scarcity, these items are the ones most likely to confound recommenders. So, in this work, we are ignoring these rare items, the contents of the so-called “long tail” of the data distribution. Recommendation scenarios that involve new users, rarely-rated items and/or idiosyncratic tags are without doubt important, but they lie outside the scope of this paper. When possible we constructed 20-cores from the datasets. If the dataset was not large enough to render a 20-core, we instead constructed a 5-core, so that we would have enough data on each user to create five partitions. Bibsonomy enables its users to annotate both URL bookmarks and journal articles. The dataset was gathered on 1 January 2009 encompassing the entire system. This data set has been made available online by the system administrators [42], who have pre-processed the data to remove anomalies. A 5-core was taken. It contains 13,909 annotations with 357 users, 1738 resources and 1573 tags. Citeulike is a popular online tool used by researchers to manage and catalog journal articles. The site owners make their dataset freely available to download. We use a snapshot taken as of 17 February 2009. Once a 5-core was computed, the remaining dataset contains 2051 users, 5376 resources, 3343 tags and 105,873 annotations. MovieLens is a data set gathered from the corresponding MovieLens Web site and is administered by the GroupLens research lab at the University of Minnesota. It contains users, rating of movies, and tags. A 5-core was generated from the data resulting in 35,366 annotations with 819 users, 2445 resources and 2309 tags.
ARTICLE IN PRESS YJCSS: 253 J Gemmell et al/ Journal of Computer and System Sciences 三 Paramete Fig 4. Experimental methodology Delicious is a popular Web site in which users annotate URLs. On 19 October 2008, 198 of the most popular tags were taken from the user interface and the site was recursively explored. From 20 October to 15 December, the complete profiles of 524, 790 users were collected. Due to memory and time constraints, 10% of the user profiles was randomly selected and a 20-core taken for experiments. The dataset is our largest, containing 7665 users, 15, 612 resource and 5746 tags. It contains 720 788 annotations. Amazon is Americas largest online retailer. The site includes a myriad of ways for users to express and discover opinions of the products: ratings, editorial reviews, customer reviews, product details, and customer purchasing habits. Recently, Amazon has added social annotations to this list. Beginning on 1 July 2009 we recursively explored the site to gather 1.5 million user profiles. Many users had extremely small profiles or used idiosyncratic tags. After taking a 20-core of the data it contained 498, 217 annotations with 8802 users, 10, 679 resource and 5559 tag stFM users upload their music profiles, create playlists and share their musical tastes online. We selected 100 random users from the system and recursively explored the"friend"network. Only about 20% of the users had annotated a resource. Users have the option to tag songs, artists or albums, but the tagging data here is limited to album annotations. a p-core of 20 was drawn from the data. It contains 2368 users, 2350 resources, 1141 tags and 172, 177 annotations. 5.2. Methodology For each data set, we started with a complete collection of annotations A Training and test sets were created by dividing the annotations randomly into partitions Each annotation contains a user, resource and all tags applied by the user to that resource. Dividing the data by annotations limits the influence of users who apply many tags. Two phases are required for the evaluation. First, the parameters must be learned including the number of neighbors for the collaborative filtering approach, the number of features for PTF and the a values for the linear-weighted hybrid. The process is shown in Fig. 4. The annotations are divided into five equal partitions PI though Ps. The partitions were generated randomly, but the process ensured that each user is represented in each partition- since we generated 5-cores and 20-cores of the data, there is sufficient data to ensure that at least one annotation from each user appears in each partition. One partition was used for the learning of the parameters That partition was then discarded and the ation was performed on the remaining four folds. Four-fold cross-validation was performed using these remaining partitions. One partition Ph was selected as a holdout of annotations and the remaining partitions served as training data for the recommenders. To evaluate the basic resource recommendation algorithms, we iterated over all users in Ph For each user, u, we collected all resources Rh for which annotations exist in Ph. We then generated a recommendation set of size k for user u, Ru Recall is a common metric for evaluating the utility of recommendation algorithms. It measures the percentage of items in the holdout set that appear in the recommendation set. Recall is a measure of completeness and is defined as recal Rh∩Ru R Precision is another common metric for measuring the usefulness of recommendation algorithms. It measures the per- entage of items in the recommendation set that appear in the holdout set. Precision measures the exactness of the recommendation algorithm and is defined as precision E Rn∩Rul The recall and precision will vary depending on k, the size on the recommendation set. The results that follow show he tradeoff between recall and precision as k varies from 1 to 10. The results are averaged for each user, averaged over all users, and finally averaged over all four folds. To evaluate the tag-specific resource recommendation algorithms, we need to provide both a user and a tag and eval- uate the systems ability to find a resource to which the user has applied that tag. Again, we started with the holdout partition Ph In this case, we operated on one annotation at a time. Each annotation contains a user u, a resource r and collection of tags tI through tk. We select one tag at random, and generate a recommendation set using this tag and the Please cite this article in press as: Gemmell et aL, Resource recommendation in social annotation systems: A linear-weighted hybrid approach, J. Comput. System Sci. (2011). doi: 10.1016/j jcSs 2011. 10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.8 (1-15) 8 J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• Fig. 4. Experimental methodology. Delicious is a popular Web site in which users annotate URLs. On 19 October 2008, 198 of the most popular tags were taken from the user interface and the site was recursively explored. From 20 October to 15 December, the complete profiles of 524,790 users were collected. Due to memory and time constraints, 10% of the user profiles was randomly selected, and a 20-core taken for experiments. The dataset is our largest, containing 7665 users, 15,612 resource and 5746 tags. It contains 720,788 annotations. Amazon is America’s largest online retailer. The site includes a myriad of ways for users to express and discover opinions of the products: ratings, editorial reviews, customer reviews, product details, and customer purchasing habits. Recently, Amazon has added social annotations to this list. Beginning on 1 July 2009 we recursively explored the site to gather 1.5 million user profiles. Many users had extremely small profiles or used idiosyncratic tags. After taking a 20-core of the data, it contained 498,217 annotations with 8802 users, 10,679 resource and 5559 tags. LastFM users upload their music profiles, create playlists and share their musical tastes online. We selected 100 random users from the system and recursively explored the “friend” network. Only about 20% of the users had annotated a resource. Users have the option to tag songs, artists or albums, but the tagging data here is limited to album annotations. A p-core of 20 was drawn from the data. It contains 2368 users, 2350 resources, 1141 tags and 172,177 annotations. 5.2. Methodology For each data set, we started with a complete collection of annotations A. Training and test sets were created by dividing the annotations randomly into partitions. Each annotation contains a user, resource and all tags applied by the user to that resource. Dividing the data by annotations limits the influence of users who apply many tags. Two phases are required for the evaluation. First, the parameters must be learned including the number of neighbors for the collaborative filtering approach, the number of features for PITF and the α values for the linear-weighted hybrid. The process is shown in Fig. 4. The annotations are divided into five equal partitions P1 though P5. The partitions were generated randomly, but the process ensured that each user is represented in each partition – since we generated 5-cores and 20-cores of the data, there is sufficient data to ensure that at least one annotation from each user appears in each partition. One partition was used for the learning of the parameters. That partition was then discarded and the evaluation was performed on the remaining four folds. Four-fold cross-validation was performed using these remaining partitions. One partition Ph was selected as a holdout set of annotations and the remaining partitions served as training data for the recommenders. To evaluate the basic resource recommendation algorithms, we iterated over all users in Ph. For each user, u, we collected all resources Rh for which annotations exist in Ph. We then generated a recommendation set of size k for user u, Ru. Recall is a common metric for evaluating the utility of recommendation algorithms. It measures the percentage of items in the holdout set that appear in the recommendation set. Recall is a measure of completeness and is defined as recall = |Rh ∩ Ru| |Rh| (11) Precision is another common metric for measuring the usefulness of recommendation algorithms. It measures the percentage of items in the recommendation set that appear in the holdout set. Precision measures the exactness of the recommendation algorithm and is defined as precision = |Rh ∩ Ru| |Ru| (12) The recall and precision will vary depending on k, the size on the recommendation set. The results that follow show the tradeoff between recall and precision as k varies from 1 to 10. The results are averaged for each user, averaged over all users, and finally averaged over all four folds. To evaluate the tag-specific resource recommendation algorithms, we need to provide both a user and a tag and evaluate the system’s ability to find a resource to which the user has applied that tag. Again, we started with the holdout partition Ph. In this case, we operated on one annotation at a time. Each annotation contains a user u, a resource r and a collection of tags t1 through tk. We select one tag at random, and generate a recommendation set using this tag and the
ARTICLE IN PRESS YJCSS: 2537 /. Gemmell et aL/ Journal of Computer and System Sciences·命·(申·)·一· Bibsonomy KNNur(25 trKNNut(15 KNNru(25) Citeulike trKNNur(25) -KNUt(25) Fig. 5. Recall and precision plotted for basic resource recommendation sets of size one through ten for Bibsonomy and Citeulike. user. This is the set Rut. Because we only have a single resource to retrieve recall and precision are not applicable in the same way as with the previous algorithm. For this algorithm, we measure recall in the top 10 items, a measure which is also know as hit ratio. For any given annotation, the measure will be either 1 (if the resource appears in the recommendation list)or o(if not). We average over all annotations in the test set and over all folds. 5.3. Experimental results The values chosen for k in the instance-based collaborative algorithms was selected after experimenting with values in he range 1 through 100. They are shown in Figs. 5 through 10. PITF, the pair-wise interaction tensor factorization model was built with 64 features and a learning rate of 0.03. Improvement could not be achieved by increasing the number of features or tuning the learning rate. It was trained until convergence Our experimental results reveal several key findings. First, not all social annotation systems are equivalent. The manner in which users interact with the systems produces datasets with different underlying characteristics. These characteristics in turn impact the performance of the recommenders. Secondly, an integrative approach such as the proposed linear-weighted hybrid is needed to exploit multiple dimensions of the data By leveraging these dimensions, recommenders produce results The results for basic resource recom are presented in Fig. 5 through Fig. 7. In all cases KNNur achieved the est results among the component recom This is not surprising since the resource recommendation task can be viewed as the prediction of user-resource m short, knowing which resources the user has liked is a good predictor of which resources he would like in the future However, as far as the performance of recommender components is concerned, much of the similarity between datasets ends there. Many of the algorithms demonstrate dramatic differences across datasets. For example, KNNrt performs well in Citeulike but poorly in LastFM. This difference may be explained by considering how and why these systems are used, i.e by considering the dynamics of user interaction with the system isonomy originally allowed its users to annotate journal articles. Later it was expanded to include Web sites, but the focus of the system largely remains on scientific research topics. In this system the users are motivated to organize their resources for later retrieval. They often focus on their area of expertise and use tags reflecting concepts from their discipline. The result is a social annotation system in which tags are a good model for both users' interests as well as for describing Please cite this article in press as: Gemmell et al urce recommendation in social annotation systems: A linear-weighted hybrid approach, J Comput. System Sci. (2011). doi: 10.1016/ jcss. 2011.10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.9 (1-15) J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• 9 Fig. 5. Recall and precision plotted for basic resource recommendation sets of size one through ten for Bibsonomy and Citeulike. user. This is the set Rut. Because we only have a single resource to retrieve, recall and precision are not applicable in the same way as with the previous algorithm. For this algorithm, we measure recall in the top 10 items, a measure which is also know as hit ratio. For any given annotation, the measure will be either 1 (if the resource appears in the recommendation list) or 0 (if not). We average over all annotations in the test set and over all folds. 5.3. Experimental results The values chosen for k in the instance-based collaborative algorithms was selected after experimenting with values in the range 1 through 100. They are shown in Figs. 5 through 10. PITF, the pair-wise interaction tensor factorization model, was built with 64 features and a learning rate of 0.03. Improvement could not be achieved by increasing the number of features or tuning the learning rate. It was trained until convergence. Our experimental results reveal several key findings. First, not all social annotation systems are equivalent. The manner in which users interact with the systems produces datasets with different underlying characteristics. These characteristics in turn impact the performance of the recommenders. Secondly, an integrative approach such as the proposed linear-weighted hybrid is needed to exploit multiple dimensions of the data. By leveraging these dimensions, recommenders produce results superior to what simpler recommenders produce alone. Thirdly, integrative approaches have the largest value when several dimensions of the data contain complementary information. 5.3.1. Basic resource recommendation The results for basic resource recommendation are presented in Fig. 5 through Fig. 7. In all cases KNNur achieved the best results among the component recommenders. This is not surprising since the resource recommendation task can be viewed as the prediction of user–resource pairs. In short, knowing which resources the user has liked is a good predictor of which resources he would like in the future. However, as far as the performance of recommender components is concerned, much of the similarity between datasets ends there. Many of the algorithms demonstrate dramatic differences across datasets. For example, KNNrt performs well in Citeulike but poorly in LastFM. This difference may be explained by considering how and why these systems are used, i.e., by considering the dynamics of user interaction with the system. Bibsonomy originally allowed its users to annotate journal articles. Later it was expanded to include Web sites, but the focus of the system largely remains on scientific research topics. In this system the users are motivated to organize their resources for later retrieval. They often focus on their area of expertise and use tags reflecting concepts from their discipline. The result is a social annotation system in which tags are a good model for both users’ interests as well as for describing
ARTICLE IN PRESS YJCSS: 253 MovieLens -dTSut 一 KNNur(25) --KNNru(75 2% Delicious - trKNNut(0 -e-KNNrt(10) Fig. 6. Recall and precision plotted for basic resource recommendation sets of size one through ten for MovieLens and Delicious. Amazon 台TSut LastFM Fig. 7. Recall and precision plotted for basic resource recommendation sets of size one through ten for Amazon and Last FM. Please cite this article in press as: Gemmell et aL, Resource recommendation in social annotation systems: A linear-weighted hybrid approach, J. Comput. System Sci. (2011). doi: 10.1016/j jcSs 2011. 10.006
JID:YJCSS AID:2537 /FLA [m3G; v 1.58; Prn:9/11/2011; 16:03] P.10 (1-15) 10 J. Gemmell et al. / Journal of Computer and System Sciences ••• (••••) •••–••• Fig. 6. Recall and precision plotted for basic resource recommendation sets of size one through ten for MovieLens and Delicious. Fig. 7. Recall and precision plotted for basic resource recommendation sets of size one through ten for Amazon and LastFM