Artif Intell Rev (2010) 33:187–209
DOI 10.1007/s10462-009-9153-2

Social tagging in recommender systems: a survey of the state-of-the-art and possible extensions

Aleksandra Klasnja Milicevic · Alexandros Nanopoulos · Mirjana Ivanovic

Published online: 21 January 2010
© Springer Science+Business Media B.V. 2010

Abstract Social tagging systems have grown in popularity on the Web in recent years because they make it simple to categorize and retrieve content using open-ended tags. The increasing number of users who provide information about themselves through social tagging activities has led to the emergence of tag-based profiling approaches, which assume that users expose their preferences for certain content through their tag assignments. Thus, tagging information can be used to make recommendations. This paper presents an overview of the field of social tagging systems that can be used to extend the capabilities of recommender systems. Various limitations of the current generation of social tagging systems, and possible extensions that can provide better recommendation capabilities, are also considered.

Keywords Recommender systems · Social tagging · Folksonomy · Personalization

1 Introduction

Information on the Web is increasing far more quickly than people can cope with. Personalized recommendation (Resnick and Varian 1997) can help people overcome the information overload problem by recommending items according to their interests.

A. K. Milicevic (corresponding author)
Higher School of Professional Business Studies, University of Novi Sad, Novi Sad, Serbia
e-mail: aklasnja@yahoo.com

A. Nanopoulos
Information Systems and Machine Learning Lab, University of Hildesheim, Hildesheim, Germany
e-mail: nanopoulos@ismll.de

M. Ivanovic
Faculty of Science, Department of Mathematics and Informatics, University of Novi Sad, Novi Sad, Serbia
e-mail: mira@dmi.uns.ac.rs
Recommender systems use the opinions of a community of users to help individuals in that community more effectively identify content of interest from a potentially overwhelming set of choices (Resnick et al. 1994). One of the most successful technologies for recommender systems is collaborative filtering (Konstan et al. 2004). It is built on the assumption that people who have agreed about the items they viewed in the past are likely to agree again on new items. Although this assumption works well in narrow domains, it is likely to fail in more diverse or mixed settings, because people who have similar taste in one domain may behave quite differently in others. To improve recommendation quality, metadata such as the content information of items has typically been used as additional knowledge. With the increasing popularity of collaborative tagging systems, tags have become an interesting and useful source of information for enhancing recommender system algorithms.

Collaborative tagging systems allow users to upload their resources and to label them with arbitrary words, so-called tags. The systems can be distinguished according to the kind of resources they support. Flickr¹, for instance, allows the sharing of photos, Delicious² the sharing of bookmarks, CiteULike³ and Connotea⁴ the sharing of bibliographic references, and 43Things⁵ even the sharing of goals in private life. These systems all work in a very similar way. Once logged in, a user can add a resource to the system and assign arbitrary tags to it. The collection of all of a user's assignments is his personomy, and the collection of all personomies constitutes the folksonomy. A user can explore his personomy, as well as the personomies of other users, in all dimensions: for a given user, one can see all the resources he has uploaded, together with the tags he has assigned to them (Hotho et al. 2006a,b,c).
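To make the personomy and folksonomy notions concrete, here is a minimal Python sketch. It assumes, for illustration only, that each tag assignment is stored as a (user, tag, resource) triple; the function names and toy data are hypothetical and are not taken from any of the systems named above. It checks that the folksonomy is the union of all personomies and builds the per-user view of resources together with their tags.

```python
from collections import defaultdict

# Hypothetical, simplified representation: one tag assignment is a
# (user, tag, resource) triple, and a folksonomy is a set of them.
Assignment = tuple[str, str, str]


def personomy(assignments: set[Assignment], user: str) -> set[Assignment]:
    """All tag assignments made by one user (that user's personomy)."""
    return {a for a in assignments if a[0] == user}


def resources_with_tags(assignments: set[Assignment]) -> dict[str, set[str]]:
    """Per-user view: each resource mapped to the tags assigned to it."""
    view: dict[str, set[str]] = defaultdict(set)
    for _user, tag, resource in assignments:
        view[resource].add(tag)
    return dict(view)


# Toy data for illustration only.
folksonomy: set[Assignment] = {
    ("alice", "python", "docs.python.org"),
    ("alice", "reference", "docs.python.org"),
    ("alice", "holiday", "flickr.com/photos/1"),
    ("bob", "python", "docs.python.org"),
}

# The folksonomy is exactly the union of all users' personomies.
users = {user for user, _tag, _resource in folksonomy}
union = set().union(*(personomy(folksonomy, u) for u in users))
assert union == folksonomy

# Alice's resources, each with the tags she assigned to it.
print(resources_with_tags(personomy(folksonomy, "alice")))
```

Storing assignments as a flat set of triples keeps every dimension available, so the same data also supports per-resource or per-tag views by grouping on a different triple position.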
Besides helping users organize their personal collections, a tag can also be regarded as an expression of the user's personal opinion, while tagging can be considered an implicit rating of, or vote on, the tagged resources or items (Liang et al. 2008). Thus, tagging information can be used to make recommendations.

In this paper, we describe social tagging systems that can be used to extend the capabilities of recommender systems. A comprehensive survey of the state-of-the-art in collaborative tagging systems and folksonomy is presented in Sect. 2. Section 3 presents a model for tagging activities. Tag-based recommender systems and different approaches to finding the best tag recommendations for items are described in Sect. 4. In Sect. 5 we identify various limitations of the current generation of folksonomy systems, and in Sect. 6 we discuss some initial approaches to extending their capabilities. Finally, Sect. 7 concludes this paper.

2 The survey of collaborative tagging systems and folksonomy

Collaborative tagging is the practice of allowing users to freely attach keywords or tags to content (Golder and Huberman 2005). Collaborative tagging is most useful when there is nobody in the “librarian” role or there is simply too much content for a single authority to classify. People tag pictures, videos, and other resources with a couple of keywords so that they can easily retrieve them at a later stage.

¹ http://www.flickr.com, now part of Yahoo!
² http://del.icio.us, now part of Yahoo!
³ http://www.citeulike.org
⁴ http://www.connotea.org
⁵ http://www.43things.com
The following features of collaborative tagging are generally credited for its success and popularity (Mathes 2004; Quintarelli 2005; Wu et al. 2006).
• Low cognitive cost and entry barriers. The simplicity of tagging allows any Web user to classify their favourite Web resources using keywords that are not constrained by predefined vocabularies.
• Immediate feedback and communication. Tag suggestions in collaborative tagging systems give users a mechanism for communicating implicitly with each other about how to describe resources on the Web.
• Quick adaptation to changes in vocabulary. The freedom provided by tagging allows a fast response to changes in the use of language and to the emergence of new words. Terms like Web2.0, ontologies and social network can be used readily by users without the need to modify any predefined scheme.
• Individual needs and formation of organization. Tagging systems provide a convenient means for Web users to organize their favourite Web resources. In addition, as the systems develop, users are able to discover other people who are also interested in similar items.

Since tags are created by individual users in free form, one important problem facing tagging is identifying the most appropriate tags while eliminating noise and spam. For this purpose, Au Yeung et al. (2007) define a set of general criteria for a good tagging system.
• High coverage of multiple facets. A good tag combination should include multiple facets of the tagged objects. The larger the number of facets, the more likely a user is able to recall the tagged content.
• High popularity. If a set of tags is used by a large number of people for a particular object, these tags are more likely to uniquely identify the tagged content and more likely to be used by a new user for the given object.
• Least-effort. The number of tags needed to identify an object should be minimized, and the number of objects identified by the tag combination should be small. As a result, a user can reach any tagged object in a small number of steps via tag browsing.
• Uniformity (normalization). Since there is no universal ontology, different people can use different terms for the same concept. In general, two types of divergence can be observed: those due to syntactic variance, e.g., color, colorize, colorise, colourise; and those due to synonymy, e.g., student and pupil, which are syntactically different terms that refer to the same underlying concept. These kinds of divergence are a double-edged sword: on the one hand, they introduce noise into the system; on the other hand, they can increase recall.
• Exclusion of certain types of tags. For example, personally used organizational tags are less likely to be shared by different users, so they should be excluded from public usage. Rather than ignoring these tags, a tagging system can include a feature that auto-completes tags as they are being typed, by matching the typed prefix against the tags the user has entered before. This not only improves the usability of the system but also enables the convergence of tags.

Another important aspect of tagging systems is how they operate. Marlow et al. (2006) describe some key dimensions of tagging system design that may have an immediate effect on the content and usefulness of the tags generated by the system. Some of these dimensions are listed below.
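Before turning to those design dimensions, the uniformity and exclusion criteria above can be illustrated with a small Python sketch. It is not Au Yeung et al.'s method: it assumes a hypothetical normalizer that only removes syntactic variance (case, whitespace, hyphens, underscores, plus a tiny hand-made spelling table), and a prefix-based auto-complete over the user's own earlier tags, ranked by how often the user has used each tag. Synonymy (student vs. pupil) is deliberately not handled, since that would need a semantic resource.

```python
import re
from collections import Counter

# Hypothetical spelling map; a real system would use stemming,
# lemmatisation, or a synonym resource instead of a hand-made table.
SPELLING_VARIANTS = {
    "colour": "color",
    "colourise": "colorize",
    "colorise": "colorize",
}


def normalize(tag: str) -> str:
    """Reduce purely syntactic variance: case, whitespace, '-' and '_'."""
    cleaned = re.sub(r"[\s_-]+", "", tag.lower())
    return SPELLING_VARIANTS.get(cleaned, cleaned)


def autocomplete(prefix: str, previous_tags: list[str], k: int = 5) -> list[str]:
    """Suggest up to k of the user's earlier tags that start with prefix.

    Suggestions are ranked by how often the user has used each
    (normalized) tag before, with ties broken alphabetically.
    """
    counts = Counter(normalize(t) for t in previous_tags)
    wanted = normalize(prefix)
    matches = [t for t in counts if t.startswith(wanted)]
    matches.sort(key=lambda t: (-counts[t], t))
    return matches[:k]


# A user's tagging history (toy data); three spellings of the same tag.
history = ["toread", "ToRead", "to-read", "tools", "Colour", "travel"]
print(autocomplete("to", history))  # → ['toread', 'tools']
```

Because the three spellings of "toread" collapse to one normalized tag, the suggestion list steers the user toward a single form, which is the convergence effect the exclusion criterion mentions.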
2.1 Tagging rights

The permission a user has to tag resources can affect the properties of an emergent folksonomy. Systems can determine who may remove a tag. Systems can also restrict which resources users may tag, or specify different levels of permission to tag. The spectrum of tagging permissions ranges from:
a. Self-tagging: users can only tag their own contributions (e.g. Technorati⁶), through
b. Permission-based: users decide who can tag their resources (e.g. Flickr), to
c. Free-for-all: any user can tag any resource.

2.2 Tagging support

One important aspect of a tagging system is the way in which users assign tags to items. They may assign arbitrary tags without prompting, they may add tags while considering those already added to a particular resource, or tags may be proposed to them. There are three distinct categories:
a. Blind tagging: users cannot see the other tags assigned to the resource they are tagging.
b. Viewable tagging: users can see the other tags assigned to the resource they are tagging.
c. Suggestive tagging: users see suggested tags for the resource they are tagging.

2.3 Aggregation

The aggregation of tags around a given resource is an important consideration. The system may allow a multiplicity of tags for the same resource, which may result in duplicate tags from different users. Alternatively, many systems ask the group to collectively tag an individual resource. Two models of aggregation can be distinguished:
a. Bag-model: the same tag can be assigned to a resource multiple times, as in Delicious, allowing statistics to be generated and users to see whether there is agreement among taggers about the content of the resource.
b. Set-model: a tag can be applied only once to a resource, as in Flickr.

2.4 Types of object

The type of object being tagged has numerous implications for the nature of the resulting tags, and it allows us to distinguish between different tagging systems.
Popular systems handle simple objects such as web pages, bibliographic materials, images, videos, and songs. Tags for text objects and for multimedia objects can vary. In reality, any object that can be virtually represented can be tagged or used in a tagging system. For example, systems exist that let users tag physical locations or events (e.g., Upcoming⁷).

2.5 Sources of material

Some systems restrict the source of material through their architecture (e.g., Flickr), while others restrict it solely through social norms (e.g., CiteULike). Resources to be tagged can be supplied:

⁶ http://www.technorati.com
⁷ http://www.upcoming.yahoo.com
a. by the participants (YouTube⁸, Flickr, Technorati, Upcoming)
b. by the system (ESP Game⁹, Last.fm¹⁰, Yahoo! Podcasts¹¹)
c. open to any web resource (Delicious, Yahoo! MyWeb2.0¹²)

2.6 Resource connectivity

Resources in a tagging system may be connected to each other independently of their tags. For example, Web pages may be connected via hyperlinks, or resources can be assigned to groups (e.g. photo albums in Flickr). Connectivity can be roughly categorized as linked, grouped, or none.

2.7 Social connectivity

Users of the system may be connected. Many tagging systems include social networking facilities that allow users to connect to each other based on their areas of interest, educational institutions, location and so forth. Like resource connectivity, social connectivity can be categorized as linked, grouped, or none.

The term folksonomy denotes a user-generated and distributed classification system that emerges when large communities of users collectively tag resources (Wal 2005). Folksonomies became popular on the Web with social software applications such as social bookmarking, photo sharing and weblogs. A number of social tagging sites, such as Delicious, Flickr, YouTube and CiteULike, have become popular. Commonly cited advantages of folksonomies are their flexibility, rapid adaptability, free-for-all collaborative customisation and serendipity (Mathes 2004). In general, people can use any term as a tag without exactly understanding the meaning of the terms they choose. The power of folksonomies lies in the aggregation of tagged information that one is interested in. This improves social serendipity by enabling social connections and by providing social search and navigation (Quintarelli 2005).
Folksonomies offer many benefits (Peters and Stock 2007). Folksonomies:
• represent an authentic use of language,
• allow multiple interpretations,
• are cheap methods of indexing,
• are the only way to index mass information on the Web,
• are sources for the development of ontologies, thesauri or classification systems,
• give quality “control” to the masses,
• allow searching and, perhaps even better, browsing,
• recognize neologisms,
• can help to identify communities,
• are sources for collaborative recommender systems, and
• make people sensitive to information indexing.

⁸ http://www.youtube.com
⁹ http://www.espgame.org
¹⁰ http://www.last.fm
¹¹ http://podcasts.yahoo.com
¹² http://myweb.yahoo.com

There are two types of folksonomies: broad and narrow (Wal 2005). A broad folksonomy, like Delicious, has many people tagging the same object, and every person can tag the object with their own tags in their own vocabulary. Thus, in theory, there is a great
number of tags that all refer to the same object (item), because users might independently use very distinct tags for the same content. A narrow folksonomy, as represented by a tool like Flickr, is beneficial for tagging objects that are not easily searchable or that have no other textual means of being described or found. In a narrow folksonomy, tagging is done by one or a few people, who provide tags that they use to get back to that information. Unlike in a broad folksonomy, the tags are singular in nature: the same tag cannot be associated with a single object multiple times. Moreover, the creator or publisher of an object is often the person who creates the first tags (unlike in broad folksonomies), and the option to tag may even be restricted to that person. As a result, a much smaller number of tags for one and the same object can be identified in a narrow folksonomy.

Fig. 1 Conceptual model of a collaborative tagging system (Marlow et al. 2006) [figure: Users connected to Items through sets of Tags]

3 A model for tagging activities

Social tagging systems allow their users to share their tags for particular resources. Each tag serves as a link to additional resources tagged in the same way by other users (Marlow et al. 2006). Certain resources may be linked to each other; at the same time, there may be relationships between users according to their own social interests. The shared tags of a folksonomy thus come to interconnect the three groups of protagonists in social tagging systems: Users, Items, and Tags. Many researchers (Mika 2005; Harry et al. 2006; Ciro et al. 2007) have suggested a tripartite model that represents the tagging process:

$$\mathrm{Tagging} : (U, T, I) \tag{1}$$

where U is the set of users who participate in a tagging activity, T is the set of available tags and I is the set of items being tagged.
Figure 1 shows a conceptual model of a social tagging system in which users and items are connected through the tags they assign. In this model, users assign tags to a specific item; tags are represented as typed edges connecting users and items. Items may be connected to each other (e.g., as links between web pages), and users may be associated by a social network or by sets of affiliations (e.g., users that work for the same company).
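The tripartite model can be made concrete with a short Python sketch, assuming tag assignments are stored as (user, tag, item) triples; the data and the `project` function are hypothetical. It projects the ternary relation onto three pairwise views: item-tag counts (the bag-model statistic of Sect. 2.3), the user-item relation (tagging read as an implicit rating, in the sense of Liang et al. 2008), and a user-tag profile. Each projection keeps only two of the three dimensions, so some information about who used which tag on which item is lost.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each tag assignment is a (user, tag, item)
# triple from the tripartite model Tagging: (U, T, I).
assignments = {
    ("u1", "python", "i1"),
    ("u1", "tutorial", "i1"),
    ("u2", "python", "i1"),
    ("u2", "python", "i2"),
    ("u3", "jazz", "i3"),
}


def project(triples):
    """Collapse the ternary relation into three two-dimensional views."""
    item_tags = defaultdict(Counter)   # item -> tag -> number of users (bag model)
    user_items = defaultdict(set)      # user -> items the user has tagged
    user_tags = defaultdict(Counter)   # user -> tag -> number of items (tag profile)
    for user, tag, item in triples:
        item_tags[item][tag] += 1
        user_items[user].add(item)
        user_tags[user][tag] += 1
    return item_tags, user_items, user_tags


item_tags, user_items, user_tags = project(assignments)

# Under a set model, only the presence of a tag on an item is kept.
item_tag_sets = {item: set(tags) for item, tags in item_tags.items()}

print(item_tags["i1"])   # → Counter({'python': 2, 'tutorial': 1})
print(user_items["u2"])  # items u2 has implicitly "rated" by tagging them
```

Because the input is a set, each triple is counted once, so `item_tags` counts distinct users per tag and item, and `user_tags` counts distinct items per user and tag.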
An examination (Golder and Huberman 2005) of collaborative tagging systems such as Delicious has revealed a rich variety in the ways in which tags are used, regularities in user activity, tag frequencies, and bursts of popularity in bookmarking, as well as a remarkable stability in the relative proportions of tags within a given URL. Tags may be used:
• to identify the topic of a resource, using nouns and proper nouns (e.g. photo, album, photographer);
• to classify the type of resource (e.g. book, blog, article, review, event);
• to denote the qualities and characteristics of the item (e.g. funny, useful, cool);
• to express self-reference: a subset of tags, such as myfavourites, mymusic and myphotos, reflects a notion of self-reference;
• for task organisation by individuals (e.g. to read, job search, to print).
Time is an important factor in collaborative tagging systems, because the definitions of tags and the relationships among them can vary over time. For some users the number of tags becomes stable over time, while for others it keeps growing. There are three hypotheses about tag behavior over time (Harry et al. 2006):
a. Tag convergence: the tags assigned to a given Web resource tend to stabilize, and a stable set of tags comes to form the majority.
b. Tag divergence: the tag sets do not converge to a smaller group of more stable tags, and the tag distribution changes continually.
c. Tag periodicity: after one group of users has tagged a resource with some locally optimal tag set, another group uses a divergent set, but after a period of time the new group's set becomes the new locally optimal tag set. This process may repeat, and so lead to convergence after a period of instability, or it may behave like a chaotic attractor.
4 Tag-based recommender systems
In general, recommender systems recommend interesting or personalized information objects to users based on explicit or implicit ratings.
Usually, recommender systems predict ratings of objects or suggest a list of new objects that the user will hopefully like the most. Profiling users with a user–item rating matrix or with keyword vectors is widely used in recommender systems. However, these approaches describe only two-dimensional relationships between users and items. In tag recommender systems, the recommendation for a given user $u \in U$ and a given resource $r \in R$ is a set $\hat{T}(u, r) \subseteq T$ of tags. In many cases, $\hat{T}(u, r)$ is computed by first ranking the set of tags according to some quality or relevance criterion and then selecting the top n elements (Jäschke et al. 2007).
Personalized recommendation is used to overcome the information overload problem, and collaborative filtering is one of the most successful recommendation techniques to date. However, collaborative filtering becomes less effective when users have multiple interests, because users who have similar taste in one aspect may behave quite differently in other aspects. Information obtained from social tagging websites tells not only what a user likes, but also why he or she likes it.
In the remainder of this section, we first describe how tag information can be integrated into recommender systems to improve recommendation quality. We then present well-known recommendation algorithms for developing tag-based recommender systems. Probabilistic latent
semantic analysis (PLSA), a novel statistical technique for the analysis of two-mode and co-occurrence data, is described in Sect. 4.2. A new kind of resource sharing system, called GroupMe!, is presented in Sect. 4.3. The FolkRank algorithm, developed as a folksonomy search engine on the basis of the graph model, is reported in Sect. 4.4. Section 4.5 reviews methods for tag-based profile construction with a vector of weighted tags. In Sect. 4.6, we compare this method, which builds the profile as a single vector of weighted tags and is called the naive approach, with two other approaches, one based on co-occurrence and another based on adaptation. The clustering algorithm WebDCC (Web Document Conceptual Clustering) is presented in Sect. 4.7. In Sect. 4.8, we survey state-of-the-art algorithms for improving music recommendation in online music recommender systems, a prominent example of services that offer personalized recommendations to users.
Fig. 2 Extend user–item matrix by including user tags as items and item tags as users (Tso-Sutter et al. 2008)
4.1 Extension with tags
Current recommender systems commonly use collaborative filtering techniques, which traditionally exploit only two-dimensional data, i.e. pairs of users and items. As collaborative tagging becomes more widely used, social tags, a powerful mechanism that reveals three-dimensional correlations among users, tags and items, can also be employed as background knowledge in recommender systems. The first adaptation consists in reducing the three-dimensional folksonomy to three two-dimensional contexts: ⟨user, tag⟩, ⟨item, tag⟩ and ⟨user, item⟩. This can be done by augmenting the standard user–item matrix horizontally and vertically with user tags and item tags, respectively (Tso-Sutter et al. 2008). User tags are tags that user u uses to tag items; they are viewed as items in the user–item matrix.
Item tags are tags that users assign to describe an item i; they play the role of users in the user–item matrix (see Fig. 2). Furthermore, instead of treating each single tag as a user or an item, clustering methods can be applied to the tags so that similar tags are grouped together.
Supporting users during the tagging process is an important step towards easy-to-use applications. Consequently, different approaches have been studied in the past to find the best tag recommendations for resources.
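The matrix augmentation of Tso-Sutter et al. (2008) described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: tag assignments are given as (user, tag, item) triples, entries are binary, each tag appears both as an extra column (a user tag viewed as an item) and as an extra row (an item tag viewed as a user), and the tag-by-tag block is left at zero. The function names `augment_matrix` and `column_cosine` are hypothetical, and cosine similarity between item columns is only one common choice for item-based CF.

```python
from math import sqrt
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (user, tag, item)


def augment_matrix(triples: Iterable[Triple]):
    """Build a binary user-item matrix extended with tags.

    Rows: users, followed by tags acting as pseudo-users (item tags).
    Columns: items, followed by tags acting as pseudo-items (user tags).
    """
    triples = list(triples)
    users = sorted({u for u, _, _ in triples})
    tags = sorted({t for _, t, _ in triples})
    items = sorted({i for _, _, i in triples})
    rows = [("user", u) for u in users] + [("tag", t) for t in tags]
    cols = [("item", i) for i in items] + [("tag", t) for t in tags]
    r = {key: k for k, key in enumerate(rows)}
    c = {key: k for k, key in enumerate(cols)}
    m: List[List[int]] = [[0] * len(cols) for _ in rows]
    for u, t, i in triples:
        m[r[("user", u)]][c[("item", i)]] = 1  # original user-item entry
        m[r[("user", u)]][c[("tag", t)]] = 1   # horizontal extension: user tag as item
        m[r[("tag", t)]][c[("item", i)]] = 1   # vertical extension: item tag as user
    return rows, cols, m


def column_cosine(m: List[List[int]], a: int, b: int) -> float:
    """Cosine similarity between columns a and b (item-based CF similarity)."""
    dot = sum(row[a] * row[b] for row in m)
    norm_a = sqrt(sum(row[a] ** 2 for row in m))
    norm_b = sqrt(sum(row[b] ** 2 for row in m))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


data = [("alice", "python", "i1"), ("bob", "python", "i2"), ("bob", "jazz", "i3")]
rows, cols, m = augment_matrix(data)
i1, i2 = cols.index(("item", "i1")), cols.index(("item", "i2"))
print(round(column_cosine(m, i1, i2), 3))  # 0.5
```

In the plain user–item matrix, i1 (bookmarked only by alice) and i2 (bookmarked only by bob) share no users, so their cosine similarity would be 0. In the augmented matrix, the pseudo-user row for the shared tag python makes the two items similar.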
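The generic scheme for computing the tag recommendation T̂(u, r) described earlier in this section, which ranks all tags by some relevance criterion and keeps the top n, can be sketched as below. The scoring rule is an assumption chosen for simplicity: a tag's score is the number of times it was already assigned to resource r plus the number of times user u has used it anywhere. Any other relevance criterion could replace this score without changing the ranking step. The function name `recommend_tags` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (user, tag, resource)


def recommend_tags(triples: Iterable[Triple], user: str, resource: str, n: int = 3) -> List[str]:
    """Return the top-n tags for (user, resource) under a simple popularity score."""
    triples = list(triples)
    on_resource = Counter(t for _, t, r in triples if r == resource)  # tags assigned to the resource
    by_user = Counter(t for u, t, _ in triples if u == user)           # the user's own vocabulary
    scores = on_resource + by_user  # Counter addition sums the two counts per tag
    # Rank by descending score; break ties alphabetically so the output is deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:n]


data = [
    ("u1", "python", "r1"), ("u2", "python", "r1"), ("u2", "web", "r1"),
    ("u3", "jazz", "r2"), ("u3", "music", "r2"), ("u3", "python", "r3"),
]
print(recommend_tags(data, "u3", "r1", n=2))  # ['python', 'jazz']
```

For user u3 and resource r1, python ranks first because it is both popular on r1 and already part of u3's vocabulary; the remaining candidates tie and are ordered alphabetically, which is an arbitrary tie-breaking choice.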
4.2 PLSA
Probabilistic latent semantic analysis (PLSA) is a novel statistical technique for the analysis of two-mode and co-occurrence data, with applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. PLSA has been shown to improve the quality of collaborative-filtering-based recommenders (Hofmann 1999) by assuming an underlying lower-dimensional latent topic model. Compared to standard Latent Semantic Analysis, which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, PLSA is based on a mixture decomposition derived from a latent class model. This results in a more principled approach with a solid foundation in statistics. Cohn and Hofmann (2000) consider the problem of document clustering and extend the PLSA algorithm to combine content-based and hyperlink-based similarities into a unified model. Wetzker et al. (2009) extended this approach so that the topic model is estimated from the item–user and the item–tag observations in parallel. The inclusion of tags reduces known collaborative filtering problems related to overfitting and allows for higher-quality recommendations. Experimental results on a large snapshot of the Delicious bookmarking service showed the scalability of their approach and an improved recommendation quality compared to two-mode collaborative or annotation-based methods. Model fusion using PLSA has also been successfully applied to the discovery of navigational patterns on the Web (Jin et al. 2004), to music recommendation combining multiple similarity measures (Arenas-García et al. 2007) and to cross-domain knowledge transfer (Gui-Rong et al. 2008). According to Hotho et al.
(2006a,b,c), a folksonomy can be described as a tripartite graph whose vertex set is partitioned into three disjoint sets of users $U = \{u_1, \ldots, u_l\}$, tags $T = \{t_1, \ldots, t_n\}$ and items $I = \{i_1, \ldots, i_m\}$. This model can be simplified to two bipartite models, where the collaborative filtering model IU is built from the item–user co-occurrence counts $f(i, u)$, and the annotation-based model IT derives from the co-occurrence counts between items and tags $f(i, t)$. In the case of social bookmarking, IU becomes a binary matrix ($f(i, u) \in \{0, 1\}$), as users can bookmark a given web resource only once.
The aspect model of PLSA associates the co-occurrence of observations with a hidden topic variable $Z = \{z_1, \ldots, z_K\}$. In the context of collaborative filtering, an observation corresponds to the bookmarking of an item by a user, and all observations are given by the co-occurrence matrix IU (Wetzker et al. 2009). Users and items are assumed to be independent given the topic variable Z. The probability that an item was bookmarked by a given user can be computed by summing over all latent variables:
$$P(i_m \mid u_l) = \sum_k P(i_m \mid z_k)\, P(z_k \mid u_l) \tag{2}$$
Analogously to (2), the conditional probability between tags and items can be written as:
$$P(i_m \mid t_n) = \sum_k P(i_m \mid z_k)\, P(z_k \mid t_n) \tag{3}$$
Following the procedure of Cohn and Hofmann (2000), we can now combine both models based on the common factor $P(i_m \mid z_k)$ by maximizing the log-likelihood function:
$$L = \sum_m \Big[ \alpha \sum_l f(i_m, u_l) \log P(i_m \mid u_l) + (1 - \alpha) \sum_n f(i_m, t_n) \log P(i_m \mid t_n) \Big] \tag{4}$$
where α is a predefined weight for the influence of each two-mode model. Maximum likelihood parameter estimation for the aspect model can be performed with the Expectation-Maximization (EM) algorithm (Cohn and Hofmann 2000), which is the standard procedure for maximum likelihood estimation in latent variable models (Arenas-García et al. 2007). EM alternates two coupled steps: (i) an expectation (E) step, where posterior probabilities are computed for the latent variables, and (ii) a maximization (M) step, where parameters are updated. Standard calculations yield the E-step equations:
$$P(z_k \mid u_l, i_m) = \frac{P(i_m \mid z_k)\, P(z_k \mid u_l)}{P(i_m \mid u_l)}, \qquad P(z_k \mid t_n, i_m) = \frac{P(i_m \mid z_k)\, P(z_k \mid t_n)}{P(i_m \mid t_n)} \tag{5}$$
and the parameters are then re-estimated in the M-step as follows:
$$P(z_k \mid u_l) \propto \sum_m f(i_m, u_l)\, P(z_k \mid u_l, i_m) \tag{6}$$
$$P(z_k \mid t_n) \propto \sum_m f(i_m, t_n)\, P(z_k \mid t_n, i_m) \tag{7}$$
$$P(i_m \mid z_k) \propto \alpha \sum_l f(i_m, u_l)\, P(z_k \mid u_l, i_m) + (1 - \alpha) \sum_n f(i_m, t_n)\, P(z_k \mid t_n, i_m) \tag{8}$$
By iterating the above E and M steps, the EM algorithm monotonically increases the likelihood of the combined model on the observed data. Using the α parameter, this model can easily be reduced to a collaborative filtering or an annotation-based model by setting α to 1.0 or 0.0, respectively. Items can then be recommended to a user $u_l$, weighted by the probability $P(i_m \mid u_l)$ from Eq. (2). For items already bookmarked by the user in the training data, this weight is set to 0, so they are appended to the end of the recommended item list. In the experiments of Wetzker et al. (2009), this hybrid approach to item recommendation in folksonomies, which includes user-generated annotations, produced better results than a standard collaborative filtering or annotation-based method.
4.3 The GroupMe! system
GroupMe! is a new kind of resource sharing system (Abel et al. 2007). It extends the idea of social bookmarking systems with the ability to create groups of multimedia Web resources. GroupMe! has an easy-to-use interface that enables the creation of groups via drag & drop operations.
An important feature of the GroupMe! system is its visualization of groups (Abel et al. 2007). Resources are visualized according to their media type: pictures are displayed as thumbnails, videos and audio recordings can be played directly within the group, and RSS (Rich Site Summary) feeds are previewed by displaying recent headlines. GroupMe! groups are interpreted as regular Web resources and can therefore themselves be arranged within groups. This enables users to build hierarchies of Web resources. GroupMe! groups are dynamic collections, which may change over time. Other users who are interested in the content of a group can subscribe to it and will be notified whenever the group is modified, e.g. when a resource is added or removed or when new tags are assigned. Users can also utilize