Tag Recommendation for Folksonomies Oriented towards Individual Users

Marek Lipczak
Faculty of Computer Science, Dalhousie University, Halifax, Canada, B3H 1W5
lipczak@cs.dal.ca

Abstract. Tagging has become a standard way of organizing information on the Web, particularly in folksonomies, data repositories freely created by communities of users. A few tags attached to each resource create a bridge between heterogeneous data and users accustomed to keyword-based search and browsing. To establish this connection, tagging requires users to manually define tags for each resource they add to the system. This potentially time-consuming step can be eased by tag recommender systems, which propose terms that users may choose to use as tags. This paper suggests and evaluates potential sources of recommended tags, focusing on folksonomies oriented towards individual users. These suggestions are used to propose a three-step tag recommendation system. Basic tags are extracted from the resource title. In the next step, the set of potential recommendations is extended by related tags proposed by a lexicon based on co-occurrences of tags within a resource's posts. Finally, tags are filtered by the user's personomy, the set of tags previously used by the user.

1 Introduction

Folksonomy services allow users to store and share various types of Internet resources. The content of folksonomies is completely defined by communities of their users. The large number of creators and resources pushes folksonomies away from the traditional hierarchical data structure based on directories created by system editors (e.g., Open Directory Project¹) towards tag-based taxonomies defined jointly by service users (e.g., BibSonomy², del.icio.us³, Flickr⁴, Technorati⁵). While adding a resource to the system, users are asked to define a set of tags: keywords which describe the resource and relate it to other resources gathered in the system.

To ease this process, some folksonomy services recommend a set of potentially matching tags. Proposing a tag recommendation system was a task of the ECML PKDD Discovery Challenge 2008⁶. This paper presents a tag recommendation system submitted to the challenge.

¹ http://www.dmoz.org/about.html
² http://bibsonomy.org/help/about/
³ http://del.icio.us/about/
⁴ http://flickr.com/about/
⁵ http://technorati.com/about/
⁶ http://www.kde.cs.uni-kassel.de/ws/rsdc08/
The formal definition of folksonomy can be found in [6]. A folksonomy is a collection of resources entered by users in posts. Each post consists of a resource and a set of tags attached to it by a user. Generally, the resource is specific to the user who added it to the system. However, for some types of resources (e.g., bookmarks) identical resources can be added to the system by different users. In the latter case, the set of resource tags denotes all tags attached to a given resource by various users.

Folksonomies can be classified into two types based on the objective of the tagging process. The first type, represented by BibSonomy and del.icio.us, treats resources (e.g., personal bookmarks) as the individual property of a user. Here, the aim of tags is to create a repository tailored to individual user interests. In this paper, this type is referred to as folksonomies oriented towards individual users. The second type of folksonomies, represented by Flickr and Technorati, is a shared repository of public resources (e.g., blog entries). In this case tags are added with a broad audience in mind, one that may later want to search for the resource. In this paper, this type is referred to as folksonomies oriented towards a broad audience. As the reason for tagging a resource is fundamentally different, we may expect that a tag recommendation system that suits one folksonomy type would be inappropriate for the other. This paper focuses on the first type, proposing a tag recommender for individual users.

2 Related work

The attention of researchers is mostly directed to tag recommendation systems for broad audience folksonomies. TagAssist [12] is a system designed to recommend tags for blog posts. The recommendation is built on tags previously attached to similar resources. Before that, tag meanings are disambiguated based on co-occurrence of tags in the complete repository.

Co-occurrence of tags was also used by Sigurbjörnsson and van Zwol [11] to propose tags that complement user-defined tags of photographs in Flickr.

The problem of tag recommendation in folksonomies oriented towards individual users was addressed by Jäschke et al. [7]. They compared a number of recommendation techniques including collaborative filtering, PageRank, and FolkRank, a modification of PageRank suited for folksonomies. The evaluation showed that the FolkRank based recommender outperforms the other approaches; however, the tests were performed on a dense core of the folksonomy and thus might not be representative.

Most tag recommendation systems are based on the tags that are already present in the system. An exception to this rule is the system presented by Lee and Chun [9]. The system recommends tags retrieved from the content of a blog, using an artificial neural network. The network is trained on statistical information about word frequencies and lexical information about word semantics extracted from WordNet.

Schmitz et al. [10] proposed association rule mining as a technique that might be useful in the tag recommendation process. The intuition behind this concept was also used in the system presented in this paper.
3 Examined dataset

All presented experiments and the evaluation of the proposed tag recommendation system were performed on a snapshot of BibSonomy [5] containing 2,570 users, 242,175 resources and 274,139 posts (after preprocessing). The snapshot was provided by the organizers of the ECML PKDD Discovery Challenge 2008. The preprocessing phase included removing useless tags (e.g., "system:unfiled"), changing all letters to lower case and removing non-alphabetical and non-numerical characters from tags.

The statistical characteristics of folksonomies have been the subject of many research publications [2, 3, 8, 11]. In the following sections I present experiments particularly important from the perspective of the tag recommendation task.

3.1 General characteristics

The frequency distribution of tags from the BibSonomy snapshot shows that mid- and low-frequency tags follow Zipf's distribution (Fig. 1). Zipf's distribution does not hold for high-frequency tags. The frequency distribution of tags from Flickr, which represents folksonomies oriented towards a broad audience, shows important differences [11]. Flickr's low-frequency tags do not follow Zipf's distribution. A possible explanation of this fact is a smaller number of user-specific tags in comparison to folksonomies oriented towards individual users. In addition, Flickr's high-frequency tags follow Zipf's distribution and are too general to be used as recommendations. The list of the most frequent tags from BibSonomy ("software", "web20", "tools", "web", "blog") shows that tag recommenders for folksonomies oriented towards individual users should not ignore high-frequency terms.

The difference between the two folksonomy types may have an impact on the efficiency of the applied tag recommendation methods. A commonly used collaborative filtering approach is based on the intuition that the best recommendation consists of tags attached to the resource by people similar to the user. This approach has proved its quality in many recommendation systems; however, the intuition behind it can be deceiving. Folksonomies like BibSonomy or del.icio.us are mainly designed as collections of repositories of individual users. By adding posts, each user defines his/her own set of used tags, a personomy [6], which describes the resources from the user's point of view. As a result, users addressing similar resources do not have to use similar tags, and similar personomies do not have to be associated with similarity in tagged resources. In fact, there is no such correlation in the processed BibSonomy snapshot. The cosine similarity between users calculated based on tags seems to be uncorrelated with that calculated based on resources (Fig. 2). In this situation, recommending tags assigned to a resource by similar users (collaborative filtering) should give results similar to recommending the tags frequently attached to the resource by any user. This conclusion seems to be confirmed by the experiment presented by Jäschke et al. [7]. Mindful of the limitations of the collaborative approach, I decided to focus on a tag space that is directly related to a post.
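The comparison of tag-based and resource-based user similarity can be illustrated with a minimal Python sketch. It is not the code used for the experiment: the function names `cosine` and `user_vectors`, the `(user, resource, tags)` post layout, and the particular tf-idf variant (term frequency times log(N / df), with users playing the role of documents) are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_vectors(posts):
    """Build per-user tag vectors (tf-idf) and resource vectors (binary).

    posts: iterable of (user, resource, tags) triples (hypothetical layout).
    """
    tag_counts = defaultdict(Counter)
    resources = defaultdict(set)
    for user, resource, tags in posts:
        tag_counts[user].update(tags)
        resources[user].add(resource)

    n_users = len(tag_counts)
    # Document frequency: the number of users who used each tag.
    df = Counter(tag for counts in tag_counts.values() for tag in counts)

    # Assumed tf-idf variant: tf * log(N / df).
    tag_vecs = {
        user: {tag: tf * math.log(n_users / df[tag]) for tag, tf in counts.items()}
        for user, counts in tag_counts.items()
    }
    res_vecs = {user: {r: 1.0 for r in rs} for user, rs in resources.items()}
    return tag_vecs, res_vecs
```

On a toy input where two users bookmark the same resource with disjoint tags, the resource-based similarity is high while the tag-based similarity is zero, which is the kind of mismatch the paragraph above describes.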
[Figure 1: plot of tag frequency against rank; both axes have ticks at 1, 10, 100, 1000, 10000 and 100000.]

Fig. 1. The overall frequency distribution of tags (after preprocessing and removing posts classified as imported).

Fig. 2. Cosine similarity between each pair of users calculated based on tags (tf-idf weights) and resources (binary weights). The two values seem to be independent.

3.2 Characteristics based on individual posts

Considering only the direct surroundings of the post, the potential tag recommendations can be obtained from the resource itself, the set of tags attached to the resource in previous posts, or the set of tags that were already used by the user (the user's personomy).

Exploiting tags from the resource depends on the character of the folksonomy. In BibSonomy the resource can be a BibTeX entry or a web-page bookmark. The first contains bibliographic information about a research publication, including its title and abstract. The second contains the web-page title and URL. Preliminary experiments showed that using title words as tags outperforms the results of abstracts and URLs. The latter two contain fewer correct tags. The title is also the only element shared by both resource types, and it is common in other folksonomies, which are additional advantages. I decided to use the title as the representation of the resource.

To evaluate the three potential sources of tag recommendations, namely words from the resource title, resource tags and the user's personomy, I checked, for each post, whether its tags can be found in any of these sources built from all other posts in the folksonomy. The quality of the sources was measured by precision (i.e., the number of correct tags retrieved divided by the total number of retrieved tags) and recall (i.e., the number of correct tags retrieved divided by the total number of correct tags). These are standard information retrieval metrics [4]. The value of recall was averaged over all tested posts.
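The two metrics and their averaging can be sketched in a few lines of Python. This is only an illustration: the function names and the `(retrieved, correct)` pair layout are hypothetical, and skipping posts without any correct tags when averaging recall is an assumption, since the text does not say how such posts are handled.

```python
def precision_recall(retrieved, correct):
    """Precision and recall of one source for one post.

    Returns None for a metric whose denominator is zero.
    """
    retrieved, correct = set(retrieved), set(correct)
    hits = len(retrieved & correct)
    precision = hits / len(retrieved) if retrieved else None
    recall = hits / len(correct) if correct else None
    return precision, recall


def average_scores(posts):
    """Average precision and recall over (retrieved, correct) pairs.

    Recall is averaged over all posts that have correct tags; precision is
    averaged only over posts for which the source returned any tags.
    """
    precisions, recalls = [], []
    for retrieved, correct in posts:
        p, r = precision_recall(retrieved, correct)
        if p is not None:
            precisions.append(p)
        if r is not None:
            recalls.append(r)
    avg_precision = sum(precisions) / len(precisions) if precisions else None
    avg_recall = sum(recalls) / len(recalls) if recalls else None
    return avg_precision, avg_recall
```

A post for which the source returns nothing lowers the averaged recall but does not enter the averaged precision, so a source can look precise while covering few posts.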
The averaged recall tells us what fraction of the correct tags can be found in a source. The value of precision was averaged only over posts for which the source returned any tags. Precision averaged this way is the ratio of correct tags among all tags retrieved. In addition, I present the total number of potential tags obtained from the sources, and the number of correct tags among them (Fig. 3).

The user's personomy is the richest source of correct tag recommendations. For the tested BibSonomy snapshot it gave access to 90% of the tags from test posts. On
the other hand, correct tags from the personomy are accompanied by a large number of incorrect tags (precision around 0.001). Compared to tags retrieved from the personomy, the recommendation based on the resource title is much more precise; however, the number of correct tags found this way is lower. In addition, most of these tags can also be found in the user's personomy. Finally, both recall and precision values show that resource tags are not a good source of potential tag recommendations. The character of each tag recommendation source and its potential usability in a tag recommendation system are discussed in the following sections.

[Figure 3: two Venn diagrams over Title Tags, Resource Tags and Personomy Tags. Left panel, (avg. recall) number of correct tags, values in extraction order: (0.03) 26,054; (0.01) 10,162; (0.004) 4,476; (0.66) 529,357; (0.15) 113,707; (0.07) 74,926; (0.02) 22,746. Total number of tested tags: 843,752. Tags not found: 62,324 (7%). Right panel, (avg. precision) number of proposed tags, values in extraction order: (0.04) 713,025; (0.04) 218,420; (0.22) 19,637; (0.001) 565,047,072; (0.26) 444,295; (0.1) 762,316; (0.13) 177,977. The assignment of values to diagram regions is not recoverable from the extracted text.]

Fig. 3. Venn diagrams presenting average recall, plus the number of correct tags found in three potential sources of tags (left) and average precision, plus the total number of tags retrieved from these sources (right).

Resource title. The resource title appears to be the most robust source of tag recommendations. Among all posts in the processed BibSonomy snapshot only 51 resource titles were unable to produce any tags (no letters or numbers in the title). In addition, among all discussed sources the title seems to be the most strongly related to the resource. The drawback of this source is its low recall, which makes the title inappropriate as a stand-alone tag recommender. The title is a simplified natural language sentence, which should be cleaned of words with no informative value (e.g., stopwords).

Resource tags. Tags assigned to the resource by other folksonomy users are not a good source of tag recommendations. One of the reasons is the sparsity of data; 92% of resources were added to the system only once. This fact significantly limits the possible recall of this source of tags. The other issue is the personal character of posts (discussed in section 3.1), which hurts the precision of retrieved tags. The variety of tags attached by users creates, however, another application of resource tag sets. Mining relations between tags attached to the same resource can result in a simplified semantic lexicon. The lexicon would not give us the
information about the character of the relation, but given a tag, the lexicon can point out related tags, which are also potential recommendations. The lexicon consists of general relations between tags and can be used independently of the resources. This fact reduces the negative impact of data sparsity. In addition, the lexicon is suited to a particular folksonomy and can capture specific relations between its tags.

Personomy tags. When building a personomy, the user aims to represent his/her interests with a limited number of tags. The same tag will be attached to all resources fitting a particular interest; for example, all articles related to the user's master's thesis will be tagged with the same keyword. In addition, users are likely to stick with one lexical form of a word or expression, for example consistently using the singular or the plural form of a noun (e.g., "publication" or "publications"). These are the reasons why we are likely to find a lot of good recommendations among the user's tags. The problem is that the choice of the lexical form or the word that describes the interest is completely up to the user. For the given example of resources related to a master's thesis the tag may be "masterthesis", "msc", "thesis", "work", or any other that in the user's opinion conveys the information. To describe the resource more accurately, users pick additional tags, specific not only to the user but also to the resources. This is likely the cause of the large number of low-frequency tags (see section 3.1), and it complicates the process of retrieving potential recommendations from the personomy.

4 Tag recommendation system

The tag recommendation system (Algorithm 1), described in this section, is based on observations from the presented statistical experiments. The system consists of three steps. The first step produces tags from resource title words and assigns each a score that represents its usefulness for previously tagged resources. The second step uses the resource tag based lexicon to propose tags related to the tags taken from the title. The third step checks the tags proposed by the lexicon against the user's personomy. The tags recommended to the user are the union of the most promising tags produced in steps one and three. The following sections give a detailed description of each step.

Extraction of title based tags. The resource title is divided into words, which are then cleaned of non-alphabetical and non-numerical characters. The system assigns a score to each word, which represents the probability of the word being chosen as a tag: the number of times it was chosen as a tag divided by the number of its occurrences. If the word occurred in the titles of previously entered resources fewer than 100 times, its probability of being a correct tag is set to 0.1, which is an empirically estimated value for low-frequency tags. The probability score is introduced to reduce the impact of stopwords. It is important to notice that the standard stopwords list, which is often used in information retrieval systems, is
Algorithm 1: Tag recommendation system Data: a resource Pres and user u Result: a set of recommended tags TRecommendation, a tag consist of a keyword (w)and recomme ore (scor /*Step 1- Ertraction of title based tags*/ Write +-ertractTitlewords(pres Tritle foreach u∈ Write de L TTitle add makeTag(w, get PriorU sefullness(u)) *Step 2- Retrieval of tags related to title*/ foreach t∈ TTitle do Tt Related← Ti RelTags + 0// related tags from Tag-to-Tag lexicon Tt RelTitle +-0// related tags from Title-to-Tag lexicon foreach r E get Related(TagToTag, t)do L TtRelTags add makeTag(r w, t score get Rel Score( foreach r E get Related (lTitleToTag, t)do Tt Reltitle add makeTag(r w, t score* get Rel Score(TitleTotag, t, r)) Tt RelTags Limit Size( TtRelTags, 20) Tt limit size(TtRelTitle, 20) n Prob(Tt RelTags, Tt RelTitle) Related union Prob(tt, related /* Step 3-Personomy based filtering*/ P← get Personomy(u) foreach t∈ RElated do 0// tags retrieved from user's personomy ift∈ P ther foreach r∈Pdo L T RelPersonomy add makeTag(r w, t score* get RelScore(P, t, r)) TRelPersonomy + -union Prob(Tt, RelPersonomy, .. Tin RelPersonomy) TRelPersonomy + normalize Scores(limit Size(TRelPersonomy, 10)) TT itle normalize Scores(Tritle) TRecommendation + -limit size(union Prob(TTitle, TRel not sufficient here, because we have to deal with titles in various languages and stopwords specific for the folksonomy (e. g, word"page"is frequent in web-page titles, but it is rarely used as a tag) Retrieval of tags related to title The most important element of this step is the definition of the lexicon. It can be built based on two types of relations As introduced in section 3.2, the lexicon can be built based on tags attached to the same resource. which are considered as related. The calculation of the factor that represents the relation strength can be solved based on various approaches
Algorithm 1: Tag recommendation system Data: a resource pres and user u Result: a set of recommended tags TRecommendation, a tag consist of a keyword (w) and recommendation score (score) begin /*Step 1 – Extraction of title based tags*/ WT itle ←− extractT itleW ords(pres) TT itle ←− ∅ foreach w ∈ WT itle do TT itle add makeT ag(w, getP riorUsefullness(w)) /*Step 2 – Retrieval of tags related to title*/ foreach t ∈ TT itle do Tt Related ←− ∅ Tt RelT ags ←− ∅// related tags from Tag-to-Tag lexicon Tt RelT itle ←− ∅// related tags from Title-to-Tag lexicon foreach r ∈ getRelated(lT agT oT ag, t) do TtRelT ags add makeT ag(r.w, t.score ∗ getRelScore(lT agT oT ag, t, r)) foreach r ∈ getRelated(lT itleT oT ag, t) do Tt RelT itle add makeT ag(r.w, t.score ∗ getRelScore(lT itleT oT ag, t, r)) Tt RelT ags ←− limitSize(TtRelT ags, 20) Tt RelT itle ←− limitSize(TtRelT itle, 20) Tt Related ←− unionP rob(Tt RelT ags, Tt RelT itle) TRelated ←− unionP rob(Tt1 Related, . . . , Ttn Related) /*Step 3 – Personomy based filtering*/ P ←− getP ersonomy(u) foreach t ∈ TRelated do Tt RelP ersonomy ←− ∅// tags retrieved from user’s personomy if t ∈ P then foreach r ∈ P do Tt RelP ersonomy add makeT ag(r.w, t.score ∗ getRelScore(P, t, r)) TRelP ersonomy ←− unionP rob(Tt1 RelP ersonomy, . . . , Ttn RelP ersonomy) TRelP ersonomy ←− normalizeScores(limitSize(TRelP ersonomy, 10)) TT itle ←− normalizeScores(TT itle) TRecommendation ←− limitSize(unionP rob(TT itle, TRelP ersonomy), 10) end not sufficient here, because we have to deal with titles in various languages and stopwords specific for the folksonomy (e.g., word “page” is frequent in web-page titles, but it is rarely used as a tag). Retrieval of tags related to title The most important element of this step is the definition of the lexicon. It can be built based on two types of relations. As introduced in section 3.2, the lexicon can be built based on tags attached to the same resource, which are considered as related. 
The calculation of the factor that represents the relation strength can be solved based on various approaches
(e.g., association rule mining). In the presented system the score of the relation between a tag t1 and another tag t2 is the number of co-occurrences of t1 with t2 among all resources, divided by the total number of occurrences of tag t1. The score is analogous to the confidence score (Eq. 1) in association rule mining [1].

\[
\mathrm{confidence}(t_1, t_2) = \frac{\mathrm{support}(\{t_1 \cap t_2\})}{\mathrm{support}(\{t_1\})} \tag{1}
\]

Considering title words as the source of tags, we can think of a second type of lexicon, which represents relations between the words extracted from the resource title and the resource tags. The method of construction is analogous to that of the previous lexicon; the only difference is that tag t1 is drawn from the title, not from the resource tags. The two lexicons present different perspectives on tag relations and give slightly different results (Table 1). The latter approach seems to be better suited to the input tags; however, it is biased towards general words that are often used in titles. For this type of word the related tags given by the second lexicon are simply the most frequently used tags (Table 2). To avoid the need to decide which lexicon is more appropriate for a given word, I decided to join the lists of related tags produced by both of them (each limited to twenty tags). The scores of tags that are present in both lists are summed as if they were probabilities of two independent events.

This step is performed independently for each tag extracted from the title. Based on the lexicon, the list of related tags, with scores defining the strength of the relation, is retrieved. Finally, the lists are joined. Scores of multiple occurrences of identical tags are summed as if they were independent probabilistic events, where the probability is defined by the relation score. Tags related to a word that is not likely to become a tag (e.g., “page”) are also not good candidates for recommendation. These are very general terms which are hard to connect with any concept.
This is the reason why before joining the relation score is multiplied by the title tag score computed in the previous step.

| Rank | Tag-to-Tag lex. (occurrence: 317) | Score | Title-to-Tag lex. (occurrence: 204) | Score |
|------|-----------------------------------|-------|-------------------------------------|-------|
| 1 | semantics | 1.000 | semantics | 0.392 |
| 2 | semanticweb | 0.306 | semanticweb | 0.348 |
| 3 | ontology | 0.177 | semantic | 0.313 |
| 4 | semantic | 0.167 | folksonomy | 0.215 |
| 5 | semweb | 0.158 | tagging | 0.196 |

Table 1. Top 5 tags related to “semantics” according to two types of lexicon.

| Rank | Tag-to-Tag lex. (occurrence: 53) | Score | Title-to-Tag lex. (occurrence: 2439) | Score |
|------|----------------------------------|-------|--------------------------------------|-------|
| 1 | home | 1.000 | software | 0.081 |
| 2 | page | 0.113 | tools | 0.073 |
| 3 | software | 0.094 | computing | 0.064 |
| 4 | server | 0.075 | java | 0.059 |
| 5 | photos | 0.056 | opensource | 0.051 |

Table 2. Top 5 tags related to “home” according to two types of lexicon.
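The first two steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the submitted implementation: all function names, the dictionary-based data layout, the lowercasing of title words and the choice of one tag set per post as the counting unit are assumptions made for illustration. Only the threshold of 100 title occurrences, the default score of 0.1, the limit of twenty related tags, the confidence score of Eq. 1 and the probabilistic sum come from the text.

```python
from collections import Counter, defaultdict

MIN_TITLE_OCCURRENCES = 100  # threshold stated in the text
LOW_FREQUENCY_PRIOR = 0.1    # empirical default stated in the text


def title_words(title):
    """Split a title into words and keep only letters and digits.
    Lowercasing is an assumption, not stated in the paper."""
    cleaned = ("".join(ch for ch in w if ch.isalnum()) for w in title.lower().split())
    return [w for w in cleaned if w]


def prior_usefulness(word, title_count, tagged_count):
    """Fraction of a word's title occurrences in which it was also used as a tag."""
    n = title_count.get(word, 0)
    if n < MIN_TITLE_OCCURRENCES:
        return LOW_FREQUENCY_PRIOR
    return tagged_count.get(word, 0) / n


def build_lexicon(posts):
    """posts: iterable of (source_terms, tags) pairs.
    Tag-to-Tag lexicon: source_terms are the post's tags.
    Title-to-Tag lexicon: source_terms are the post's title words.
    Returns lexicon[t1][t2] = confidence(t1, t2) as in Eq. 1."""
    occurrences = Counter()
    cooccurrences = defaultdict(Counter)
    for sources, tags in posts:
        for t1 in set(sources):
            occurrences[t1] += 1
            for t2 in set(tags):
                cooccurrences[t1][t2] += 1
    return {t1: {t2: c / occurrences[t1] for t2, c in row.items()}
            for t1, row in cooccurrences.items()}


def union_prob(*scored_lists):
    """Join score dictionaries, treating scores of the same tag as
    probabilities of independent events: p = 1 - (1 - p1)(1 - p2)..."""
    merged = {}
    for scores in scored_lists:
        for tag, p in scores.items():
            merged[tag] = 1.0 - (1.0 - merged.get(tag, 0.0)) * (1.0 - p)
    return merged


def limit_size(scores, n):
    """Keep the n highest-scoring tags."""
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n])


def related_tags(title_tags, tag_to_tag, title_to_tag, limit=20):
    """Step 2: related tags for every title tag, weighted by the title tag score."""
    per_title_tag = []
    for t, t_score in title_tags.items():
        rel_tags = {r: t_score * s for r, s in tag_to_tag.get(t, {}).items()}
        rel_title = {r: t_score * s for r, s in title_to_tag.get(t, {}).items()}
        per_title_tag.append(
            union_prob(limit_size(rel_tags, limit), limit_size(rel_title, limit)))
    return union_prob(*per_title_tag)
```

Because a tag always co-occurs with itself, a lexicon built from tag sets assigns every tag a self-relation score of 1, which matches the first rows of Tables 1 and 2.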
Personomy based filtering

The set of tags retrieved in the second step is likely to contain many correct recommendations. However, the low precision caused by the size of the set makes its usefulness low. The last processing step is used to select the tags that are most likely to be chosen by the user. Checking the tags against the user’s personomy allows the system to choose the lexical forms preferred by the user (e.g., “semantics” instead of “semantic”). In addition, the personomy gives access to user-specific tags (e.g., “masterthesis”). The retrieval of related tags is done analogously to the lexicon based approach used in the second step. The strength of the relation is calculated based on Eq. 1; however, now the set of resources is limited to the user’s own posts. It is important to notice that this approach gives access not only to tags that are explicitly found in the personomy, but also to tags that co-occurred with them in the user’s posts. Subsequently, the scores are multiplied by the relation score of the base tag, which was calculated in the second step. Again, the scores are calculated for each base tag separately and then the lists of results are joined, summing the scores of multiple occurrences of the same tag in a probabilistic way. The list of tags proposed as a recommendation is limited to the ten tags with the highest scores.

As mentioned in section 3.2, the objective of some tags is to describe the resource, not to relate it to the user’s interests. To give the user access to recommendations of such tags, the system also recommends the tags retrieved from the title in the first step. As the scores defined in the first and third steps are not comparable, I decided to normalize the scores in both lists so that the sum of scores in each list equals one. After normalization the lists are joined, again using the probabilistic sum, and the final list is limited to ten tags.
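The third step and the final merge can be sketched in the same spirit. The sketch assumes that the user’s personomy is available as a hypothetical dictionary user_lexicon, where user_lexicon[t][r] is the confidence score of Eq. 1 computed over the user’s own posts only; the function names and the data layout are illustrative assumptions, while the limits of ten tags and the normalization to a unit sum follow the text.

```python
def union_prob(*scored_lists):
    """Probabilistic sum of scores for identical tags: 1 - (1 - p1)(1 - p2)..."""
    merged = {}
    for scores in scored_lists:
        for tag, p in scores.items():
            merged[tag] = 1.0 - (1.0 - merged.get(tag, 0.0)) * (1.0 - p)
    return merged


def limit_size(scores, n):
    """Keep the n highest-scoring tags."""
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n])


def normalize_scores(scores):
    """Rescale scores so that they sum to one (empty result if all are zero)."""
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()} if total > 0 else {}


def personomy_filter(related, user_lexicon, limit=10):
    """Step 3: for each related tag found in the personomy, retrieve the tags
    that co-occurred with it in the user's posts, weighted by its score."""
    per_base_tag = []
    for t, t_score in related.items():
        if t in user_lexicon:  # tag t was used by this user before
            per_base_tag.append(
                {r: t_score * s for r, s in user_lexicon[t].items()})
    return normalize_scores(limit_size(union_prob(*per_base_tag), limit))


def recommend(title_tags, related, user_lexicon, limit=10):
    """Join normalized title tags with normalized personomy tags and
    return the final ranked list of keywords."""
    from_personomy = personomy_filter(related, user_lexicon, limit)
    from_title = normalize_scores(title_tags)
    merged = limit_size(union_prob(from_title, from_personomy), limit)
    return sorted(merged, key=merged.get, reverse=True)
```

For a user with no previous posts user_lexicon is empty, so the recommendation falls back to the normalized title tags alone.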
5 Evaluation

This section presents the results of the off-line system evaluation based on the available BibSonomy snapshot. The evaluation approach assumed that all and only relevant tags were given by the user. Although this method simplifies the problem, it is robust and objective. The quality metrics were recall and precision, which are commonly used in recommender system evaluations [4].

Methodology

The commonly used evaluation approach is to keep a strict division between the training and the testing set. This approach was used by the organizers of the ECML PKDD discovery challenge 2008. It allowed the organizers to keep the list of correct tags secret during the contest. However, assuming that a user provides all and only relevant tags in a post, tag recommendation becomes a specific problem in which complete feedback about the quality of the recommendation is entered into the system with each post. In such a case, we should consider an incremental way of evaluation, in which each tested post trains the system with the tags provided by the user. The paper presents both evaluation approaches. The first experiment strictly followed the approach proposed by the organizers of the ECML PKDD discovery challenge 2008: the 59,542 newest posts were used as the test set. In the second experiment, in addition to incremental training, I decided
to reduce the impact of posts imported from an external repository (e.g., a web browser) by not testing the system on groups of a user’s posts with the same timestamp. This limited the number of test posts to 7,133. Imported posts have their tags assigned automatically. In real use a tag recommender is not applied to imported tags, therefore it should not be tested on them.

To give more insight into the system, its final recommendation is presented together with the tags produced in each of its three steps. The first step, which simply proposes words from the title as tags, can be considered a baseline system. The additional baseline systems presented are recommenders that propose the most frequent tags from the user’s personomy and from the resource tags. As each approach returns a ranked list of tags, it is possible to freely limit the number of recommended tags. The plots (Fig. 4) present consecutive results for the top n tags, where 1 ≤ n ≤ 10.

Results

The first experiment shows a low quality of personomy based recommendation, represented by the third step of the system and by the baseline system that proposes the user’s most frequent tags (Fig. 4(a)). This unexpected situation is caused by the evaluation approach used in this experiment (strict division between the training and the testing set). Among the 59,542 tested posts only 16,169 (27%) were entered by users who have previous posts in the training set. For the rest of the tested posts the personomy based recommenders could not propose any tags. Clearly, such a large percentage of “first-time” users is not possible in reality, thus this evaluation approach seems to underestimate the score of personomy based recommenders. The results are also strongly biased by the choice of test posts. A single user who is responsible for 65% of all test posts is especially unrepresentative. His/her posts are likely to be imported from an external repository.
The tags in these posts appear to have been mechanically extracted from the article content, which inflates the recall of the title based recommender (the first step of the system). The overall result of the system is therefore completely determined by the tags proposed in the first step, which was not meant to be the main element of the system.

The second evaluation approach, in which tested posts were used to train the system, solves the problem of the extraordinarily large number of “first-time” users (now 3% of test posts). For this evaluation method the personomy based recommender (the third step) outperforms the title based solutions (Fig. 4(b)). The low and slowly decreasing (with an increasing number of recommended tags) precision of the baseline approach shows that the most frequent tags from the personomy are not necessarily a good recommendation. These results confirm the previous experiments, which showed that the personomy is the richest, but also the noisiest, source of tags. The title also confirmed its usefulness as a source of tags. At some point increasing the number of recommended tags does not improve the precision and recall of the first step, because the number of words in a title hardly ever reaches 10. The results of the second step are consistent with the first step for the top tags. Tags tend to have a high self-relation score, which makes title based tags likely to be high in the ranking produced by the second step. The results of the last baseline system