Ranked Tag Recommendation Systems Based on Logistic Regression

J.R. Quevedo, E. Montañés, J. Ranilla, and I. Díaz
Computer Science Department, University of Oviedo
{quevedo,montaneselena,ranilla,sirene}@uniovi.es

Abstract. This work proposes an approach to tag recommendation based on logistic regression. The goal of the method is to support users of current social network systems by providing a ranking of new meaningful tags for a resource. The system provides a ranked tag set and feeds on different posts depending on the resource for which the user requests the recommendation. The performance of this approach is tested according to several evaluation measures, one of them proposed in this paper ($F_1^+$). The experiments show that this learning system outperforms certain benchmark recommenders.

1 Introduction

Tagging can be defined as the process of assigning short textual descriptions, called tags, to information resources, which allows the user to organize the content. This has become very popular and helpful for large-scale systems such as folksonomies. A folksonomy [3] is a collection of resources entered by users in posts. Each post consists of a resource and a set of keywords, so-called tags, attached to it by a user. Generally, the resource is specific to the user who added it to the system, but all users are invited to label it with tags.

This paper proposes an approach to tag recommendation based on a learning process. The work starts from the hypothesis that a learning process improves the performance of the recommendation task. It explores several kinds of information with which to feed the learner. It also analyzes the fact that recent posts are more useful and suitable for recommending new tags.

The remainder of the paper is structured as follows. Section 2 presents background information about tag recommendation in social networks. Our approach is put in context in Section 3, while the proposed method is described in Section 4. Section 5 details some novel performance evaluation metrics. Finally, Section 6 shows the results and Section 7 draws the conclusions.

This work was supported by the Spanish Ministerio de Educación y Ciencia and the European Regional Development Fund [TIN2007-61273].
2 Related Work

Different approaches have been proposed to support users during the tagging process, depending on the purpose for which they were built. Some of them make recommendations by analyzing tag co-occurrences, while others study graph-based approaches. Jäschke et al. [5] adapt user-based collaborative filtering as well as a graph-based recommender built on top of FolkRank. Basile et al. [1] propose a smart TRS able to learn from past user interaction as well as from the content of the resources to annotate. Katakis et al. [6] model the automated tag suggestion problem as a multi-label text classification task. Sigurbjörnsson et al. [8] present their results by means of a tag characterization, focusing on how users tag photos on Flickr and what information is contained in the tagging.

Most of these systems require information associated with the content of the resource itself [1]. Others simply suggest a set of tags as the output of a classification rather than providing a ranking of them [6]. Some of them require a large quantity of supporting data [8]. The proposal of this work avoids these drawbacks through a novel approach which establishes a tag ranking by a machine learning method based on logistic regression.

3 Tag Recommender Systems (TRS)

A folksonomy is a tuple $F := (U, T, R, Y)$ where $U$, $T$ and $R$ are finite sets, whose elements are respectively called users, tags and resources, and $Y$ is a ternary relation between them, i.e., $Y \subseteq U \times T \times R$, whose elements are tag assignments (posts). When a user adds a new or existing resource to a folksonomy, it could be helpful to recommend him/her relevant tags.

TRS usually take the users, the resources and the ratings of tags into account to suggest a list of tags to the user. According to [7], a TRS can briefly be formulated as a system that takes as input a given user $u \in U$ and a resource $r \in R$ and produces a set $T(u, r) \subset T$ of tags as output. Jäschke et al. [5] define a post of a folksonomy as a user, a resource and all tags that this user has assigned to that resource. This work slightly modifies this definition in the sense that it restricts the set of tags to the tags used simultaneously to tag a resource by a user.

There exist some simple but frequently used TRS [5] based on providing a list of ranked tags extracted from the set of posts connected with the current annotation (a code sketch of these baselines follows the list):

- MPT (Most Popular Tags): For each tag $t_i$, the posts containing $t_i$ are counted, and the top tags (ranked by occurrence count) are used as recommendations.
- MPTR (Most Popular Tags by Resource): For a resource $r_i$, it is counted for each tag in how many posts it occurs together with $r_i$. The tags occurring most often together with $r_i$ are then proposed as recommendations.
- MPTU (Most Popular Tags by User): For a user $u_i$, it is counted for each tag in how many posts it occurs together with $u_i$. The tags occurring most often together with $u_i$ are taken as recommendations.
- MPTRU (Most Popular Tags by Resource or User): For a resource $r_i$, it is counted for each tag in how many posts it occurs together with $r_i$. In addition, for a user $u_i$, it is counted for each tag in how many posts it occurs together with $u_i$. The tags occurring most often together with either $r_i$ or $u_i$ are taken as recommendations.
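As a concrete illustration, the following minimal sketch implements the four popularity baselines over a list of posts; the `Post` record and function names are illustrative, not taken from the original systems:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Post:
    user: str
    resource: str
    tags: frozenset  # tags assigned together in this post

def most_popular(posts, k=5, user=None, resource=None):
    """Count tag occurrences over the posts matching the given user
    and/or resource (None means no restriction); return the k most
    frequent tags, ranked by occurrence count."""
    counts = Counter()
    for p in posts:
        if user is not None and p.user != user:
            continue
        if resource is not None and p.resource != resource:
            continue
        counts.update(p.tags)
    return [t for t, _ in counts.most_common(k)]

def mpt(posts, k=5):
    return most_popular(posts, k)

def mptr(posts, r, k=5):
    return most_popular(posts, k, resource=r)

def mptu(posts, u, k=5):
    return most_popular(posts, k, user=u)

def mptru(posts, u, r, k=5):
    # Count tags over posts involving either the user or the resource.
    counts = Counter()
    for p in posts:
        if p.user == u or p.resource == r:
            counts.update(p.tags)
    return [t for t, _ in counts.most_common(k)]
```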
The introduction of a learning system is expected to improve the performance of these baselines.

4 Learning to Recommend

This section describes the whole procedure followed in order to provide a user and a resource with a set of ranked tags. The recommendations are based on a learning process that learns how users have previously tagged resources. The core of the method is a supervised learning algorithm based on logistic regression [2]. This paper studies different training sets built according to the user and resource for whom the recommendations are provided.

The key points of the system are the following:

- The training set depends on each test set and is built specifically for each of them.
- Several training sets are built according to different criteria and are afterwards compared and evaluated.
- The learning system adopted is LIBLINEAR [2], which provides a probability distribution before the classification. This probability distribution is used to rank the tags, taking as most suitable the tag with the highest probability value.
- The tags in the ranking are all those that appear as categories of each training set. This entails that some positive tags of a test post might not be ranked.

4.1 Definition of the Test Set

The traditional approach splits the data into training and test sets at the beginning. Afterwards, a model is inferred using the training set and it is validated on the test set [6]. In this paper, the methodology is quite different in the sense that the training and test sets are not fixed. The test set is randomly selected, and afterwards an ad hoc training set is provided for each test post.

According to the definition of a folksonomy in Section 3, it is composed of a set of posts. Each post is formed by a user, a resource and a set of tags, i.e.,

$$p_i = (u_i, r_i, \{t_{i_1}, \ldots, t_{i_k}\})$$

Each post of a folksonomy is a candidate to become a test post. Each test post is then turned into as many examples as there are tags used to label the resource. Therefore, post $p_i$ is split into $k$ test examples

$$e_1 = (u_i, r_i, t_{i_1}), \ldots, e_k = (u_i, r_i, t_{i_k}) \qquad (1)$$
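A minimal, self-contained sketch of this split (names hypothetical):

```python
def split_test_post(user, resource, tags):
    """Turn a test post (u_i, r_i, {t_i1, ..., t_ik}) into k test
    examples (u_i, r_i, t_ij), following Eq. (1)."""
    return [(user, resource, t) for t in tags]

# A post with three tags yields three (user, resource, tag) test examples.
examples = split_test_post("u1", "r1", {"ml", "tagging", "logistic"})
```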
4.2 Definition of the Training Set

Any learning system strongly depends on the training set used to learn. This work dynamically selects an ad hoc training set from the $N$ most recent posts posted before the test post. The parameter $N$ is fixed experimentally. This characteristic makes the TRS suggest the tags currently in fashion in the folksonomy, and it produces a more scalable system, since the number of posts in the training set does not grow with the number of posts posted before the test post. This characteristic makes solving the problem by the learning system feasible.

Therefore, the choice of the training set for a given test post reduces to defining the criteria the posts must satisfy to be included in the training set. This work studies different criteria.

Approach 1. TR. Let $p_i = (u_i, r_i, \{t_{i_1}, \ldots, t_{i_k}\})$ be a test post. Let $R_{r_i}$ be the subset of posts associated with the resource $r_i$, and let $R_{r_i}^t = \{p_j \mid p_j \in R_{r_i} \text{ and } p_j \text{ was posted before } t\}$. The TR approach selects as training set the $N$ most recent posts of $R_{r_i}^{d_i}$, where $d_i$ is the date when $p_i$ was posted.

Approach 2. TU. Let $p_i = (u_i, r_i, \{t_{i_1}, \ldots, t_{i_k}\})$ be a test post. Let $P_{u_i}$ be the personomy (the subset of posts posted by a user constitutes the so-called personomy) associated with the user $u_i$, and let $P_{u_i}^t = \{p_j \mid p_j \in P_{u_i} \text{ and } p_j \text{ was posted before } t\}$. The TU approach selects as training set the $N$ most recent posts of $P_{u_i}^{d_i}$, where $d_i$ is the date when $p_i$ was posted.

Approach 3. TRU. The above training sets do not take into account that the learned model is used once a resource is presented to the user. Hence, this approach goes further and adds as training examples those concerning the resource for which the recommendations are demanded. The examples whose tags have been previously assigned to the resource by the user to whom the recommendations are provided are removed, because it makes no sense to recommend to a user the tags he had previously used to label the resource. Therefore, the training set associated with $p_i$ is formed by

$$UR_{u_i, r_i}^{d_i} = \left(P_{u_i}^{d_i} \cup R_{r_i}^{d_i}\right) \setminus \{p_j \mid p_j = (u_i, r_i, \{t_1, \ldots, t_n\})\}$$
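The three selection criteria can be sketched as follows, over a hypothetical post record extended with a `date` field (the paper's notation leaves timestamps implicit):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DatedPost:
    user: str
    resource: str
    tags: frozenset
    date: datetime

def most_recent(posts, n):
    """The N most recent posts, newest first."""
    return sorted(posts, key=lambda p: p.date, reverse=True)[:n]

def tr_training_set(posts, test, n):
    """TR: the N most recent posts on the test resource, posted before the test post."""
    pool = [p for p in posts if p.resource == test.resource and p.date < test.date]
    return most_recent(pool, n)

def tu_training_set(posts, test, n):
    """TU: the N most recent posts of the test user's personomy, posted before the test post."""
    pool = [p for p in posts if p.user == test.user and p.date < test.date]
    return most_recent(pool, n)

def tru_training_set(posts, test, n):
    """TRU: union of the user's and the resource's earlier posts, excluding
    the posts in which the test user tagged the test resource."""
    pool = [p for p in posts
            if (p.user == test.user or p.resource == test.resource)
            and p.date < test.date
            and not (p.user == test.user and p.resource == test.resource)]
    return most_recent(pool, n)
```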
4.3 Example Representation

Once both the training and test sets are defined, it is necessary to transform them into a computable form understandable by a machine learning system. Therefore, we have to define the features which characterize the examples, as well as the class of each example.

The features which characterize the examples are the tags previously used to tag the resource in the folksonomy. Hence, each example is represented by a boolean vector $V$ of size $M$ (the number of tags of the folksonomy), where $v_j = 1$ if and only if $t_j$ was used to tag the resource before, and $v_j = 0$ otherwise, with $j \in \{1, \ldots, M\}$. The class of an example is the tag with which the user has tagged the resource at this moment.

In addition, removing redundant or non-useful features which add noise to the system usually helps to increase both the effectiveness and the efficiency of the classifiers. The example representation based on tags as features makes it possible to perform a simple feature selection in the training set, consisting of keeping just those tags which represent the test post (this approach will be called TRUTR).

4.4 Learning System

The key point of this paper is to provide a ranked set of tags adapted to a user and a resource. Therefore, a learning system able to rank the tags could be beneficial, indicating to the user which tag is the best and which one is the worst for the resource.

In this framework, LIBLINEAR [2] is an open source library (available at http://www.csie.ntu.edu.tw/~cjlin/liblinear/) based on SVM, a recent alternative able to accomplish multi-category classification through logistic regression, providing a probability distribution before the classification. This probability distribution is used to rank the tags, taking as most suitable the tag with the highest probability value. In the same sense, the most discordant tag is the one with the lowest probability.

This work uses the default LIBLINEAR configuration after a slight modification of the output. The evaluation in this case takes place when a resource is presented to the user.
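To illustrate this ranking step, the sketch below uses scikit-learn's LogisticRegression with the liblinear solver as a stand-in for the LIBLINEAR library itself; it does not reproduce the paper's exact configuration, and the data is a toy example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: boolean feature matrix (one row per training example, one column per
# folksonomy tag, as in Section 4.3); y: the class tag of each example.
X = np.array([[1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 1, 0, 1]])
y = np.array(["ml", "tagging", "ml"])

clf = LogisticRegression(solver="liblinear")  # LIBLINEAR-backed solver
clf.fit(X, y)

x_test = np.array([[1, 0, 0, 1]])
probs = clf.predict_proba(x_test)[0]

# Rank candidate tags from most to least probable.
ranking = [tag for _, tag in sorted(zip(probs, clf.classes_), reverse=True)]
```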
5 Performance Evaluation

So far, no consensus has been reached on an adequate metric to evaluate a recommender [5]. In [6], a random post is picked for each user, and a set of tags is then provided for this post based on the whole folksonomy except that post. Then, the $F_1$ is computed as

$$F_1 = \frac{1}{|D|} \sum_{(u,r) \in D} \frac{2\,|T^+(u, r) \cap T(u, r)|}{|T(u, r)| + |T^+(u, r)|} \qquad (2)$$

where $D$ is the test set, $T^+(u, r)$ is the set of tags user $u$ has assigned to resource $r$ (the positive tags) and $T(u, r)$ is the set of tags the system has recommended to user $u$ for resource $r$.

The main drawback of this evaluation procedure is that it only evaluates the performance of a classification rather than the performance of a ranking. But a TRS able to return the positive tags at the top of the ranking is obviously preferable to one that returns the positive tags at the bottom. Hence, defining an evaluation metric able to quantify both the tags a TRS recommends and the order in which it ranks them is a challenge worth coping with. The Area Under the ROC Curve (AUC) and Average Precision (AP) are two a priori candidate measures to evaluate a ranking. The Normalized Discounted Cumulative Gain (NDCG) [4] is another evaluation measure for Information Retrieval (IR) systems, which computes the cumulative gain a user obtains by examining the retrieval result up to a given ranked position. However, these measures present some drawbacks in tag recommendation, because the rankings to compare may not have the same number of tags, and some tags of the test may never appear in the training set. In fact, all of them are designed to compare permutations of a predefined set of tags.

This paper proposes an alternative which tries to overcome those drawbacks. It consists of computing the $F_1$ measure for all possible cutoffs of the ranking at which a positive tag is returned and choosing the highest one. It is denoted by $F_1^+$ and defined by $F_1^+ = \max_{0 \le i \le r} (F_1)_i$, where $r$ is the size of the ranking, that is, the number of tags returned by the system, and $(F_1)_i$ is the $F_1$ of the classification assuming that the system has classified the first $i$ tags as positive and the rest as negative. Notice that $i$ ranges from $0$ (the system has not returned any tag as positive) to $r$ (the system has returned all the tags as positive).

Since a negative tag in the ranking does not lead to an improvement of the $F_1$, it suffices to compute $F_1$ only at the cutoffs where a positive tag is placed. Hence, this metric identifies an optimal cutoff of the ranking. Notice that it does not vary if negative tags are added to the tail of the ranking, as happens with AP. But it takes into account all positive tags, and not just the positive tags which appear at the top of the ranking, as AUC also does.
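A direct sketch of $F_1^+$ for a single ranking follows this definition: it scans only the cutoffs where a positive tag appears and keeps the best $F_1$ (names hypothetical):

```python
def f1_plus(ranking, positives):
    """F1+ for one ranked tag list: the maximum F1 over all cutoffs,
    evaluated only where a positive tag appears, since cutting at a
    negative tag cannot improve F1."""
    positives = set(positives)
    best, hits = 0.0, 0
    for i, tag in enumerate(ranking, start=1):
        if tag not in positives:
            continue
        hits += 1
        precision = hits / i
        recall = hits / len(positives)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Two of three positive tags are ranked first and third: F1+ = 2/3,
# reached at cutoff i = 3.
f1_plus(["ml", "web", "tagging", "svm"], {"ml", "tagging", "folksonomy"})
```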
6 Experiments

The experiments were carried out over the collection bt08, a dataset formed by BibTeX posts extracted from the ECML PKDD Discovery Challenge 2008 (http://www.kde.cs.uni-kassel.de/ws/rsdc08/). It is extracted from Bibsonomy (http://www.bibsonomy.org), a web-based social bookmarking and publication-sharing system that enables users to tag web documents as well as BibTeX entries of scientific publications.

Before using the data sets, tag cleaning was carried out according to the PKDD Discovery Challenge 2008. The numbers of users, tags, resources and posts of this collection are 1,206, 29,739, 96,616 and 278,008 respectively.

To test the methods, 1000 test posts were randomly selected. For each one, several ranked tag sets are provided, depending on the approach that builds the training set (which leads to four possible TRS: TR, TU, TRU and TRUTR) and the cardinality $N$ of the training set ($N = i \cdot 500$ with $i = 1, 2, \ldots, 10$).

Table 1. Performance of the benchmark TRSs for the bt08 dataset

          F1+     AUC     AP      NDCG
  MPT     6.7%    5.8%    4.6%    6.6%
  MPTR    7.8%    5.2%    5.7%    7.7%
  MPTU    37.2%   56.3%   28.8%   42.5%
  MPTRU   38.2%   57.9%   29.7%   43.7%

Table 1 shows the behavior of the benchmark TRSs according to the evaluation metrics detailed in Section 5. As MPTRU seems to be better than the rest of the benchmark TRSs, it is taken as the reference from now on.

[Fig. 1: $F_1^+$, AUC, AP and NDCG on the bt08 dataset for the four training-set strategies (TR, TU, TRU, TRUTR) as a function of the training set size $N$, from 500 to 5000.]

Figure 1 shows the $F_1^+$, AUC, AP and NDCG evaluation measures for the dataset and the four ways of building the training set. Clearly, the system performance increases considerably when the posts that the user of the test post has posted are included as training examples. Besides, adding to the training set the posts in which the resource of the test post is involved also helps to improve the effectiveness of the recommender. Furthermore, if only the tags that represent the test are kept to represent the training set formed by the posts involving either the resource or the user of the test post, the performance is enhanced. These results were statistically confirmed with a paired-samples Wilcoxon test.
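The paper does not detail the setup of the significance test; a plausible sketch with SciPy, comparing paired per-post scores of two recommenders (the arrays are placeholders, not the paper's results):

```python
from scipy.stats import wilcoxon

# Paired F1+ scores of two systems over the same test posts.
scores_trutr = [0.52, 0.40, 0.71, 0.33, 0.65]
scores_mptru = [0.45, 0.38, 0.60, 0.35, 0.50]

stat, p_value = wilcoxon(scores_trutr, scores_mptru)
# A small p-value supports that the difference between the paired
# score distributions is statistically significant.
```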
7 Conclusions and Future Work

This work proposes a TRS based on a novel approach which learns to rank tags from previous posts in a folksonomy using a logistic regression based system. In addition, a new evaluation measure is proposed, namely $F_1^+$, which overcomes the drawbacks of other ranking evaluation metrics such as AUC, AP and NDCG. The proposed TRSs are compared to the best benchmark TRS (MPTRU). The results show a significant improvement of all the proposed TRSs with regard to MPTRU, the one that takes the test representation into account (TRUTR) being the best of the four versions. In addition, it is possible to maintain the performance without slowing down the learning process too much, since it is not necessary to add a great amount of training examples. Therefore, the introduction of a learning system proves beneficial for recommending tags. Since example and feature selection improve the performance of a learning-based TRS, it would be interesting to explore new approaches in that direction as future work.

References

1. Basile, P., Gendarmi, D., Lanubile, F., Semeraro, G.: Recommending smart tags in a social bookmarking system. In: Bridging the Gap between Semantic Web and Web 2.0 (SemNet 2007), pp. 22–29 (2007)
2. Keerthi, S.S., Lin, C.J., Weng, R.C.: Trust region Newton method for logistic regression. Journal of Machine Learning Research 9, 627–650 (2008)
3. Hotho, A., Jäschke, R., Schmitz, C., Stumme, G.: Trend detection in folksonomies, pp. 56–70 (2006)
4. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. 20(4), 422–446 (2002)
5. Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G.: Tag recommendations in folksonomies. In: Hinneburg, A. (ed.) Proceedings of LWA 2007, pp. 13–20 (2007)
6. Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel text classification for automated tag suggestion. In: Proceedings of the ECML PKDD Discovery Challenge (RSDC 2008), pp. 75–83 (2008)
7. Marinho, L., Schmidt-Thieme, L.: Collaborative tag recommendations. In: Studies in Classification, Data Analysis, and Knowledge Organization, pp. 533–540. Springer, Heidelberg (2008)
8. Sigurbjörnsson, B., van Zwol, R.: Flickr tag recommendation based on collective knowledge. In: Proceedings of WWW 2008, pp. 327–336. ACM, New York (2008)