HSI 2010, Rzeszów, Poland, May 13-15, 2010

Optimization of Tag Recommender Systems in a Real Life Setting

Szymon Chojnacki, Dariusz Czerski, Mieczysław Kłopotek
Institute of Computer Science PAS, Warsaw, Poland
sch@ipipan.waw.pl
Abstract. The purpose of this article is to describe the performance-related features of our tag recommending web service. The service is one of several systems currently integrated with the multiplexing recommender framework of the BibSonomy.org bookmarking portal. The framework was introduced during the ECML PKDD Discovery Challenge 2009. Due to a good balance between the speed and the quality of our recommendations, we managed to outperform the other systems according to one of the three most important evaluation measures, i.e. the proportion of correctly recommended tags that have been clicked by a user.

Keywords: Folksonomy, Multilabel Classification, Social Networks, Tag Recommendation.

I. INTRODUCTION

The development of social networking systems and Web 2.0 technologies that we have observed recently reveals new challenges for the machine learning community. The four paradigms of a collaborative society (being open, peering, sharing and acting globally) are clearly visible at the level of user-to-user interaction, but they also result in new types of data and evaluation criteria for our algorithms. We can easily obtain real-life datasets as complex, useful and interesting as those in corporate warehouses, and at the same time we are entitled to reveal all the details of our analysis and compare it to other approaches.

In this article we focus on one of the milestones of this phenomenon, i.e. social bookmarking and tagging. The best recognized examples of web pages offering this functionality are Delicious, CiteULike, BibSonomy, Digg, Reddit, Ma.gnolia or Simpy. The action of bookmarking and attaching tags or labels to digital resources (websites, publications, images, music, videos, news and blogs) sheds new light on the problems of Information Retrieval. Tagging enables us to instantiate meta knowledge about a resource and extend the expressive power of search engines. The main advantage of collaborative tagging over mapping resources onto an expert-defined ontology is the flexibility of the former. Unfortunately, this flexibility also constitutes a significant drawback, because uncontrolled tagging results in an explosion of vocabulary and in difficulties with maintenance and reuse. Therefore, much effort has been put into developing and comparing tag recommender systems in recent years.

II. DISCOVERY CHALLENGE

The Discovery Challenge Workshop was a battlefield for different approaches to tag recommendation during the last two ECML/PKDD conferences. The task can be perceived as multilabel classification, and it is interesting that the state-of-the-art results in terms of the f1-measure (blended precision and recall) were obtained by classification algorithms representing diverse techniques [2] (e.g. SVM, logistic regression, collaborative filtering, probabilistic ranking, discriminative clustering or instance-based reasoning [1]). We assert that the final outcome of a system relied heavily on its ability to create variables combining natural language processing and graph theory.

During this year's challenge the most successful recommenders were invited to participate in an online evaluation. Contrary to the offline tasks, the teams were supposed to deliver recommendations within a fixed time constraint of 1 000 milliseconds per query. It turned out that most teams found this condition hard to meet. Our initial approach was at the fifth position during the five-week period of the online evaluation.
In the second, ongoing post-contest iteration we decided to simplify and optimize the most accurate algorithm [5] and to focus on a criterion that was unique to the online task, i.e. maximization of the proportion of tags chosen by a user that were clicked. At the end of December 2009 our web service was outperforming the other recommenders according to this measure, which indicates that we obtained a satisfactory tradeoff between accuracy and latency. In this article we describe the factors that, in our opinion, enabled us to achieve good results in such a competitive environment.

In the next section we describe the structure of the data used during the challenge along with its basic statistics. In the fourth section we focus on the evaluation measures and the set of rules embedded in the multiplexing recommender. The fifth section is dedicated to the details of the implementation of our system. In the last section we compare the results of the systems taking part in the evaluation.

III. DATASET

The dataset used to train the models consisted of three files. The files contained 1 401 104 tag assignments, 263 004 bookmark posts and 158 924 BibTeX posts. This data was a full snapshot of BibSonomy.org users' activity until the end of 2008 (31-12-2008).
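Counts such as these are straightforward to reproduce from the raw dump with a single pass over the tag-assignment file. The sketch below is only an illustration: we assume a simple tab-separated (user, tag, resource) column layout, which is our assumption rather than a documented property of the challenge files.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashSet;
import java.util.Set;

// One-pass scan of the tag-assignment file. The tab-separated
// (user, tag, resource) column layout is an assumption made for this
// sketch, not a documented property of the challenge dump.
public class DatasetStats {
    public static void main(String[] args) throws Exception {
        Set<String> users = new HashSet<String>();
        Set<String> tags = new HashSet<String>();
        long assignments = 0;
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            String[] cols = line.split("\t");
            users.add(cols[0]);   // user identifier
            tags.add(cols[1]);    // tag string
            assignments++;        // one row per (user, tag, resource) triple
        }
        in.close();
        System.out.printf("%d assignments, %d users, %d distinct tags%n",
                assignments, users.size(), tags.size());
    }
}
```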
A screenshot with an example of the tag assignment process is presented in Fig. 1. In this example the website of the RecSys conference is labeled with one tag, "rec". The BibSonomy portal also enables bookmarking publications, which can be described by several additional fields (e.g. volume, chapter, edition, institution, publisher, journal, pages, author). The total numbers of distinct users, URLs, BibTeXs and tags amount to 3 617, 235 328, 143 050 and 93 756, respectively.

Fig. 1 BibSonomy's interface containing a resource's URL, title and description. The user is supposed to input tags into a (space-separated) field. The output of an active recommender is labeled as recommendation.

This type of data structure (containing assignments of tags by users to resources) is usually called a folksonomy. Formally, a folksonomy is a tuple $F := (U, T, R, Y)$, where $U$ (users), $T$ (tags) and $R$ (resources) are finite sets, and $Y$ stands for a ternary relation between them, $Y \subseteq U \times T \times R$. According to this model we can express certain properties of folksonomies. The number of tag assignments of a given user $u$ is $|Y \cap (\{u\} \times T \times R)|$, and the number of users who have tagged resource $r$ with tag $t$ is $|Y \cap (U \times \{t\} \times \{r\})|$. If we define the set of tags attached by user $u$ to resource $r$ as $T_{ur} := \{t \in T \mid (u, t, r) \in Y\}$, then a post is $(u, T_{ur}, r)$.

IV. EVALUATION FRAMEWORK

The primary evaluation measures for tag recommender systems are precision, recall and the f1-measure. In the case of multilabel classification they are defined slightly differently than for classic one-class or multi-category classification. For each post $(u, T_{ur}, r)$ we compare the tags assigned to it by a user, $T_{ur}$, with the tags proposed by a recommender, $T(u, r)$. In the case of the challenge only the first five tags were compared. Additionally, the tags were normalized by a Java method. All the technical details of calculating the evaluation measures can be found in [3]. In order to compute precision and recall we need to count the tags that appear both in the user's final assignment and in the system's recommendation. Secondly, we divide this count by the number of the user's tags to obtain recall (Eq. 1), and by the number of the system's tags to obtain precision (Eq. 2). These values are summed over all posts and the f1-measure is obtained from those aggregates (Eq. 3):

$$\mathrm{recall}\big(T(u,r)\big) = \frac{|T_{ur} \cap T(u,r)|}{|T_{ur}|} \quad (1)$$

$$\mathrm{precision}\big(T(u,r)\big) = \frac{|T_{ur} \cap T(u,r)|}{|T(u,r)|} \quad (2)$$

$$\mathrm{f1m} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \quad (3)$$

Online evaluation is based on the same measures. The only difference is that only recommendations returned to the BibSonomy multiplexing engine within 1 000 milliseconds are considered. This constraint is motivated by the need to smoothly display recommendations to a user. The tags displayed to the user are selected randomly from one system that met the time constraint. If a recommender delivered tags within 1 second but was not chosen for display, its tags are also compared to the user's final assignment. The architecture of the online framework is drawn in Fig. 2.

The weakness of this evaluation approach is the fact that a recommender may meet the time constraint and correctly predict the user's selection of tags, yet the user did not want to wait e.g. 500 ms and typed the tags on her/his own. In order to overcome this drawback another measure was suggested [4]. The measure takes into account whether a user actually clicked the system's recommendations: it is defined as the fraction of matching tags which have been clicked. Our system outperforms the other recommenders according to this measure.
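Operationally, Eq. 1-3 reduce to a set intersection per post, with per-post recall and precision aggregated before blending. The following minimal Java sketch implements the measures as defined above; the class and method names are ours, not part of the challenge framework.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of Eq. 1-3: per-post intersection of the user's final
// tags T_ur with the recommended tags T(u,r), aggregated over all posts.
public class F1Evaluator {
    private double recallSum = 0, precisionSum = 0;
    private int posts = 0;

    // userTags = T_ur, recommended = the first five tags of T(u,r)
    public void addPost(Set<String> userTags, List<String> recommended) {
        Set<String> hit = new HashSet<String>(userTags);
        hit.retainAll(new HashSet<String>(recommended));
        recallSum += (double) hit.size() / userTags.size();          // Eq. 1
        precisionSum += recommended.isEmpty()
                ? 0 : (double) hit.size() / recommended.size();      // Eq. 2
        posts++;
    }

    public double f1() {                                             // Eq. 3
        double r = recallSum / posts, p = precisionSum / posts;
        return p + r == 0 ? 0 : 2 * p * r / (p + r);
    }
}
```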
V. OUR APPROACH

In this section we describe both our classification algorithm and the technical details that are crucial for the overall high speed of delivering recommendations.

A. Classification algorithm

Our algorithm is a simplified version of the recommender described in [5]. In the original version of the classifier, a complex DAG-type ensemble of simple classifiers was proposed, with a mechanism to expand the list of tags based on a tag co-occurrence graph. In addition, various set operations were used to combine lists of tags at different levels of the ensemble engine. The author of this state-of-the-art approach admits that the parameters of such a complex system were not optimized. Our idea was to remove the time-consuming elements of the above-mentioned system. This enabled us to deliver faster recommendations and also to set up an experimentation environment in which to search for parameters that give recommendations comparable to the original classifier.

The version of the system that is currently integrated with the BibSonomy bookmarking portal calculates four kinds of conditional probabilities: $P(\text{tag}=t \mid \text{resource}=r)$, $P(\text{tag}=t \mid \text{user}=u)$, $P(\text{tag}=t \mid t \in \text{URL})$ and $P(\text{tag}=t \mid t \in \text{title})$. The probabilities for a given tag are multiplied by the vector $v = [0.1\ 0.2\ 0.2\ 0.4]$, and the five tags with the highest combined probabilities (not lower than 0.05) are sent to the multiplexing recommender.
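Under this design, serving a query reduces to four map lookups, a weighted sum and a top-five cut-off. The sketch below illustrates the combination step; the class and method names are ours, and the pairing of each weight in $v$ with an evidence source is assumed to follow the order in which the probabilities are listed above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Weighted combination of the four evidence sources. The weight-to-source
// pairing follows the order the probabilities are listed in the text,
// which is our assumption; class and method names are ours.
public class TagScorer {
    private static final double[] V = {0.1, 0.2, 0.2, 0.4};
    private static final double THRESHOLD = 0.05;

    // evidence.get(i) maps tag -> P(tag | i-th evidence source)
    public static List<String> topFive(List<Map<String, Double>> evidence) {
        Map<String, Double> score = new HashMap<String, Double>();
        for (int i = 0; i < evidence.size(); i++) {
            for (Map.Entry<String, Double> e : evidence.get(i).entrySet()) {
                Double s = score.get(e.getKey());
                score.put(e.getKey(), (s == null ? 0.0 : s) + V[i] * e.getValue());
            }
        }
        List<Map.Entry<String, Double>> ranked =
                new ArrayList<Map.Entry<String, Double>>(score.entrySet());
        Collections.sort(ranked, new Comparator<Map.Entry<String, Double>>() {
            public int compare(Map.Entry<String, Double> a, Map.Entry<String, Double> b) {
                return b.getValue().compareTo(a.getValue()); // descending score
            }
        });
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Double> e : ranked) {
            if (result.size() == 5 || e.getValue() < THRESHOLD) {
                break;            // at most five tags, each with score >= 0.05
            }
            result.add(e.getKey());
        }
        return result;
    }
}
```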
Fig. 2 The details of a bookmarked resource are sent in the background to the recommender systems, and one suggestion returned within the time constraint is selected randomly for presentation to a user.

B. Technical issues

Our server is installed on a desktop with the Windows XP operating system, an Intel Core Duo 3 GHz processor and 3.25 GB of RAM. We used Apache Tomcat 5.5 as the servlet container, a Java 1.6 update 13 virtual machine and a MySQL 1.2.17 database. In order to achieve both service reliability and low latency, we utilized a design pattern in which a Java class with static fields is loaded during Tomcat's startup. The class helps us keep all probabilities in memory. We created our own collection class which contains both Java HashMap and SortedSet fields with String and Double generic types, to optimize searching for keywords and to store sorted <tag, probability> pairs for efficient merging of the base classifiers. We configured Tomcat to manage a pool of database connections. All processes related to saving feedback to a database or to text files are run in separate threads that do not influence the core recommendation process.

C. Example

In this subsection we present the details of recommending tags for a resource that was loaded into the BibSonomy interface on 2007-05-17 16:16:00 by user 1422. The content of the resource is given in Table 1.

TABLE 1 BOOKMARK CONSIDERED IN THE EXAMPLE

Feature: Value
Resource identifier: 86befb63c44b0aef6e8d196290cfa1
URL: http://www.ning.com/
Title: Ning - social web apps
Extended title: Ning is the easiest way to create your own social web apps on the Internet today. And it's free!
Statistics up to the date of the post:
Distinct users: 18
Distinct tags: 74
Most common tags: web20 11, social 10, ajax 6, socialsoftware 5, network 4, folksonomy 3, flickr 3, tool 3, programming 3, free 3, tags 3

The user finally assigned four tags to the resource (lww7, socialnetworking, tools, web20). The user's profile up to the date of the example is given in Table 2.

TABLE 2 THE PROFILE OF THE USER IN THE EXAMPLE

Feature: Value
User's id: 1422
Resources tagged: 43
Distinct tags used: 29
Most common tags: library20 19, lww7 14, web20 13, tools 9, blog 5, journal 4, article 4, socialnetworking 4, images 3, learningspaces 3, library 3, citation 2, informationliteracy 2, libraries 2, readinglist 2, education 2, wiki 1

In Table 3 we give the probabilities of being an assigned tag for the tokens from the resource's title and URL.

TABLE 3 SIGNIFICANCE OF TOKENS IN THE RESOURCE'S CONTENT

Token     | Appears in titles | Is a tag | Probability
social    | 3 341             | 777      | 0.23
internet  | 3 512             | 661      | 0.18
ning      | 57                | 8        | 0.14
web       | 10 187            | 1 394    | 0.13
free      | 6 002             | 679      | 0.11
apps      | 345               | 34       | 0.09
create    | 1 177             | 16       | 0.01
own       | 771               | 4        | 0.00
your      | 5 814             | 8        | 0.00
the       | 31 601            | 34       | 0.00
and       | 25 341            | 26       | 0.00
Appears in URLs:
ning      | 69                | 12       | 0.17
http      | 173 221           | 167      | 0.00
www       | 106 640           | 93       | 0.00
com       | 74 855            | 7        | 0.00

According to the recommender's logic described in subsection A, the system combines the probabilities from the above tables and recommends five tags: web20, ning, social, ajax, socialsoftware. This prediction matches one of the user's four tags (web20), which by Eq. 1-2 yields a precision of 0.2 (1/5) and a recall of 0.25 (1/4).
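The dual-index collection mentioned in subsection B can be pictured as follows. This is a minimal sketch of the idea under our own naming (the class is hypothetical, not the production code): a HashMap gives O(1) lookup by tag, while a SortedSet keeps the same entries ordered by descending probability so that the base classifiers' outputs can be merged and cut to the top five cheaply.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Dual-index collection: a HashMap for O(1) lookup by tag plus a
// SortedSet ordered by descending probability for cheap top-k merging.
// Illustrative sketch only; names are ours.
public class TagProbabilityIndex {

    private static final class Entry implements Comparable<Entry> {
        final String tag;
        final double p;
        Entry(String tag, double p) { this.tag = tag; this.p = p; }
        public int compareTo(Entry o) {
            int c = Double.compare(o.p, p);   // higher probability first
            return c != 0 ? c : tag.compareTo(o.tag);
        }
    }

    private final Map<String, Double> byTag = new HashMap<String, Double>();
    private final SortedSet<Entry> byScore = new TreeSet<Entry>();

    public void put(String tag, double p) {
        Double old = byTag.put(tag, p);
        if (old != null) {
            byScore.remove(new Entry(tag, old));  // keep both views in sync
        }
        byScore.add(new Entry(tag, p));
    }

    public Double get(String tag) {               // keyword lookup
        return byTag.get(tag);
    }

    public List<String> top(int k) {              // k most probable tags
        List<String> result = new ArrayList<String>(k);
        for (Entry e : byScore) {
            if (result.size() == k) break;
            result.add(e.tag);
        }
        return result;
    }
}
```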
Fig. 3 Latency per selected query achieved during the Challenge. Our system is labeled 13. The figure is drawn from a print-screen of the remote evaluation framework's webpage, http://www.dev.bibsonomy.org/eval.

Fig. 4 Latency per selected query; the statistics contain data up to 19 December 2009. Our system is labeled 13. The figure is drawn from a print-screen of the remote evaluation framework's webpage, http://www.dev.bibsonomy.org/eval.
Fig. 5 The fraction of matching tags which have been clicked. Our system is labeled 13. The figure is drawn from a print-screen of the remote evaluation framework's webpage, http://www.dev.bibsonomy.org/eval.

Fig. 6 Values of the precision and recall measures calculated for the top five tags. Our system is labeled 13. The figure is drawn from a print-screen of the remote evaluation framework's webpage, http://www.dev.bibsonomy.org/eval.
Fig. 7 Values of the precision and recall measures calculated for the top five tags (ignoring timeouts). Our system is labeled 13. The figure is drawn from a print-screen of the remote evaluation framework's webpage, http://www.dev.bibsonomy.org/eval.

VI. EVALUATION

It is hard to calculate precisely how every single optimization step influences the overall performance of our recommender. During the five-week period of the Challenge evaluation, eleven out of thirteen teams (Fig. 3) were exceeding the 1 000 ms timeout. Figure 4 covers the systems that stayed in the evaluation at least until December 2009; on this figure our system outperforms the others in terms of response latency (90% of recommendations are delivered between 200 and 400 milliseconds).

Our system's predictions are not only fast but also accurate. At the end of December 2009 we were at the second position according to precision and recall calculated for the first one to five tags returned to the multiplexing engine within the time constraint (Fig. 6). If the timeouts are omitted (Fig. 7), our system is at the third position, after algorithm number 14.

The third evaluation criterion is the proportion of matching tags that have been clicked by a user. Our system (Fig. 5) is the only one which exceeds the threshold of 40%. All the mentioned statistics are available online at dev.bibsonomy.org/eval.

VII. CONCLUSION

In this article we described several methods of improving the performance of a recommender system in a real-life setting. A latency in the interval of 200 to 400 milliseconds constitutes a significant improvement over the best results achieved during Online Task Three of the ECML PKDD Discovery Challenge and currently outperforms the other systems. It results in a high rate of matching tags that have been clicked by a user, and it enables us to continue the research on gradually making the recommender system more complex, insignificantly slower, but more accurate.

REFERENCES

[1] Chojnacki S (2009) Tag Recommendations based on Tracking Social Bookmarking Services. In: Proceedings of the ECML-PKDD Discovery Challenge Workshop, Bled, Slovenia.
[2] Eisterlehner F, Hotho A, Jäschke R (2009) ECML-PKDD Discovery Challenge 2009, CEUR-WS.org.
[3] Jäschke R, et al (2008) Tag Recommendations in Social Bookmarking Systems. In: AI Communications, Vol. 21, Nr. 4, Amsterdam: IOS Press, pp. 231-247.
[4] Jäschke R, et al (2009) Testing and Evaluating Tag Recommenders in a Live System. In: Workshop on Knowledge Discovery, Data Mining, and Machine Learning, pp. 44-51.
[5] Lipczak M, et al (2009) Tag sources for recommendation in collaborative tagging systems. In: Proceedings of the ECML-PKDD Discovery Challenge Workshop, Bled, Slovenia.