Proceedings of 2010 Workshop on Social Recommender Systems
In conjunction with ACM 2010 International Conference on Intelligent User Interfaces (IUI’2010)
February 7, 2010, Hong Kong, China
Workshop Chairs: Ido Guy, Li Chen, Michelle X. Zhou
ACM International Conference Proceedings Series
ACM Press
ACM ISBN: 978-1-60558-995-4
The Association for Computing Machinery 2 Penn Plaza, Suite 701 New York New York 10121-0701 ACM COPYRIGHT NOTICE. Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or permissions@acm.org. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400, +1-978-750-4470 (fax). Notice to Past Authors of ACM-Published Article ACM intends to create a complete electronic archive of all articles and/or other material previously published by ACM. If you have written a work that was previously published by ACM in any journal or conference proceedings prior to 1978, or any SIG Newsletter at any time, and you do NOT want this work to appear in the ACM Digital Library, please inform permissions@acm.org, stating the title of the work, the author(s), and where and when published. ACM ISBN: 978-1-60558-995-4
Workshop Chairs Ido Guy, IBM Haifa Research Lab, Israel Li Chen, Hong Kong Baptist University, Hong Kong Michelle X. Zhou, IBM China Research Lab, China Program Committee Mathias Bauer, Mineway, Germany Shlomo Berkovsky, CSIRO, Australia Peter Brusilovsky, University of Pittsburgh, USA Robin Burke, DePaul University, USA David Carmel, IBM Research, Israel Alexander Felfernig, Graz University of Technology, Austria Werner Geyer, IBM Research, USA Alfred Kobsa, University of California, USA Yehuda Koren, Yahoo, Israel John Riedl, University of Minnesota, USA Bracha Shapira, Ben-Gurion University of the Negev, Israel
Foreword

Social media sites have become tremendously popular in recent years. These sites include photo and video sharing sites such as Flickr and YouTube, blog and wiki systems such as Blogger and Wikipedia, social tagging sites such as Delicious, social network sites (SNSs) such as MySpace and Facebook, and micro-blogging sites such as Twitter. Millions of users are active daily on these sites, creating rich information online that has not been available before. Yet the abundance and popularity of social media sites flood users with huge volumes of information and hence pose a great challenge in terms of information overload. In addition, most user-generated content is unstructured (e.g., blogs and wikis), which raises open questions of how such information can be exploited for personalization.

Social Recommender Systems (SRSs) aim to alleviate information overload for social media users by presenting the most attractive and relevant content, often using personalization techniques adapted for the specific user. SRSs also aim at increasing adoption, engagement, and participation of new and existing users of social media sites. Traditional techniques, such as content-based methods and collaborative filtering, are being used separately and jointly to support effective recommendations. Yet the social media platform allows incorporating new techniques that take advantage of the new information becoming publicly available in social media sites, such as the explicit connections between individuals in SNSs, the tags people use to classify items, and the content they create. In addition to recommending content to consume, new types of recommendations emerge within social media, such as recommendations of people and communities to connect to, to follow, or to join.
The fact that much of the information within social media sites (tags, comments, ratings, connections, content, and sometimes even message correspondence) is public enables more transparency in social recommender systems. New explanation techniques that try to give the reasoning behind a recommendation provided to a user are being exploited, aiming at increasing users’ trust in the system and stimulating more active participation. On the other hand, incorporating user feedback, both explicit and implicit, to improve recommendations and keep them attractive over time is another important challenge for SRSs.

Indeed, explaining the rationale behind recommendations as well as presenting recommendation results is an important aspect of social recommender systems. Because of the diverse information used in making recommendations (e.g., social network as well as content relevance), effective mechanisms must be in place to explain the recommendation rationale and results to users. Not only will such an explanation help instill users’ trust in recommended items, but it also provides an opportunity for users to provide feedback for adaptive recommendations (e.g., deleting unwanted information sources to be used for making recommendations).

In addition to providing recommendations to individuals, social recommender systems are also often targeted at communities. Community recommendations need to take into account the entire set of community members, the aggregation of their diverse needs for constructing community preference models, the analysis of their collective behavior, and the different
content already consumed by the community.

Another main challenge in the area of Recommender Systems is the evaluation of provided recommendations. Social media presents opportunities for new evaluation techniques, for example by leveraging tags as interest indicators of specific topics or specific items, or by harnessing the crowds that are actively participating in the sites. Developing new evaluation techniques and applying them to social recommender systems are essential to compare different recommendation methods and reach more effective systems.

This workshop brought together researchers and practitioners around the emerging topic of recommender systems within social media in order to: (1) share research and techniques used to develop effective social media recommenders, from algorithms, through user interfaces, to evaluation, (2) identify next key challenges in the area, and (3) identify new cross-topic collaboration opportunities.

The Workshop Organizing Committee
February 2010
Table of Contents

1. Information Seeking with Social Signals: Anatomy of a Social Tag-based Exploratory Search Browser, Ed H. Chi and Rowan Nairn
2. Tags Meet Ratings: Improving Collaborative Filtering with Tag-Based Neighborhood Method, Zhe Wang, Yongji Wang, Hu Wu
3. Ethical Intelligence in Social Recommender Systems, Peter Juel Henrichsen
4. How Social Relationships Affect User Similarities, Alan Said, Ernesto W. De Luca, Sahin Albayrak
5. User Evaluation Framework of Recommender Systems, Li Chen, Pearl Pu
6. A Step toward Personalized Social Geotagging, A. Di Napoli, F. Gasparetti, A. Micarelli, G. Sansonetti
7. Relationship Aggregation for Enhancing Enterprise Recommendations, Ido Guy
8. Pharos: Social Map-Based Recommendation for Content-Centric Social Websites, Wentao Zheng, Michelle Zhou, Shiwan Zhao, Quan Yuan, Xiatian Zhang, Changyan Chi
9. Conference Navigator 2.0: Community-Based Recommendation for Academic Conferences, Chirayu Wongchokprasitti, Peter Brusilovsky, Denis Parra
Information Seeking with Social Signals: Anatomy of a Social Tag-based Exploratory Search Browser

Ed H. Chi, Rowan Nairn
Palo Alto Research Center
3333 Coyote Hill Road, Palo Alto, CA 94304 USA
{echi,rnairn}@parc.com

ABSTRACT
Whereas for fact-retrieval searches, optimal paths to the documents containing the required information are crucial, learning and investigation activities lead to a more continuous and exploratory process, with the knowledge acquired during this “journey” being essential as well. Therefore, information seeking systems should focus on providing cues that might make these explorations more efficient. One possible solution is to build information seeking systems in which navigation signposts are provided by social cues from a large number of other people. One possible source for social cues is all of the social bookmarks on social tagging sites. Social tagging arose out of the need to organize found information that is worth revisiting. The collective behavior of users who tagged contents seems to offer a good basis for exploratory search interfaces, even for users who are not using social bookmarking sites. In this paper, we present the algorithm of a tag-based exploratory system based on this idea.

Author Keywords
Social Tagging, Exploratory Interfaces, Social Search

ACM Classification Keywords
H3.3 [Information Search and Retrieval]: Relevance Feedback, Search Process, Selection Process; H5.2. [Information interfaces and presentation]: User Interfaces

INTRODUCTION
Existing search engines on the Web are often best at search tasks that involve finding a specific answer to a specific question. However, users of web search engines often need to explore new topic areas, specifically looking for general coverage of a topic area to provide an overview.
As part of information seeking, these kinds of exploratory searches involve ill-structured problems and more open-ended goals, with persistent, opportunistic, iterative, multi-faceted processes aimed more at learning than answering a specific query [18, 23].

One existing solution to exploratory search problems is the use of intelligent clustering algorithms that group and organize search results into broad categories for easy browsing. Clusty from Vivisimo (clusty.com) is one relatively successful example of these kinds of systems that grew out of AI research at Carnegie Mellon University. One well-known example is Scatter/Gather, which used a fast clustering algorithm to provide a browsing interface to very large document collections [6]. These efforts can be seen as continuing a long line of search system research on user relevance feedback, which is a set of techniques for users to have an interactive dialog with the search system, often to explore a topic space [2, 21]. These clustering-based browsing systems extract patterns in the content to provide grouping structures and cues for users to follow in their exploratory searches, often narrowing down on more specific topic areas or jumping between related sub-topics.

Several researchers have suggested the possibility of aggregating social cues from social bookmarks to provide cues in social search systems [13]. We wish to seriously explore the use of social cues to provide navigational aids to exploratory users. The problem with freeform social tagging systems is that, as the tagging systems evolve over time, their information signal declines and noise increases, due to synonyms, misspellings, and other linguistic morphologies [4].

We designed and implemented a tag-based exploratory search system called MrTaggy.com, which is constructed with social tagging data crawled from social bookmarking sites on the web.
Based on the TagSearch algorithm, MrTaggy performs tag normalizations that reduce the noise and finds the patterns of co-occurrence between tags to offer recommendations of related tags and contents [15]. We surmised that the related tags help deal with the vocabulary problem during search [10]. The hope is that these recommendations of tags and destination pages offer support to the user while exploring an unfamiliar topic area.

In a recent paper [15], we described a user experiment in which we studied the learning effects of subjects using our tag search browser as compared against a baseline system.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Workshop SRS'10, February 7, 2010 Hong Kong, China Copyright 2010 ACM 978-1-60558-995-4... $10.0
We found that MrTaggy’s full exploratory features provide users with a kind of scaffolding support for learning topic domains, particularly compensating for the lack of prior knowledge in the topic area.

However, due to space limitations, the previous paper did not present any details on the implementation and algorithm design of the system. In this paper, we detail the design and implementation of MrTaggy.com.

First, we briefly give an overview of the overall user interface and system. Then we focus specifically on a deeper discussion of the design choices we made in the system, as well as the MapReduce architecture needed to model and process 140 million bookmarks using a probabilistic graph model. We then provide a quick overview of the user study reported previously, and finally offer some concluding remarks.

RELATED WORK
Much speculation in the Web and the search engine research community has focused on the promise of “Social Search”. Researchers and practitioners now use the term “social search” to describe search systems in which social interactions or information from social sources are engaged in some way [7]. Current social search systems can be categorized into two general classes:

(1) Social answering systems utilize people with expertise or opinions to answer particular questions in a domain. Answerers could come from various levels of social proximity, including close friends and coworkers as well as the greater public. Yahoo! Answers (answers.yahoo.com) is one example of such systems. Early academic research includes Ackerman’s Answer Garden [1], and recent startups include Aardvark (vark.com) and Quora (quora.com).

Some systems utilize social networks to find friends or friends of friends to provide answers. Web users also use discussion forums, IM chat systems, or their favorite social networking systems like Facebook and Twitter to ask their social network for answers that are hard to find using traditional keyword-based systems.
These systems differ in terms of their immediacy, size of the network, as well as support for expert finding.

Importantly, the effectiveness of these systems depends on how efficiently they utilize search and recommendation algorithms to return the most relevant past answers, allowing for better constructions of the knowledge base.

(2) Social feedback systems utilize social attention data to rank search results or information items. Feedback from users could be obtained either implicitly or explicitly. For example, social attention data could come from usage logs implicitly, or systems could explicitly ask users for votes, tags, and bookmarks.

Many researchers in the information retrieval community have already explored the use of query logs for aiding later searchers [20, 8, 11]. Direct Hit¹ was one early example from early 2001 that used click data on search results to inform search ranking. The click data was gathered implicitly through the search engine usage log. Others like Google’s SearchWiki are allowing users to explicitly vote for search results to directly influence the search rankings. Indeed, vote-based systems have become more popular recently. Interestingly, Google’s original ranking algorithm PageRank could also be classified as an implicit voting system, since it essentially treats a hyperlink as a vote for the linked content.

Popularity data derived from other social cues could also be used in ranking search results. Several researchers in CSCW have noted how bookmarks and tags serve as signals to others in the community. For example, Lee found that del.icio.us users who perceive greater degrees of social presence are more likely to annotate their bookmarks to facilitate sharing and discovery [17]. A well-known study by Golder and Huberman showed that there is remarkable regularity in the structure of the social tagging systems that is suggestive of a productive peer-to-peer knowledge system [12].
In both classes of social search systems, there are still many opportunities to apply sophisticated statistical and structure-based analytics to improve the search experience for social searchers. For example, expertise-finding algorithms could be applied to help find answerers who can provide higher-quality answers to particular questions in social answering systems. Common patterns between question-and-answer pairs could be exploited to construct semantic relationships, which may be used to draw automatic inferences to new questions. Data mining algorithms could construct ontologies that are useful for browsing through the tags and bookmarked documents.
¹ http://www.searchengineshowdown.com/features/directhit/review.html
SOCIAL BOOKMARKS AS NAVIGATION SIGNPOSTS
Recently there has been an efflorescence of systems aimed at supporting social information foraging and sensemaking. These include social tagging and bookmarking systems for photos (e.g., flickr.com), videos (e.g., youtube.com), or Web pages (e.g., del.icio.us). A unique aspect of tagging systems is the freedom that users have in choosing the vocabulary used to tag objects: any free-form keyword is allowed as a tag. Tagging systems provide ways for users to generate labeled links to content that, at a later time, can be browsed and searched. Social bookmarking systems such as del.icio.us already allow users to search the entire database for websites that match particular popular tags. Tags can be
organized to provide meaningful navigation structures, and, consequently, can be viewed as an external representation of what the users learned from a page and of how they chose to organize that knowledge.
Using social tagging data as “navigational advice” and as suggestions for additional vocabulary terms, we are interested in designing exploratory search systems that could help novice users gain knowledge in a topic area more quickly.
However, one problem is that the social cues given by people are inherently noisy. Social tagging generates vast amounts of noise in various forms, including synonyms, misspellings, and other linguistic morphologies, as well as deliberate spam [4]. In past research, we showed that extracting patterns within such data becomes more and more difficult as the data size grows [4]. That research shows that an information-theoretic analysis of tag usage in del.icio.us bookmarks suggests decreased efficiency in using tags as navigational aids [4].
To combat noisy patterns in tags, we have designed a system using probabilistic networks to model relationships between tags, which are treated as topic keywords. The system enables users to quickly give relevance feedback to the system to narrow down to related concepts and relevant URLs. The idea here is to bootstrap the user quickly with other related concepts that might be gleaned from social usage of related tags. Also, the popularities of various URLs are suggestive of the best information sources to consult, which the user can use as navigational signposts.
THE TAGSEARCH ALGORITHM
Here we describe an algorithm called TagSearch that uses the relationships between tags and documents to suggest other tags and documents. Conceptually, given a particular tag, for tag suggestion, we want to construct a semantic similarity graph as in Figure 1.
Figure 1: Conceptual Semantic Similarity Relationships between Tags.
Of course, for a URL, if we want to suggest other URLs, we also want to form a similar semantic similarity graph, which is mediated by the tags. There are also the cases where, given a tag, we want to suggest URLs, and vice versa.
In our approach, the idea is to first form a bigraph from document and tag pairs. Each tagging tuple specifies a link between a tag and a document object. For each URL, we want to know the probability of a particular tag being relevant to that URL, and vice versa. For a URL, the probability p(Tag|URL) can be roughly estimated by the number of times a particular tag is applied to that URL by users, divided by the total number of times all tags are used for that URL. Figure 2 depicts this bigraph.
Figure 2: bigraph between document/tag.
For a tag, we also want to compute the probability that a URL is related to it. This probability, p(URL|Tag), can be estimated by dividing the number of times a URL is tagged with a particular tag by the total number of times that tag is applied across all URLs.
A sketch of the idea behind the algorithm is as follows.
To suggest tags:
(1) We form a “tag profile” for a tag, which is the set of other tags that are related to the tag. To compute the tag profiles, we use the bigraph to perform a spreading activation to find a pattern of other tags that are related to a set of tags. Once we have the tag profiles, we can find other tags that are related by comparing these tag profiles. That is, for a given tag, we can compare its tag profile to other tag profiles in the system to find the most related tags.
(2) Another way to do the same thing is to form a “document profile” for a tag, which is the set of documents that are related to the tag, similarly using spreading activation. We can then find other tags that are related using these document profiles.
To suggest documents: (3) We can form a “tag profile” for a document, which is the set of tags that are related to that document, again using the spreading activation method. We can then
compare these tag profiles for documents to other document tag profiles to find similar documents.
(4) We can form “document profiles” for a document using the spreading activation method over the bigraph. We can then compare these document profiles to find similar documents.
Steps
Having described the conceptual ideas behind the algorithm, we now turn to its specific steps. TagSearch is done using a multi-step process:
(Step 1) First, we construct a bigraph between URLs and tagging keywords. Bookmarks in these systems are typically of the form [url, tag1, tag2, tag3, tag4, …]. We can decompose them into the form [url, tag1], [url, tag2], and so on. Given tuples in the form [url, tag], we can form a bigraph of URLs linked to tags. This bigraph can be expressed as a matrix. This process is depicted in Figure 3.
Figure 3: encoding of the tag/document relationships into a bigraph matrix.
(Step 2) Next, we construct “tag profiles” and “document profiles” for each URL and each tag in the system. For each URL and tag in the bigraph, we perform a spreading activation using that node in the bigraph as the entry node. We can do this, for example, using the bigraph matrix constructed in Step 1. After “n” steps (which can be varied based on experimentation), depending on whether the spreading activation was stopped on the tag side of the bigraph or the document side of the bigraph, we will have a pattern of weights on tags or documents. These patterns of weights form the “tag profiles” or “document profiles”. This process is depicted in Figure 4.
Figure 4: Spreading Activation of the tag/document bigraph.
Spreading activation has been used in many other systems for modeling concepts that might be related, or to model traffic flow through a website [5]. In this case, we use spreading activation to model tag and concept co-occurrences.
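As a rough illustration of Step 1 and of the probability estimates p(Tag|URL) and p(URL|Tag), the Python sketch below decomposes a few bookmarks into [url, tag] tuples, counts the links, and normalizes the counts. It is only a minimal single-machine sketch: the toy bookmarks, the function names, and the dictionary-of-counts representation are assumptions made for illustration and are not the MrTaggy implementation. Normalizing p(URL|Tag) over all URLs that carry the tag is also an assumption, chosen so that the values sum to one for each tag.

```python
from collections import defaultdict

def decompose(bookmarks):
    """Split [url, tag1, tag2, ...] bookmarks into (url, tag) tuples."""
    return [(b[0], tag) for b in bookmarks for tag in b[1:]]

def build_bigraph(pairs):
    """Count how many times each tag is applied to each URL."""
    counts = defaultdict(lambda: defaultdict(int))
    for url, tag in pairs:
        counts[url][tag] += 1
    return counts

def p_tag_given_url(counts):
    """Estimate p(tag | url): tag count on a URL over all tag uses on that URL."""
    probs = {}
    for url, tags in counts.items():
        total = sum(tags.values())
        probs[url] = {t: c / total for t, c in tags.items()}
    return probs

def p_url_given_tag(counts):
    """Estimate p(url | tag): normalized over every URL carrying the tag (assumption)."""
    totals = defaultdict(int)
    for tags in counts.values():
        for t, c in tags.items():
            totals[t] += c
    probs = defaultdict(dict)
    for url, tags in counts.items():
        for t, c in tags.items():
            probs[t][url] = c / totals[t]
    return dict(probs)

# Hypothetical toy data: url1 is bookmarked by two users.
bookmarks = [
    ["url1", "tag1", "tag2", "tag3"],
    ["url2", "tag2", "tag3"],
    ["url1", "tag2"],
]
pairs = decompose(bookmarks)
counts = build_bigraph(pairs)
p_tu = p_tag_given_url(counts)
p_ut = p_url_given_tag(counts)
print(p_tu["url1"])  # tag weights for url1
print(p_ut["tag2"])  # URL weights for tag2
```

The nested count dictionary plays the role of the bigraph matrix in Figure 3; at the scale of the full bookmark collection, a sparse matrix or a distributed count would be the more realistic choice.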
Specifically, the tag profiles and document profiles are computed iteratively using spreading activation, as vectors A, as follows:
A[1] = E;
A[2] = αM * A[1] + βE;
…
A[n] = αM * A[n-1] + βE;
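The iteration above can be sketched in a few lines of Python. This is a hedged illustration only. It assumes that M is a square 0/1 adjacency matrix over all tag and document nodes, that E is a one-hot vector marking the entry node, and that α and β are scalar weights. The node ordering, the parameter values, and the function name `spread_activation` are hypothetical.

```python
def spread_activation(M, entry, alpha, beta, n):
    """Iterate A[1] = E, then A[k] = alpha * M * A[k-1] + beta * E up to A[n]."""
    size = len(M)
    # E is assumed to be a one-hot vector on the entry node.
    E = [1.0 if i == entry else 0.0 for i in range(size)]
    A = E[:]
    for _ in range(n - 1):
        # Matrix-vector product M * A.
        MA = [sum(M[i][j] * A[j] for j in range(size)) for i in range(size)]
        A = [alpha * MA[i] + beta * E[i] for i in range(size)]
    return A

# Hypothetical bigraph: tag1-url1, tag2-url1, tag2-url2.
nodes = ["tag1", "tag2", "url1", "url2"]
M = [
    [0, 0, 1, 0],  # tag1
    [0, 0, 1, 1],  # tag2
    [1, 1, 0, 0],  # url1
    [0, 1, 0, 0],  # url2
]
A = spread_activation(M, entry=0, alpha=0.5, beta=1.0, n=3)
tag_profile = dict(zip(nodes[:2], A[:2]))  # weights on the tag side
doc_profile = dict(zip(nodes[2:], A[2:]))  # weights on the document side
print(tag_profile)
print(doc_profile)
```

Starting from tag1, the activation reaches tag2 through the shared document url1, so tag2 receives a nonzero weight in the tag profile of tag1 while url2 is not yet reached after three steps. Reading the tag slice or the document slice of one combined vector is a simplification of stopping the activation on one side of the bigraph.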