Tag-Based User Profiling for Social Media Recommendation
Chia-Chuan Hung and Yi-Ching Huang and Jane Yung-jen Hsu
Department of Computer Science and Information Engineering
Graduate Institute of Networking and Multimedia
National Taiwan University
{r95944001, r95045, yjhsu}@csie.ntu.edu.tw

Abstract
Making recommendations for social media presents special challenges. As tagging becomes common practice at many social media sites, this research proposes a new approach to user profiling based on the tags associated with one’s personal collection of contents. To utilize the social interaction implied by tagging, a personal profile can be further extended with the tags specified by one’s social contacts. A tag-to-tag matrix is defined to enable collaborative filtering-style recommendations without explicit user ratings. Experiments with collections of bookmarks and the associated tags from 42,463 users are presented and compared using the different views.

Introduction
The phenomenal rise of social media in recent years is transforming average people from content readers to content publishers. Some popular social media services include del.icio.us¹ (social bookmarking), last.fm² (social music), flickr³ (photo sharing), and YouTube⁴ (video sharing), where people share a variety of media contents with their friends or the general public. Tagging is commonly used to add comments or descriptions about the media contents, or to help organize and retrieve relevant items.

Making recommendations for social media presents special challenges. First of all, the colossal set of user-generated content is open-ended and rapidly growing, making it difficult to define the vector space for recommenders. Secondly, user feedback is mostly implicit and asymmetrical. For example, adding a bookmark to my personal collection indicates my interest in the topic as well as the intention to access the page in the future.
On the other hand, the fact that a bookmark is not in my collection does not necessarily represent a lack of interest. Thirdly, most explicit feedback is only binary. Recommendations are generally made based on measures of similarity between people, contents and their interactions.

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

¹ http://del.icio.us
² http://www.last.fm/
³ http://www.flickr.com/
⁴ http://www.youtube.com/

A person may be profiled as a vector of attributes from his/her online personal profiles, including name, affiliation, and interests. Such simple factual data provide an inadequate description of the individual, because they are often incomplete, mostly subjective, and cannot reflect dynamic changes. In collaborative filtering (Goldberg et al. 1992), a person is profiled by a vector of ratings, one for each media content. The rich online media collected by an individual provide important insights about the person. Can we capitalize on such data in the absence of rating information?

This research explores the role of tagging for social media recommendation. We propose a new approach to user profiling based on the tags associated with the user’s personal collection of social media. In particular, a user is profiled by aggregating the tags specified by the user as well as by his/her social contacts.

In what follows, we start by briefly reviewing related research in both recommender systems and user profiling. The concept of tag-based user profiling is introduced with a set-theoretic definition. The tag-to-tag matrix is then defined, followed by the process of making social media recommendations. This paper outlines our experiments with del.icio.us bookmarks and tags, presents the resulting tag-based user profiles, and compares the recommendations under the personal view and the social view.
Related Work

Recommender Systems
Recommender systems have been an active area of research and practical applications (Adomavicius & Tuzhilin 2005). The approaches vary widely in terms of the type of information considered in making the recommendations.

A popularity-based approach simply recommends the most popular resources, e.g. top music charts. It does not take the attributes of individual users or contents into consideration.

A one-dimensional approach makes recommendations based on the attributes of either the people or the media contents independently. For example, content-based recommender systems usually analyze the content of items previously rated by a given user to build a model of the user’s interests. Relevant items can then be recommended based on the trained user interest model.

A two-dimensional approach makes recommendations
based on the relationships between the people and items (media contents). For example, collaborative filtering recommender systems (Goldberg et al. 1992) collect all users’ ratings about all items. They make recommendations based on the previous ratings from the group of people whose tastes are similar to those of the given user. Collaborative filtering suffers from the start-up problem, so variations that compute item-item, user-user and user-item similarity have been proposed.

Some research on user modeling has also proposed that profiling users by tags helps in inferring knowledge about a user. (Firan, Nejdl, & Paiu 2007) utilize rich genre and user data on the music community site Last.fm⁵ to identify users’ preferred music genres. They define several types of tag-based user profiles and their corresponding recommender and search algorithms. They propose a simple tag analysis method that considers public tags and tagging frequency on a track owned by a user to determine relevant tags and their associated scores. Following the idea of collaborative filtering recommender systems, they find similar users from a user-tag matrix and recommend music pieces containing those users’ tags. Compared with a conventional track-based recommender approach as a baseline, their experiment shows that tag-based user profiles significantly improve the quality of results.

User Profiles
Research in (Liu & Maes 2005) harvests profiles from social networking websites, such as Friendster⁶, MySpace⁷, and Orkut⁸, to construct the InterestMap. The InterestMap is a network-style user profile that illustrates the relationship between interests and identities. Unlike traditional recommender systems, the proposed approach recommends by considering the interests of people instead of their historical behavior in a particular application. The InterestMap produces more accurate recommendations, and the preferences and interests of people in real life are modeled in an intuitive and visual fashion.
The idea of constructing user profiles from tagging data has been proposed in (Michlmayr & Cayzer 2007). They use a profile graph to represent a user, in which nodes are the tags used by this user and edges are the relations between tags. They design the Add-A-Tag algorithm, an adaptive approach that combines co-occurrence and temporal information to determine the edge weight between a pair of tags. They also provide a graph animation to visualize a dynamic user profile. Their user study shows that users still want to see long-term tag relationships (which are identified by the traditional co-occurrence method) in their profile. However, they also appreciated that Add-A-Tag adapts better to show recent changes.

Tagging-Based User Profiling
Most social media sites support a tagging mechanism. For example, bookmarks on del.icio.us may be tagged with the topics of interest to the user. A picture on Flickr may be tagged with its location, the event, the people and objects in the picture, or the color or mood it depicts. Tagging associates an object (e.g. a picture, a web page, etc.) with a set of words, which represent the semantic concepts activated by the object at the cognitive level. Tagging provides a simple yet powerful way of organizing, retrieving and sharing different types of social media.

⁵ http://www.last.fm/
⁶ http://www.friendster.com
⁷ http://www.myspace.com
⁸ http://www.orkut.com

While categorization is primarily a subjective decision process, tagging is a social indexing process. In (Sinha 2006), Sinha succinctly pointed out that “Tagging captures our individual conceptual associations, but does not force us to categorize. It enables loose coordination, but does not enforce the same interpretation of a concept. We could all tag items as ‘art’ but mean very different things. That would create chaos in a shared folder scheme, but works well in a social tagging system.” In addition, Sinha offered the following insightful observations.
• Tagging transforms web browsing from a solitary to a social experience. Tagging specific resources creates ad-hoc groups, leading to the “wisdom of crowds”.
• Tagging enables social coordination that is simultaneously more direct and more abstract than collaborative filtering, as tags connect entities directly and enable the transfer of conceptual information.

Social Media Network
To explore the role of tagging for social media, we define a social media network to be a heterogeneous network of people and their (common) media collections. For example, consider the social network of John and his four friends, denoted by the circles and solid lines. Figure 1 shows this network together with John’s del.icio.us bookmarks, denoted by the oval-shaped nodes with dotted directed arrows. Note that users Px and Py are not John’s social contacts, yet they are “related to” John via the common bookmark URL 8.

Figure 1: A sample social media network.

Tagging as Profiles
Instead of a vector of item ratings, each user is profiled as a set of tags and their weights. The profiling tags can be harvested from the multiple data sources below:
• All data and descriptions in the registered user profile. This source of information ranges from the bare minimum, e.g. only name and homepage URL for del.icio.us, to rich descriptions as in many social networking sites.
• Tags specified by the user for self-description, or tags used explicitly by his/her friends to describe the given user.
• Tags associated with the user’s collection of social media, which reflect his/her topics of interest as well as activities.

In this paper, we first consider tags on social media content. The personal view of a user profile considers only tags specified by the user. The social view of a user profile also includes tags specified by his/her friends on his/her collection of contents. Let P be the set of all people, and R be the set of all resources. We define $P(t_x)$ to be the set of people who have tag x in their user profiles, that is, the set of people who share the tag in common. Similarly, we define $P(r_y)$ as the set of people who have a specific resource y in their personal collection. Given a person p, his/her social view aggregates two tag sets. The first is the tag set of his/her personal collection R(p). The second is the tag set defined by people who also collect those resources, P(R(p)), and are friends with p.

Social Media Recommendation
From a user profile, we can list some attributes (as tags) of this person. For example, he is good at “programming”, is fond of “travel”, and is a “humor” guy. Some attributes are common, but some may be particular to an individual. Aggregating the attributes of all individuals, we obtain a person attribute set. We can also list the attributes of a piece of content: it is a “wiki” page about “design”, and it includes many “cool” ideas. We obtain a content attribute set by gathering these content attributes. Traditional recommender systems can infer the relationship between user preferences and item attributes from rating data.
In our proposed social media recommendation, we obtain the relationship by analyzing these attributes, which are represented as tags. The tag-based relationship abstracts the general semantics between user and item, avoiding the over-specialization problem of traditional recommender systems.

Tag-to-Tag Matrix
We define a tag-to-tag (T2T) matrix to record the relevance between a user tag (person attribute) and a content tag (content attribute). In the T2T matrix, each row represents one user tag in the person attribute set, while each column represents one content tag in the content attribute set. A higher relevance value means the corresponding pair of tags is more relevant. For example, people who like “art” are interested in content about “design”. Thus, the person attribute “art” and the content attribute “design” are relevant. We calculate the value in each field of the T2T matrix using Equation (1):

$$\mathrm{T2T}[i][j] = \frac{|P(ut_i) \cap P(ct_j)|}{|P(ut_i)|} \qquad (1)$$

where $ut_i$ and $ct_j$ represent user tag i and content tag j respectively, and $P(t_x)$ is the set of people who have tag x in their tag-based user profiles. Thus the value of T2T[i][j] is the proportion of people with both tags i and j among the people with tag i.

Making Recommendation
Given a person and a social media item, we use the tag-based user profile and the T2T matrix to decide whether to recommend the item. For a user u and a piece of content c, we define a feature vector for each of u’s user attributes, tag $ut_i$. The elements of this vector correspond to c’s content attributes. Suppose this piece of content has k attributes; then the length of each feature vector is k:

$$\mathrm{feature}[ut_i] = \langle \phi(ct_1), \phi(ct_2), \ldots, \phi(ct_k) \rangle \qquad (2)$$

where

$$\phi(ct_j) = \mathrm{TagWeight}[ut_i] \times \mathrm{T2T}[i][j] \qquad (3)$$

and $\mathrm{TagWeight}[ut_i]$ is the weight of user tag i in this user’s tag-based user profile.

Then we integrate the feature vectors of the different attributes to obtain a score for judging the recommendation.
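Equations (1) to (3) can be illustrated with a short Python sketch. This is not the authors’ implementation. It assumes that each tag-based user profile is a dictionary mapping tags to TagWeight values and that content tags come from the same tag vocabulary as the profiles. The helper names `people_by_tag`, `t2t` and `feature_vectors` are hypothetical.

```python
from collections import defaultdict


def people_by_tag(profiles):
    """Build the sets P(t): for each tag, the people whose profile contains it.

    `profiles` is assumed to map person -> {tag: TagWeight}.
    """
    index = defaultdict(set)
    for person, tags in profiles.items():
        for tag in tags:
            index[tag].add(person)
    return index


def t2t(user_tag, content_tag, index):
    """Equation (1): |P(ut_i) ∩ P(ct_j)| / |P(ut_i)|, or 0.0 if P(ut_i) is empty."""
    holders = index.get(user_tag, set())  # .get does not insert into the defaultdict
    if not holders:
        return 0.0
    return len(holders & index.get(content_tag, set())) / len(holders)


def feature_vectors(user_profile, content_tags, index):
    """Equations (2) and (3): one length-k vector per user tag ut_i,
    whose j-th element is TagWeight[ut_i] * T2T[i][j]."""
    return {
        user_tag: [weight * t2t(user_tag, ct, index) for ct in content_tags]
        for user_tag, weight in user_profile.items()
    }


# Toy profiles (hypothetical data, not from the del.icio.us experiment).
profiles = {
    "alice": {"art": 0.6, "design": 0.4},
    "bob": {"art": 0.5, "photo": 0.5},
    "carol": {"design": 0.7, "css": 0.3},
}
index = people_by_tag(profiles)
print(t2t("art", "design", index))  # 0.5: alice and bob have "art", only alice also has "design"
print(feature_vectors(profiles["alice"], ["design", "photo"], index))
# {'art': [0.3, 0.3], 'design': [0.4, 0.0]}
```

The sketch computes T2T entries lazily instead of building the matrix up front. Suppose the rows spanned all 28,290 USER tags and the columns all 34,427 URL tags. A dense matrix would then have roughly 9.7 × 10^8 cells, most of them zero.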
For each content attribute, we choose the maximum feature value among this user’s feature vectors. We do not use other operations such as summation. Instead, we consider only the most relevant user attribute, so that many irrelevant attributes cannot add up to a high feature value.

$$\Theta(ct_j) = \max_{1 \le i \le N} \mathrm{feature}[ut_i][ct_j] \qquad (4)$$

where N is the total number of tags in this user’s tag-based user profile. Thus the overall user feature vector can be defined as:

$$\text{user vector} = \langle \Theta(ct_1), \Theta(ct_2), \ldots, \Theta(ct_k) \rangle \qquad (5)$$

Here we simply sum up every feature value to obtain a recommendation score. If this score is above a threshold T, we decide to recommend the content.

Experiments with Social Bookmarking
The proposed idea can be applied to any type of social media with tags. In our experiment, for simplicity, social bookmarks are used as the data source, and each bookmarked document is assumed to have multiple tags.

The Data
Del.icio.us is a popular social bookmarking website with rich and public personal bookmark collections. Our analysis is applied to two sets of del.icio.us data: one is the “URL” data set, and the other is the “USER” data set. To ensure we have enough tagging data, we set some conditions to filter our collected data. We randomly sample 65,131 users who have 300 to 1500 bookmarks. There are 7,258,267 unique URLs among these users’ bookmarks, and we choose those URLs which have been bookmarked by 70 to 200 people. Thus 42,844 URLs are left, and we name this set of data the “URL” set. Finally, we trace back through each user’s bookmarking data, and only someone who has at least 50
bookmarked URLs within our URL set would belong to our "USER" set. There are 42,643 users in the USER set. For each URL and user in both sets, we log their complete tagging history. There are 34,427 distinct common tags (tags used by at least two people, according to del.icio.us statistics) in the URL set; we call these URL tags. For each user, we choose the top 30 tags from his/her personal user profile, which yields a total of 28,290 distinct tags (USER tags). Table 1 summarizes our experiment data.

Set     Amount          Average                                 Tags
URL     42,844 URLs     112.556 people bookmarked (per URL)     34,427
USER    42,643 users    94.718 bookmarks (per user)             28,290
Table 1: Summary of two data sets

We also collect each user's personal social networking data to produce social user profiles. There are 12,794 users who have at least one friend in the USER set (2.89 friends on average). For these users, we also choose the top 30 tags from their social user profiles, for a total of 11,600 distinct tags (USER tags).

Tag Analysis
Each bookmarked URL is given one or more tags to describe the content of the webpage. We define a value, capacity, to represent how well a tag describes the content of a document. By analyzing the nature and idea of a tag, the popularity of an identical tag on the same content, and the tagging order, we can determine the capacity of a tag.

Tagging Order
Research on users' tagging patterns (Golder & Huberman 2006) discovered that the first tag used has the highest median rank (i.e. greatest frequency), and successive tags have decreasing median ranks. We exploit this idea and assume that the first tag on a bookmark is more relevant than the second. Let R denote a bookmark collection and T(R) denote the set of tags assigned on R by many people. A tuple of bookmarking data is denoted as b = (p, T(r, p), r), which means that a person p tagged document r ∈ R with a sequence of tags T(r, p) = {t1, t2, · · · , tn}.
We first define the order weight of tag ti ∈ T(r, p) as

$$
w_{order}(t_i, p) = \frac{1}{|T(r, p)|} \times
\begin{cases}
e^{-i/10} & \text{if } i \le 10 \\
e^{-1} & \text{if } i > 10
\end{cases}
\tag{6}
$$

where i is the index of ti in this ordered tagging sequence, and |T(r, p)| is a normalization term. All tags after the 10th tag thus receive equal order weight. An exponentially decreasing function is used because it is easier to implement than a linearly decreasing one.

Tagging Popularity
To determine the importance of a tag t for a document r, we sum the order weights given by all people who assigned the same tag to the same URL, and normalize by the number of people who bookmarked this URL. We define P(r, t) as the set of people who assigned tag t to document r, and |P(r)| as the number of people who bookmarked document r. Combining this with Equation (6), the capacity of t on r is

$$
\mathrm{capacity}(t, r) = \frac{1}{|P(r)|} \sum_{p \in P(r, t)} w_{order}(t, p)
\tag{7}
$$

User Profile
The importance of a tag to an individual, denoted as TagWeight (following the notation of the previous section), is determined by analyzing its capacity and the volume of covered bookmarks. The results differ between the two viewpoints, although they describe the same person (Figure 2).

Personal View
Suppose a person p owns a set of documents R(p) which he/she bookmarked. From the personal viewpoint, we only consider the tags assigned by p, denoted as the set T(R(p), p). For each tag t ∈ T(R(p), p), we define the TagWeight of t from p's view as

$$
\mathrm{TagWeight}_{personal}(t, p) = \frac{\sum_{r \in R(p)} \mathrm{capacity}(t, r)}{|R(p)|}
\tag{8}
$$

Thus p's personal tag-based user profile can be defined as:

$$
\mathrm{Profile}_{personal}(p) = \{\langle t_1, w_1 \rangle, \langle t_2, w_2 \rangle, \cdots\}
\tag{9}
$$

where ti ∈ T(R(p), p) and wi is calculated by Equation (8).

Social View
From the social viewpoint, we consider the tags assigned by p's social contacts, A(p).
Note that we still focus on documents in R(p), but strictly consider documents which are bookmarked both by p and by his/her social contacts, denoted as R_social = R(p) ∩ R(A(p)). The tags we consider are those assigned by p's social contacts on these documents, denoted as T(R_social, A(p)). For each t ∈ T(R_social, A(p)), its weight from the social view is defined as:

$$
\mathrm{TagWeight}_{social}(t, p) = \frac{\sum_{r \in R_{social}} \mathrm{capacity}(t, r)}{|R_{social}|}
\tag{10}
$$

Then we can obtain a user tagging profile from the social view:

$$
\mathrm{Profile}_{social}(p) = \{\langle t_1, w_1 \rangle, \langle t_2, w_2 \rangle, \cdots\}
\tag{11}
$$

where ti ∈ T(R_social, A(p)) and wi is calculated by Equation (10).

Results
In this section, we describe some observations from the results of our experiment. We randomly select 10 users and depict their personal and social tag-based user profiles in Figure 2.
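The profile construction of Equations (6) to (11), which produces profiles like those in Figure 2, can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not the authors' implementation: bookmarks are given as (person, ordered tag list, document) tuples, each person bookmarks a given document at most once, and the function names (`order_weight`, `capacities`, `tag_profile`, `personal_profile`, `social_profile`) are hypothetical.

```python
import math
from collections import defaultdict

# A bookmark is (person, [t1, t2, ...], document), i.e. b = (p, T(r, p), r),
# with tags in the order the person entered them.
# Assumption: each person bookmarks a given document at most once.

def order_weight(i, n_tags):
    """Equation (6): weight of the i-th tag (1-based) in a bookmark of n_tags tags."""
    decay = math.exp(-i / 10) if i <= 10 else math.exp(-1)
    return decay / n_tags

def capacities(bookmarks):
    """Equation (7): capacity(t, r) for every (tag, document) pair that occurs."""
    people = defaultdict(set)        # P(r): people who bookmarked r
    weight_sum = defaultdict(float)  # sum of w_order(t, p) over p in P(r, t)
    for person, tags, doc in bookmarks:
        people[doc].add(person)
        for i, tag in enumerate(tags, start=1):
            weight_sum[(tag, doc)] += order_weight(i, len(tags))
    return {(t, r): s / len(people[r]) for (t, r), s in weight_sum.items()}

def tag_profile(tags, docs, capacity):
    """Equations (8)/(10): average capacity of each tag over a document set."""
    if not docs:
        return {}
    return {t: sum(capacity.get((t, r), 0.0) for r in docs) / len(docs)
            for t in tags}

def personal_profile(p, bookmarks, capacity):
    """Equation (9): tags assigned by p, weighted over R(p)."""
    docs = {r for q, _, r in bookmarks if q == p}               # R(p)
    tags = {t for q, ts, _ in bookmarks if q == p for t in ts}  # T(R(p), p)
    return tag_profile(tags, docs, capacity)

def social_profile(p, contacts, bookmarks, capacity):
    """Equation (11): contacts' tags, weighted over R_social = R(p) ∩ R(A(p))."""
    own = {r for q, _, r in bookmarks if q == p}
    theirs = {r for q, _, r in bookmarks if q in contacts}
    r_social = own & theirs
    tags = {t for q, ts, r in bookmarks                          # T(R_social, A(p))
            if q in contacts and r in r_social for t in ts}
    return tag_profile(tags, r_social, capacity)

# Toy usage with made-up bookmarks (not data from the experiment).
bookmarks = [
    ("alice", ["design", "css"], "u1"),
    ("bob", ["css", "web"], "u1"),
    ("alice", ["linux"], "u2"),
]
cap = capacities(bookmarks)
print(sorted(personal_profile("alice", bookmarks, cap).items()))
print(sorted(social_profile("alice", {"bob"}, bookmarks, cap).items()))
```

Because capacity(t, r) does not depend on the viewpoint, it is computed once and shared; the two views differ only in which documents and which people's tags are averaged.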
Figure 2: The top 7 tags of tag-based user profiles. For each user, the first row is from the personal viewpoint and the second is from the social viewpoint.

The tag-based user profiles from the two viewpoints are different. Note that user #10 gives very different impressions depending on whether we look at his/her personal or social profile. We calculated the symmetric difference between the personal and social profiles of the 12,794 users: the average difference is 91.2% when only the top 10 tags are considered, and 95% when the entire tag profile is considered. The results are very different from the two viewpoints, although we are describing the same person.

Results of Tag-to-Tag Matrix
After applying Equation (1), we obtain two tag-to-tag matrices from the personal and social viewpoints, depicted in Figure 3 and Figure 4 respectively. We only show the top 10 common tags from the USER and URL tags in these two figures. Table 2 lists, for each tag, how many people own the tag and how many URLs have been tagged with it.

Figure 3: The result of the T2T matrix from the personal viewpoint (personal USER tags vs. URL tags). Only the top 10 tags from the common USER and URL tags are shown.

Figure 4: The result of the T2T matrix from the social viewpoint (social USER tags vs. URL tags). Only the top 10 tags from the common USER and URL tags are shown.

Tag          #People    #URL
ajax          14,344    3,664
art           12,518    7,107
blog          17,330   12,173
business       7,603    5,323
css           18,086    3,896
design        20,969   14,540
howto         11,059   11,633
linux         17,509    4,274
news           8,147    4,455
opensource     6,612    5,933
Table 2: A list of the top 10 tags from the common USER and URL tags

We marked the lowest (blue) and highest (red) values in each row. Note that some of the lowest and highest positions change between the two views. Furthermore, all values are lower in the social view, because social profiles contain far fewer tags than personal profiles: we only consider friends who own URLs in common with the sampled user. This is a strong condition, and in fact only a few cases satisfy it.
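The symmetric-difference comparison between personal and social profiles reported above can be sketched as follows. The text does not say how the difference is normalized into a percentage, so this sketch assumes |A Δ B| / |A ∪ B|, where A and B are the tag sets of the two profiles (their top-k tags, or all tags when k is None). The function names are hypothetical and the profiles are toy values.

```python
def top_tags(profile, k=None):
    """Tags of a {tag: weight} profile, highest weight first (ties broken by name)."""
    ranked = sorted(profile, key=lambda t: (-profile[t], t))
    return set(ranked if k is None else ranked[:k])

def profile_difference(personal, social, k=None):
    """Assumed normalization: |A ^ B| / |A | B| (symmetric difference over union)."""
    a, b = top_tags(personal, k), top_tags(social, k)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0

def average_difference(profile_pairs, k=None):
    """Mean difference over (personal, social) profile pairs, one pair per user."""
    diffs = [profile_difference(p, s, k) for p, s in profile_pairs]
    return sum(diffs) / len(diffs) if diffs else 0.0

personal = {"design": 0.5, "css": 0.4, "art": 0.1}
social = {"css": 0.3, "web": 0.2}
print(profile_difference(personal, social))       # 0.75
print(profile_difference(personal, social, k=1))  # 1.0
```

With k=10 this corresponds to the "top 10 tags" figure, and with k=None to the "entire tag profile" figure, under the stated normalization assumption.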
Results of Social Media Recommendation
The scores of social media recommendation are shown in Figure 5 and Figure 6 for the personal and social views, respectively. Ideally, the threshold should be obtained by machine learning techniques and adjusted according to user feedback. Here we simplify the process by setting the threshold of each row to the average of the scores in that row, and then determine whether each document is recommended or not. The recommendation decisions are shown in Figure 7, where we marked the inconsistent decisions between the two views. Note that user #10 has many inconsistencies because his/her profiles from the two viewpoints are very different.
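The row-average threshold and the comparison of decisions between views (as in Figure 7) can be sketched as follows. This assumes each view's scores are held as a {user: {document: score}} mapping over the same candidate documents, and that a score exactly equal to the row average counts as a recommendation (the text does not specify the tie case). `decide` and `inconsistent` are hypothetical names, and the scores are toy values, not data from Figures 5 and 6.

```python
def decide(scores):
    """Recommend a document if its score is at or above the mean of its row.

    scores: {user: {document: score}} for one view (personal or social).
    """
    decisions = {}
    for user, row in scores.items():
        if not row:
            decisions[user] = {}
            continue
        threshold = sum(row.values()) / len(row)  # per-user (per-row) average
        decisions[user] = {doc: s >= threshold for doc, s in row.items()}
    return decisions

def inconsistent(personal, social):
    """Documents whose decision differs between views, for users present in both."""
    return {u: sorted(d for d in personal[u] if personal[u][d] != social[u][d])
            for u in personal if u in social}

personal_scores = {"user#10": {"doc_1": 0.2, "doc_2": 0.4, "doc_3": 0.9}}
social_scores = {"user#10": {"doc_1": 0.3, "doc_2": 0.1, "doc_3": 0.25}}
personal_decisions = decide(personal_scores)
social_decisions = decide(social_scores)
print(inconsistent(personal_decisions, social_decisions))  # {'user#10': ['doc_1']}
```

Using a per-row threshold means each user's recommendations are judged relative to his/her own score scale, which matters here because social-view scores are uniformly lower than personal-view scores.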
Figure 5: The score of social media recommendation using personal tag-based user profiles.

Figure 6: The score of social media recommendation using social tag-based user profiles.

Figure 7: The decisions of social media recommendation. We marked differences between personal and social views.

Accuracy
Following the traditional accuracy measurement of CF recommender systems, we apply 5-fold validation to our recommendation results. We randomly re-sample 11,462 users, hide 20% of their bookmarks, and generate their tag-based user profiles from the rest. From the sampled users' bookmark collection, we select 10,192 documents (URLs) which are bookmarked by at least 72 sampled users as the candidates for recommendation. Note that the overall bookmark collection of the 11,462 users includes 2,795,303 unique URLs; because of the computational complexity, we cannot consider all of them. In fact, the average overlap between a user's bookmarked items and the selected candidates is only 20.3%. Then, for each user, these documents are ranked by their recommender scores according to the previously obtained tag-to-tag matrix. The precision curve is depicted in Figure 8.

Figure 8: The precision curve of social media recommendation using tag-based user profiles.

As expected, the precision is very low: the average precision is 2.77% when only the top 100 URLs are recommended. Our best result is 41% accuracy; interestingly, this user has some special (unusual) but important tags, such as "illustration" and "belgique", in his/her tag-based user profile. Because these tags refer to far fewer documents than tags such as "design" or "art" (which refer to thousands of documents), they hit the ground truth more easily.

The colossal set of user-generated content is open-ended and rapidly growing, and a long tail exists in social media data. Compared with traditional data sets (e.g. movie data), which have some central topics and can be formally defined, the topics of social media data are very diverse.
Furthermore, bookmarking activity may depend on the order in which information is received. For example, if a person sees an incomplete guide first, he/she may bookmark it; however, if he/she first finds a complete tutorial, he/she may not be interested in the incomplete guide. Therefore, we think that accuracy alone is not adequate to represent the performance, and we will further evaluate our work through a user study.
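The hold-out precision evaluation described in this section (hiding 20% of each user's bookmarks and measuring precision over the top-ranked candidates) can be sketched as follows. This is a minimal illustration under assumptions: the ranked candidate list is produced by a scoring step not shown here, precision@k divides by the number of documents actually returned (at most k), and the function names are hypothetical. It does not reproduce the paper's 5-fold protocol or its candidate selection.

```python
import random

def hide_bookmarks(bookmarks, fraction=0.2, seed=0):
    """Split one user's bookmarks into (visible, hidden); hidden is the ground truth."""
    items = sorted(bookmarks)            # sort first so the split is reproducible
    random.Random(seed).shuffle(items)
    n_hidden = round(len(items) * fraction)
    return set(items[n_hidden:]), set(items[:n_hidden])

def precision_at_k(ranked_docs, hidden, k=100):
    """Fraction of the top-k recommended documents found among the hidden bookmarks."""
    top = ranked_docs[:k]
    if not top:
        return 0.0
    return sum(1 for d in top if d in hidden) / len(top)

def mean_precision_at_k(rankings, hidden_sets, k=100):
    """Mean precision@k over users; both arguments map user -> data."""
    users = [u for u in rankings if u in hidden_sets]
    if not users:
        return 0.0
    return sum(precision_at_k(rankings[u], hidden_sets[u], k) for u in users) / len(users)

visible, hidden = hide_bookmarks({f"url{i}" for i in range(10)})
print(len(visible), len(hidden))                                # 8 2
print(precision_at_k(["a", "b", "c", "d"], {"b", "d", "x"}, k=2))  # 0.5
```

Averaging precision@100 over users in this way corresponds to the "average precision when only the top 100 URLs are recommended" quoted above; it is not the ranked-retrieval metric usually called average precision (AP).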
Future Work and Conclusion
In this paper, we use a list of weighted tags as a profile to recommend interesting content to users. We believe that profiling a person should consider not only tag weights but also the semantic relationships between tags: weighted tags represent a user's topics of interest and degree of preference, while tag relationships represent the relationships between those topics. Based on tag relationships, we can further improve our results for social media recommendation. We have recently launched a small-scale user study with fifteen to twenty testers. We plan to recommend several web pages based on their tag-based user profiles and collect their feedback and comments.

This paper presented our research on tagging-based user profiling for social media recommendation. A user is profiled based on the tags associated with his/her social media, as well as the tags on the collection specified by his/her social contacts. We introduced the concept of tagging-based profiling with a set-theoretic definition. The tag-to-tag matrix is defined, followed by the process of making social media recommendations. This paper presented our experiments with del.icio.us bookmarks and tags of 42,643 users, filtered with selection criteria to remove outliers. We compared the results of recommendations from both the personal and social views.

References
Adomavicius, G., and Tuzhilin, A. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17(6):734–749.
Firan, C.; Nejdl, W.; and Paiu, R. 2007. The benefit of using tag-based profiles. In Proceedings of the 2007 Latin American Web Conference (LA-WEB 2007), 32–41. Washington, DC, USA: IEEE Computer Society.
Goldberg, D.; Nichols, D.; Oki, B. M.; and Terry, D. 1992. Using collaborative filtering to weave an information tapestry. Commun. ACM 35(12):61–70.
Golder, S., and Huberman, B. A. 2006. Usage patterns of collaborative tagging systems. Journal of Information Science 32(2):198–208.
Huang, Y.-C.; Hung, C.-C.; and Hsu, J. Y.-j. 2008. You are what you tag. In Proceedings of AAAI 2008 Spring Symposium Series on Social Information Processing.
Liu, H., and Maes, P. 2005. InterestMap: Harvesting social network profiles for recommendations. In Proceedings of the Beyond Personalization 2005 Workshop.
Michlmayr, E., and Cayzer, S. 2007. Learning user profiles from tagging data and leveraging them for personal(ized) information access. In Proceedings of the Workshop on Tagging and Metadata for Social Information Organization, 16th International World Wide Web Conference (WWW2007).
Sinha, R. 2006. A social analysis of tagging. World Wide Web electronic publication.