A Content-Based Method to Enhance Tag Recommendation∗

Yu-Ta Lu, Shoou-I Yu, Tsung-Chieh Chang, Jane Yung-jen Hsu
Department of Computer Science and Information Engineering
National Taiwan University
{b94063, b94065, r96008, yjhsu}@csie.ntu.edu.tw

Abstract

Tagging has become a primary tool for users to organize and share digital content on many social media sites. In addition, tag information has been shown to enhance capabilities of existing search engines. However, many resources on the web still lack tag information. This paper proposes a content-based approach to tag recommendation which can be applied to webpages with or without prior tag information. While social bookmarking services such as Delicious¹ enable users to share annotated bookmarks, tag recommendation is available only for pages with tags specified by other users. Our proposed approach is motivated by the observation that similar webpages tend to have the same tags. Each webpage can therefore share its tags with similar webpages.
The propagation of a tag depends on its weight in the originating webpage and the similarity between the sending and receiving webpages. The similarity metric between two webpages is defined as a linear combination of four cosine similarities, taking into account both tag information and page content. Experiments using data crawled from Delicious show that the proposed method is effective in populating untagged webpages with the correct tags.

1 Introduction

The phenomenal rise of social media in recent years has turned the average person from a mere content reader into a content publisher. People share a variety of media content with their friends or the general public on social media sites. Tagging is commonly used on these sites to add comments about the media content, or to help organize and retrieve relevant items. Tagging associates a resource with a set of words, which represent the semantic concepts activated by the resource at the cognitive level. While categorization is a primarily subjective decision process, tagging is a social indexing process.

∗ This research was supported by grants from the National Science Council NSC 97-2815-C-002-106/107-E, NSC97-2622-E-002-010-CC2, and Ministry of Education in Taiwan 97R0108.
¹ delicious.com

Tag information is useful in many respects. First, tags help describe the content of a page, revealing its semantic meaning. They not only emphasize the key terms of a page but also contain additional information that is not present in the page text [Bischoff et al., 2008]. Second, tags may be useful for search. This includes personal archive administration, where people use tags to search for documents in their collection, and possibly web search. Although whether tags enhance web search has been debated for some time, tags undeniably provide good information about the documents they annotate.
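The propagation scheme summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: this passage does not say which four cosine similarities are combined or what their coefficients are, so the sketch takes an arbitrary mapping from feature "views" to coefficients, and the example uses only two hypothetical views (`tags` and `terms`). Scoring a propagated tag as similarity times its weight in the originating page is likewise an assumption made for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # An empty vector (e.g. an untagged page) has similarity 0 to anything.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def page_similarity(a, b, coefficients):
    """Linear combination of per-view cosine similarities.

    `coefficients` maps a view name to its weight. The view names and
    weights are hypothetical; the passage only says four are combined.
    """
    return sum(c * cosine(a[view], b[view]) for view, c in coefficients.items())


def propagate_tags(target, tagged_pages, coefficients):
    """Score candidate tags for `target` by propagation from tagged pages.

    Assumption: each tag contributes (page similarity) x (tag weight in
    the originating page), summed over all originating pages.
    """
    scores = defaultdict(float)
    for page in tagged_pages:
        sim = page_similarity(target, page, coefficients)
        for tag, tag_weight in page["tags"].items():
            scores[tag] += sim * tag_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: an untagged target page and two tagged pages.
target = {"tags": {}, "terms": {"python": 1.0, "tutorial": 1.0}}
pages = [
    {"tags": {"programming": 2.0, "python": 1.0},
     "terms": {"python": 1.0, "tutorial": 1.0}},
    {"tags": {"cooking": 3.0},
     "terms": {"recipe": 1.0, "pasta": 1.0}},
]
coefficients = {"tags": 0.5, "terms": 0.5}

for tag, score in propagate_tags(target, pages, coefficients):
    print(tag, round(score, 3))
```

Because the target page has no tags, its tag-based similarity to every page is zero and only the content view contributes, which is how a content-aware similarity lets tags reach pages with no prior tag information.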
Despite these advantages, tags are not truly helpful on the current web, because most documents, or webpages, carry little or no tag information. Some social bookmarking websites such as Delicious provide tag information for pages annotated by users. However, the number of tagged pages is still too small to have a big impact on current search engines [Heymann et al., 2008a]. According to the estimate of [Heymann et al., 2008a], there are about 30 to 50 million unique URLs posted publicly on Delicious, and the total number of posts is only a small portion of the web, which has at least billions of webpages. To make matters worse, even if a URL is bookmarked on Delicious, it may not have enough tag information, because the total number of tags annotating a URL follows a power law. Figure 1 is drawn using data from a subset of our dataset: 685,418 URLs crawled from Delicious. 94% of the URLs have fewer than 50 total tags, meaning that even a bookmarked URL has a high probability of having very few tags. One solution to the scarcity of tags on the web is to develop an automatic tag annotation mechanism that makes tag information more available. Unfortunately, on Delicious, tag recommendation is available only for pages with tags specified by other users. Therefore, for webpages with absolutely no tag information, a new tag annotation method must be used.

This paper proposes a method for content-based tag recommendation that can be applied to webpages with or without prior tag information. The recommended tags for a webpage can be used not only as recommendations to users but also to automatically annotate the page. Our method first introduces the idea of tag/term coverage, which is an entropy-based metric describing how fully the tags/terms represent the annotated document. Terms here refer to the words in the page