Topic-based Web Page Recommendation Using Tags

Jing Peng
Lab. of Complex Systems and Intelligence Science
Institute of Automation, Chinese Academy of Sciences
Beijing 100190, China
jing.peng@ia.ac.cn

Daniel Zeng
Institute of Automation, Chinese Academy of Sciences
Department of Management Information Systems
The University of Arizona
Tucson, AZ 85721, USA
zeng@email.arizona.edu

Abstract - Collaborative tagging sites allow users to save and annotate their favorite Web content with tags. These tags provide a novel source of information for collaborative filtering. This paper proposes a probabilistic approach that leverages the information embedded in tags to improve the effectiveness of Web page recommendation in a social information management context. In our approach, the probability of a Web page visit by a user is estimated by summing up the relevance of this Web page to this user's tags, and the pages with the highest probabilities are then recommended. Experiments using two real-world collaborative tagging datasets show that our algorithms outperform the common collaborative filtering methods.

Keywords - collaborative filtering; social tag; collaborative tagging; probabilistic models

I. INTRODUCTION

Collaborative tagging sites, such as Delicious¹ and Flickr², have been enjoying widespread usage in recent years. These Websites allow users to save references to their favorite Web content (Web pages, photos, etc.) online and annotate them with tags for the convenience of future retrieval. However, users' activity on collaborative tagging sites depends heavily on their own efforts to discover resources of interest, which has become increasingly inefficient with the explosive growth of available Web content. To improve user participation, a recommender system that automatically identifies content users may like is strongly desired for these sites. Collaborative filtering (CF) is the most relevant technique to address this problem.

Collaborative filtering predicts future or unseen behavior of a user by collecting historical or seen data from many other users. This technique has been widely adopted in many commercial sites, including Amazon³ and eBay⁴. Most existing research [1-5] on CF focuses on rating data, where users' ratings (usually on a scale from 1 to 5) for items are available. Recently, attention has been drawn to binary data from e-commerce applications [6, 7].

Collaborative tagging presents interesting opportunities as well as challenges to CF, as tagging data are available in addition to the typical bipartite information concerning users and products. In collaborative tagging applications, a data record is a tuple consisting of three fields in the form of <user, item, tag>, where the tag field can be empty. Tagging data are quite different from rating data. A tag typically represents a semantic description of an information item, whereas a rating indicates how much a user likes an item. Since there are no rating values in tagging data, many well-known CF techniques, especially the model-based ones [1, 2, 4] designed for multi-graded ratings, are not applicable to tagging data.

In the past few years, there has been a growing number of studies aiming to advance CF research on tagging data [8, 9]. These studies are mainly concerned with the task of tag recommendation instead of item recommendation. As an exception, Zeng and Li (2008) proposed two variants of the traditional user-based and item-based methods for Web page recommendation [10]. They drew an analogy between the tags for items and the keywords for documents, and then calculated inter-user and inter-item similarity based on TF-IDF weighted tag vectors. Nevertheless, the oversimplified assumption concerning the generation of tags in this approach has limited its performance.

There is an urgent need for further research that can leverage tagging information in a more refined and systematic manner with the objective of Web page recommendation. In this paper, we view each tag as an indicator of a particular topic or subject matter discussed on the tagged Web pages, and propose a topic-based recommendation method under this interpretation.

The rest of this paper is arranged as follows. Section 2 presents an overview of related CF work. Section 3 introduces the main idea and computational steps of our topic-based recommendation method. Empirical studies are presented in Section 4, followed by a conclusion in Section 5.

1 http://www.delicious.com
2 http://www.flickr.com
3 http://www.amazon.com
4 http://www.ebay.com

Research reported in this article has been supported by the National Natural Science Foundation of China (60875049, 70890084, and 60621001), the Chinese Academy of Sciences (2F07C01 and 2F08N03), and the Ministry of Science and Technology (2006AA010106).

978-1-4244-4173-0/09/$25.00 ©2009 IEEE. ISI 2009, June 8-11, 2009, Richardson, TX, USA

II. RELATED WORK

Most CF methods fall into two categories: memory-based and model-based. In memory-based approaches, all training examples are stored in memory and predictions are made for the active user by collecting information from similar users or items. The most crucial step of memory-based algorithms is calculating similarities. Depending on the type of similarity computed, memory-based methods can be further classified as user-based [5, 11] and item-based [3, 7]. User-based methods glean preference information from similar users, while item-based methods generate recommendations based on
similar items. Two commonly used algorithms for similarity calculation are the Pearson Correlation Coefficient algorithm [5] and the Vector Space Similarity algorithm [11]. Memory-based approaches have grown in popularity because of their simplicity, but they suffer from low accuracy and are difficult to scale.

In contrast to the memory-based algorithms, a model-based approach uses all the training data to train a predefined model. Predictions are made thereafter based on this trained model. In an early work, two alternative probabilistic models were proposed: a clustering model and a Bayesian network model [11]. One limitation of this work is that every user is forced into a single class, while intuitively users tend to have diverse interests. More recent algorithms are designed to capture multiple interests of users by classifying them into multiple clusters. Examples include the latent semantic model [1], the personality diagnosis model [4], and the flexible mixture model [2]. The model-based approaches generally achieve better results than the memory-based ones and scale very well, whereas their models are often time-consuming to build and update.

There are other CF techniques that cannot be easily categorized as either memory-based or model-based. For example, to alleviate the problem of data sparsity, Sarwar et al. (2000) proposed a dimensionality reduction method taking advantage of singular value decomposition [12].

III. TOPIC-BASED RECOMMENDATION

In this section, we present our new approach of topic-based recommendation.

3.1 Notation

Let $U = \{u_1, u_2, \ldots, u_m\}$ be a set of users, $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items, and $T = \{t_1, t_2, \ldots, t_l\}$ be a set of tags. Three different types of matrix can be derived from the tagging database, that is, the bookmarking matrix $\mathit{UI}$ (a user-item matrix), the usage matrix $\mathit{UT}$ (a user-tag matrix), and the tagging matrix $\mathit{IT}$ (an item-tag matrix). In the bookmarking matrix, $\mathit{UI}_{ij}$ is 1 if user $u_i$ bookmarked item $i_j$, otherwise 0. The entries of the usage matrix and the tagging matrix, $\mathit{UT}_{ik}$ and $\mathit{IT}_{jk}$, correspond to the frequency of tag $t_k$ on user $u_i$ and item $i_j$ respectively.

3.2 Topic-based method

As has been discussed in [13], tags are typically used to identify the topics of bookmarked items. In our approach, we treat each tag as an indicator of a topic rather than a keyword as in [10]. It is intuitive to assume that a user only favors items discussing the topics of his or her interest. We can extend this intuition to hypothesize that users only prefer items closely related to their favorite tags. In a probabilistic sense, the relevance of an item to a tag can be measured by the conditional probability of bookmarking or saving this item when this tag is used. Consequently, the probability of a user saving an item can be obtained by summing up the relevance of this item to all tags. Given that a user usually has different levels of interest in different topics (tags), a weighted sum is used, where the weight for a tag is its probability of occurrence within this user's tagging activities. This idea can be formalized as below:

$$P(i_j \mid u_i) = \sum_{k=1}^{l} P(t_k \mid u_i)\, P(i_j \mid t_k) \tag{1}$$

Herein, $P(t_k \mid u_i)$ is the probability of user $u_i$ annotating items with tag $t_k$, and $P(i_j \mid t_k)$ is the conditional probability of saving item $i_j$ when tag $t_k$ is given. Two assumptions are made implicitly in the above formula: 1) Tags are independent of each other, i.e., each tag stands for a distinct topic. 2) $P(i_j \mid t_k, u_i) = P(i_j \mid t_k)$, which is equivalent to stating that users and items are conditionally independent given tags.

3.3 Computation process

Two types of conditional probabilities need to be estimated in our approach, namely $P(t_k \mid u_i)$ and $P(i_j \mid t_k)$. This is straightforward once the usage matrix and the tagging matrix are generated from the training data. In the usage matrix, each user is represented by a row of tags weighted by their frequencies. The conditional probability of every tag for a specific user can be obtained by normalizing the row of this user to unit sum.
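As a minimal sketch of the row normalization just described and of the weighted sum in equation (1), the following Python fragment scores three pages for a single user. The tag names, page names, counts, and P(item | tag) values are all hypothetical and chosen only for illustration; the helper names `normalize_to_unit_sum` and `visit_probability` are likewise not from the paper.

```python
# Illustrative sketch only: the tag counts and P(item | tag) values are made up.

def normalize_to_unit_sum(counts):
    """Scale non-negative counts so that they sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}


def visit_probability(p_tag_given_user, p_item_given_tag, item):
    """Equation (1): sum over tags of P(tag | user) * P(item | tag)."""
    return sum(
        weight * p_item_given_tag.get(tag, {}).get(item, 0.0)
        for tag, weight in p_tag_given_user.items()
    )


# One row of a hypothetical usage matrix: how often a user applied each tag.
usage_row = {"python": 6, "travel": 3, "news": 1}
p_tag_given_user = normalize_to_unit_sum(usage_row)

# Hypothetical P(item | tag) values, assumed to have been estimated already.
p_item_given_tag = {
    "python": {"page_a": 0.5, "page_b": 0.5},
    "travel": {"page_b": 0.2, "page_c": 0.8},
    "news": {"page_c": 1.0},
}

# Score every candidate page, then rank pages by descending probability.
scores = {
    item: visit_probability(p_tag_given_user, p_item_given_tag, item)
    for item in ("page_a", "page_b", "page_c")
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because each hypothetical P(item | tag) distribution sums to 1 over items, the scores for this user also sum to 1, so they can be read as a probability distribution over pages.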
Similarly, $P(i_j \mid t_k)$ can be estimated by normalizing each column of the tagging matrix to unit sum. Further, equation (1) can be rewritten as equation (2) in the form of matrix multiplication:

$$R = \widehat{\mathit{UT}}\;\widehat{\mathit{IT}}^{\mathsf{T}} \tag{2}$$

where $\widehat{\mathit{UT}}$ denotes the row-normalized usage matrix, $\widehat{\mathit{IT}}$ denotes the column-normalized tagging matrix, $R$ represents the resulting matrix, and $R_{ij}$ holds the computed probability of user $u_i$ saving item $i_j$. $X^{\mathsf{T}}$ indicates the transposition of matrix $X$. It can be seen from equation (2) that the computational complexity of our algorithm is $O(mnl)$ for $m$ users, $n$ items, and $l$ tags. Given that $l$, the number of selected tags, is usually much smaller than $m$ and $n$, our algorithm is faster than traditional memory-based methods.

IV. EMPIRICAL ANALYSIS

4.1 Dataset

The datasets were crawled from Delicious, a major social bookmarking site that allows users to post their favorite URLs and share them with their friends. Two raw datasets consisting of 5,000 users each were collected for our experimental study. Both raw datasets turned out to be extremely large and sparse. To reduce their size, we retained only users who had posted more than 20 URLs and URLs that had been saved by more than 15 users. In addition, only the tags that occurred more than 20 times in the training data were selected for consideration in our experiment. Finally, two smaller and denser datasets, named Small and Large, were obtained. The characteristics of these datasets are summarized in Table I.
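The size reduction described above can be sketched as follows, under stated assumptions. The paper does not say whether the user and URL thresholds were applied in one pass or repeated until stable, so this sketch counts once on the raw data; the function names `filter_transactions` and `density_percent` and the toy transactions are hypothetical. The density helper assumes that the density level is the share of filled cells in the user-by-URL bookmarking matrix.

```python
from collections import Counter


def filter_transactions(transactions, min_urls_per_user=20, min_users_per_url=15):
    """Keep (user, url) pairs whose user and URL both exceed the thresholds.

    Counts are taken once on the raw data. This is an assumption: the paper
    does not state whether the filter was repeated until stable.
    """
    pairs = set(transactions)  # one transaction per distinct (user, url) pair
    urls_per_user = Counter(user for user, _ in pairs)
    users_per_url = Counter(url for _, url in pairs)
    return {
        (user, url)
        for user, url in pairs
        if urls_per_user[user] > min_urls_per_user
        and users_per_url[url] > min_users_per_url
    }


def density_percent(num_transactions, num_users, num_urls):
    """Share of filled cells in the user-by-URL matrix, in percent."""
    return 100.0 * num_transactions / (num_users * num_urls)


# Toy data with tiny thresholds so that the effect is visible.
toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a")]
kept = filter_transactions(toy, min_urls_per_user=1, min_users_per_url=1)
```

In the toy data, user u3 bookmarked only one URL and is dropped, leaving a fully dense 2-by-2 matrix. With the counts reported for the Small dataset in Table I (29615 transactions, 683 users, 1286 URLs), `density_percent` returns roughly 3.37, which agrees with the reported density level.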
We randomly divided each dataset into a training set and a test set. The split is performed in a ratio of 0.8 to 0.2 (train to test) and done for each user as in [14]. In the prediction phase, we recommend 5 items for each user and then compare them with the test set. The evaluation metrics adopted in our experiments are the commonly used precision, recall, F-measure, and rank score for ranked list prediction.

TABLE I. DATASET CHARACTERISTICS

Dataset                            Small    Large
Number of users                      683      861
Number of URLs                      1286     1633
Number of tags                      8492     9928
Number of transactions             29615    41400
Density level (%)                   3.37     2.94
Average number of URLs per user    43.36    48.08
Average number of users per URL    23.03    25.35

Note: One transaction indicates a user tagging a URL, irrespective of how many tags were provided.

4.2 Experimental results

Seven different algorithms were compared in our experiment. The POP algorithm, which always recommends the most popular items, provides a benchmark for all the other algorithms. The user-based (UB) and item-based (IB) algorithms, as well as their variants [10], namely tagging user-based (TUB) and tagging item-based (TIB), are also implemented as baselines. In addition to the proposed topic-based (TB) method, the SVD dimensionality reduction (SVD) method is included for comparison. We ran all the experimental setups 5 times and then averaged the results over these runs. Table II summarizes the final comparison results on both the Small and Large datasets.

TABLE II. EXPERIMENT RESULTS

Dataset  Algorithm  Precision  Recall  F-measure  Rank Score
Small    POP             3.25    1.01       1.48        3.30
Small    UB              4.25    1.30       1.91        4.30
Small    TUB             3.84    1.17       1.72        3.90
Small    IB              3.00    0.90       1.34        3.05
Small    TIB             4.17    1.29       1.89        4.19
Small    SVD             4.23    1.27       1.88        4.27
Small    TB              4.94    1.49       2.20        5.03
Large    POP             3.61    0.97       1.47        3.70
Large    UB              5.30    1.43       2.16        5.38
Large    TUB             4.70    1.25       1.90        4.77
Large    IB              3.05    0.81       1.24        3.07
Large    TIB             3.95    1.05       1.61        3.98
Large    SVD             5.41    1.46       2.21        5.51
Large    TB              5.84    1.57       2.39        5.93

Note: All values for Precision, Recall and F-measure are shown as percentages.

As shown in Table II, the proposed topic-based algorithm achieves the best results, followed by the SVD method and the user-based method. The poor performance of the tagging user-based and tagging item-based methods shows that it is more reasonable to treat each tag as a topic rather than a keyword. Meanwhile, we also observe that tags are better descriptors for items than for users, in that calculating similarities based on tags improves the performance of the item-based method while decreasing the accuracy of the user-based method.

The user-based method performs significantly better than the item-based algorithm. This may be due to the fact that the rows in the bookmarking matrix are much denser than the columns in our datasets. It is rather surprising that the simple POP algorithm outperforms the item-based algorithm on both datasets.

V. CONCLUSION

This paper proposes a probabilistic approach that leverages the information embedded in tags to improve the effectiveness of Web page recommendation in a social information management context. Computational experiments on two collaborative tagging datasets show that our algorithms outperform the common collaborative filtering methods. For future work, we plan to thoroughly investigate the underlying assumptions of the proposed algorithm and seek more complex and systematic techniques to fully exploit the information hidden in tags.

REFERENCES

[1] T. Hofmann, Latent class models for collaborative filtering, ACM Trans. Inf. Syst., 22(1):89-115, 2004.
[2] L. Si and R. Jin, Flexible mixture model for collaborative filtering, In Proc. of ICML, 2003.
[3] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Item-based collaborative filtering recommendation algorithms, In Proc. of WWW, 2001.
[4] D. M. Pennock, E. Horvitz, S. Lawrence, and C. L. Giles, Collaborative filtering by personality diagnosis: a hybrid memory and model-based approach, In Proc. of UAI, 2000.
[5] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: an open architecture for collaborative filtering of netnews, In Proc. of ACM Conference on Computer Supported Cooperative Work, 1994.
[6] Z. Huang, D. Zeng, and C. Hsinchun, A comparison of collaborative-filtering recommendation algorithms for e-commerce, Intelligent Systems, IEEE, 22(5):68-78, 2007.
[7] G. Linden, B. Smith, and J. York, Amazon.com recommendations: item-to-item collaborative filtering, IEEE Internet Computing, 7(1):76-80, 2003.
[8] J. Robert, M. Leandro, H. Andreas, S. Lars, S. Gerd, Tag recommendations in social bookmarking systems, AI Communications, 21(4):231-247, 2008.
[9] G. Mishne, AutoTag: a collaborative approach to automated tag assignment for weblog posts, In Proc. of WWW, 2006.
[10] D. Zeng and H. Li, How useful are tags? - an empirical analysis of collaborative tagging for web page recommendation, In Proceedings of the IEEE ISI 2008 PAISI, PACCF, and SOCO international workshops on Intelligence and Security Informatics, 2008.
[11] J. S. Breese, D. Heckerman, and C. Kadie, Empirical analysis of predictive algorithms for collaborative filtering, In Proc. of UAI, 1998.
[12] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Application of dimensionality reduction in recommender system - a case study, ACM Conference on E-Commerce, 2000.
[13] S. A. Golder and B. A. Huberman, Usage patterns of collaborative tagging systems, J. Inf. Sci., 32(2):198-208, 2006.
[14] A. S. Das, M. Datar, A. Garg, and S. Rajaram, Google news personalization: scalable online collaborative filtering, In Proc. of WWW, 2007.