ARTICLE N PRESS xpert Systems with Applications xxx(2011)xXX-XXX Contents lists available at SciVerse Science Direct Expert Systems with Applications ELSEVIER journalhomepagewww.elsevier.com/locate/eswa Hybrid personalized recommender system using centering-bunching based clustering algorithm Subhash K. Shinde, Uday Kulkarni ollege of Engineering, Navi Mumbai 400 614, India Department of Computer Science and E ng, SGGS Institute of Engineering and Technology, Nanded, India ARTICLE INFO ABSTRACT n recent years, there is overload of products information on world wide web. a personalized recommen- elaborative filtering dation is an enabling mechanism to overcome information overload occurred when shopping in an Inte Web personalized recommender system net marketplace. This paper proposes a novel centering-bunching based clustering(CBBC) algorithm hich is used for hybrid personalized recommender system(CBBCHPRS). The proposed system works n two phases. In the first phase, opinions from the users are collected in the form of user-item rating matrix. They are clustered offline using CBBC into predetermined number clusters and stored in a data base for future recommendation. In the second phase, the recommendations are generated online for active user using similarity measures by choosing the clusters with good quality rating. This helps get further effectiveness and quality of recommendations for the active users. The experimental results sing Iris dataset show that the proposed CBBC performs better than K-means and new K-medodis algo thms. The performance of CBBCHPRS is evaluated using Jester database available on website of Califor- d compared with ants recommender system(ARS). The results obtained at the proposed CBBCHPRS performs superiorly and alleviates problems such as cold-start. fir sparsity. 2011 Elsevier Ltd. All rights reserved. 1 Introduction are applied in many areas such as: web-browsing, information fill- tering, net-news or movie recommender and e-commerce. The cen- Many e-commerce Web sites offer numerous services, so a prod- tral element of all recommender systems is the user model that uct search could return an overwhelming set of options. Without contains knowledge about the individual preferences which deter- system support, filtering irrelevant products, comparing alterna- mine his or her behavior in a complex environment of web-based tives, and selecting the best option can be difficult or impossible. system. The WPRs are characterized by cross-fertilization of address this problem and suggest products(movies, music, jokes, ligence, knowledge representation, discovery and data web mining. books, news, web pages) that suit the user's needs(Adomavicius computational learning and intelligent and adaptive agents. the Tuzhilin, 2005 ) These systems add related information of items alternating information environment that is combined of variou to the information flowing towards the user, as opposed to remov- users, their needs and contexts of use as well as different system g information items. typically a recommender system compares platforms necessitates application of recommender systems. the the users profile to some reference characteristics, and seeks to pre- ever increasing importance of the e-commerce in the global econ- dict the rating that a user would give to an item they had not yet omy also increases the importance of the WPRS. They are developed considered Recommender systems use collaborative filtering ap- by different domains such as personal agents and adaptive hyper proaches or a combination of the collaborative and content based media. the personalized hypermedia application is defined as a filtering approaches, although content-based recommender sys- hypermedia system that adapts: the content, structure and or pre- tems do exist(Koren, Bell, Chris, 2009). The web personalized re sentation of the web objects to each individual user's modeL emendation systems(WPRS)are recently applied to provide The remainder of this paper is organized as follows. The section different type of customized information for their users. The wPrs 2 describes various types of input data that are used for the recom- mendation systems. The section 3 summarizes the different tech- niques of recommendation systems and their drawbacks. The ding author..Tel+9109221788066;fax:+9102227473196 proposed clustering based hybrid personalized recommender E-mail addresses: skshindeerediffmail com(S.K. Shinde), kulkuniuveyahoo. system is described in the section 4. The section 5 illustrates exper- imental setup of the proposed recommendation system. This 0957-4174/s- see front matter o 2011 Elsevier Ltd. All rights reserved doi:10.1016/eswa2011.08020 Please cite this article in press as: Shinde, S.K.& Kulkarni, U Hybrid personalized recommender system using centering-bunching based clustering algo- rithm. Expert Systems with Applications(2011). doi: 10.1016/jeswa2011.08.020
Hybrid personalized recommender system using centering-bunching based clustering algorithm Subhash K. Shinde ⇑ , Uday Kulkarni Department of Computer Engineering, Bharati Vidyapeeth College of Engineering, Navi Mumbai 400 614, India Department of Computer Science and Engineering, SGGS Institute of Engineering and Technology, Nanded, India article info Keywords: Collaborative filtering Centering-bunching based clustering Web personalized recommender system abstract In recent years, there is overload of products information on world wide web. A personalized recommendation is an enabling mechanism to overcome information overload occurred when shopping in an Internet marketplace. This paper proposes a novel centering-bunching based clustering (CBBC) algorithm which is used for hybrid personalized recommender system (CBBCHPRS). The proposed system works in two phases. In the first phase, opinions from the users are collected in the form of user-item rating matrix. They are clustered offline using CBBC into predetermined number clusters and stored in a database for future recommendation. In the second phase, the recommendations are generated online for active user using similarity measures by choosing the clusters with good quality rating. This helps to get further effectiveness and quality of recommendations for the active users. The experimental results using Iris dataset show that the proposed CBBC performs better than K-means and new K-medodis algorithms. The performance of CBBCHPRS is evaluated using Jester database available on website of California University, Berkeley and compared with ants recommender system (ARS). The results obtained empirically demonstrate that the proposed CBBCHPRS performs superiorly and alleviates problems such as cold-start, first-rater and sparsity. 2011 Elsevier Ltd. All rights reserved. 1. Introduction Many e-commerce Web sites offer numerous services, so a product search could return an overwhelming set of options. Without system support, filtering irrelevant products, comparing alternatives, and selecting the best option can be difficult or impossible. The recommender systems are personalized applications that can address this problem and suggest products (movies, music, jokes, books, news, web pages) that suit the user’s needs (Adomavicius & Tuzhilin, 2005). These systems add related information of items to the information flowing towards the user, as opposed to removing information items. Typically, a recommender system compares the user’s profile to some reference characteristics, and seeks to predict the rating that a user would give to an item they had not yet considered. Recommender systems use collaborative filtering approaches or a combination of the collaborative and content based filtering approaches, although content-based recommender systems do exist (Koren, Bell, & Chris, 2009). The web personalized recommendation systems (WPRS) are recently applied to provide different type of customized information for their users. The WPRS are applied in many areas such as: web-browsing, information filtering, net-news or movie recommender and e-commerce. The central element of all recommender systems is the user model that contains knowledge about the individual preferences which determine his or her behavior in a complex environment of web-based system. The WPRS are characterized by cross-fertilization of various research fields such as: information retrieval, artificial intelligence, knowledge representation, discovery and data/web mining, computational learning and intelligent and adaptive agents. The alternating information environment that is combined of various users, their needs and contexts of use as well as different system platforms necessitates application of recommender systems. The ever increasing importance of the e-commerce in the global economy also increases the importance of the WPRS. They are developed by different domains such as personal agents and adaptive hypermedia. The personalized hypermedia application is defined as a hypermedia system that adapts: the content, structure, and/or presentation of the web objects to each individual user’s model. The remainder of this paper is organized as follows. The section 2 describes various types of input data that are used for the recommendation systems. The section 3 summarizes the different techniques of recommendation systems and their drawbacks. The proposed clustering based hybrid personalized recommender system is described in the section 4. The section 5 illustrates experimental setup of the proposed recommendation system. This 0957-4174/$ - see front matter 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.08.020 ⇑ Corresponding author. Tel.: +91 09221788066; fax: +91 022 27473196. E-mail addresses: skshinde@rediffmail.com (S.K. Shinde), kulkurniuv@yahoo. com (U. Kulkarni). Expert Systems with Applications xxx (2011) xxx–xxx Contents lists available at SciVerse ScienceDirect Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
ARTICLE N PRESS S.K. Shinde, U. Kulkarmi/ Expert Systems with Applications xxx(2011 )Xxx-xXxx sect gives performance evaluation with the existing algo- Kulkarni, 2008: Shinde& Kulkarni, XXXX; Yu, Schwaighofer, Tresp rithms. Finally, the section 6 concludes the paper. Xu, Kriegel, 2004)combine content and collaborative based filter- ng to overcome these limitations. As stated below, there are differ- 2. Input to the recommendation systems ent ways of combining content and collaborative based filtering ( Cheung Tsui, 2004). The recommendation systems are being widely used in many applications such as Amazon. com, Net-flix com etc; to suggest L. Implementing these approaches separately and combining roducts, services, and information of items to potential consum- them for prediction. ers. At the heart of recommendation technologies are the algo- Incorporating some content based characteristics into col- ithms for making recommendations based on various types of laborative approach and vice versa input data. In e-commerce, most recommendation algorithms take iii. Constructing a general unified model that incorporates both the following three types of data as an input: product attributes content and collaborative based characteristics consumer attributes, and previous interactions between consum- rs and products (e.g, buying, rating, and catalog browsing). The The hybrid approach proposed in this paper extracts user's cur input data types are summarized in the table 1 rent browsing patterns using web usage mining, and forms a clu ter of items with similar psychology to obtain implicit users rating 3. Personalized recommendation techniques for the recommended item In the recent years web personalization has undergone through 4 Proposed CBBCHPRS tremendous changes. The content (Allen, 1990: Kalles, Papagelis Zaliagis, 2003), collaborative( Hofmann, 2003)and hybrid Alaba- ve arvelo ed and tested the CBBCHPrS for Jester dataset novic Sholam, 1997)based filtering are three basic approaches available on website of California University, Berkeley. The system used to design recommendation systems. architecture has been partitioned into two main phases: offline and The content based filtering(Chun Zeng et al. 2002)relies on the online. The Fig. 1 depicts the architecture of CBBCHPRS with its content of an item that user has experienced before The content essential components ased information filtering has proven to be effective in locating The phase I is offline. It does the preprocessing and clusterin text, items that are relevant to the topic using techniques such as In this phase background data in the form of user-item rating ma- Boolean queries, vector space queries etc. However, content based trix is collected and clustered using the proposed approach which filtering has some limitations. It is difficult to provide appropriate is described in section 4.1.2. Once the clusters are obtained the recommendation because all the information is selected and rec- cluster data along with their centroids are stored for future ommended based on the content moreover the content based recommendations filtering leads to overspecialization i.e. it recommends all the re- The phase ll is online in which the recommendation takes place ated items instead of the particular item liked by the user. The col- for the active user. Here, similarity and density of clusters are cal- borative-filtering(Ulrike Daniel, 2006)aims to identify users culated for choosing best clusters for making recommendations who have relevant interests and preferences by calculating similar- The rating quality of each item unrated by active user is computed ities and dissimilarities between their profiles. The idea behind this in the chosen clusters. To generate the recommendations, clusters ing the behavior of other users who shares similar interests and mendations are then made by computing the weighted average of hose opinions can be trusted may be beneficial. The different the rating of items in the selected clusters chniques have been proposed for collaborative recommendation: The working of CBBCHPRS is described below in detail with the such as correlation based method, semantic indexing etc. The col- Jester dataset laborative filtering overcomes some of the limitations of the con- tent based filtering. The system can suggest items to the user, 4. 1. Preprocessing phase based on the rating of items, instead of the content of the items which can improve the quality of recommendations. However, col- 4.1.1. Normalization of data laborative filtering has some drawbacks. The first drawback is that User-item rating taken from Jester dataset rated in the scale of the coverage of rating could be very sparse thereby resulting in poor -10 to +10 is normalized in the scale of 0 to 1, where 0 indicates quality recommendation. In the case of the addition of new items that item is not rated by corresponding user To facilitate the dis- into database, the system would not be able to recommend until cussion, running example shown in the Table 2 is used, where that item is served to a substantial number of users known as U1-U10 are the users and Ji1o are the items Jokes)rated or un- cold-start. Secondly, when new users are added, the system must rated by users. The last row of Table 2 gives ratings of the active learn the user preferences from the rating of users, in order to make user. curate recommendations moreover. these recommendation algorithms seem to be very extensive and grow non-linearly when 4.1.2. Centering-bunching based clustering the number of users and items in a database increase. The hybrid In the K-means, and new K-medodis(Hae-Sang Chi-Hyuck recommendation systems(Adomavicius Tuzhilin, 2005: Shinde Jun, 2009)clustering algorithm centroids are initially selected by Table 1 Taxonomy of input data. ating scores such as discrete multilevel and continuous ratings: and based on latest comments such as best, good, bad, worse and so on. 3 Behavior pattern dura of browsing, click times, the links of webs; save, print, scroll, delete, open, close, refres sh of webs: selection, edition, search, copy. 4 Transaction data purchasing date, purchase quantity, price, discounting and so on 5 Production data for movies, jokes or music, actor or singer, topic, release time, price, brand and Please cite this article in press as: Shinde, S K.& Kulkarni, U Hybrid personalized recommender system using centering-bunching based clustering alg rithm. Expert Systems with Applications(2011). doi: 10. 1016/j eswa. 2011.08.020
section also gives performance evaluation with the existing algorithms. Finally, the section 6 concludes the paper. 2. Input to the recommendation systems The recommendation systems are being widely used in many applications such as Amazon.com, Net-flix.com etc.; to suggest products, services, and information of items to potential consumers. At the heart of recommendation technologies are the algorithms for making recommendations based on various types of input data. In e-commerce, most recommendation algorithms take the following three types of data as an input: product attributes, consumer attributes, and previous interactions between consumers and products (e.g., buying, rating, and catalog browsing). The input data types are summarized in the Table 1. 3. Personalized recommendation techniques In the recent years web personalization has undergone through tremendous changes. The content (Allen, 1990; Kalles, Papagelis, & Zaliagis, 2003), collaborative (Hofmann, 2003) and hybrid (Balabanovic & Sholam, 1997) based filtering are three basic approaches used to design recommendation systems. The content based filtering (Chun Zeng et al., 2002) relies on the content of an item that user has experienced before. The content based information filtering has proven to be effective in locating text, items that are relevant to the topic using techniques such as Boolean queries, vector space queries etc. However, content based filtering has some limitations. It is difficult to provide appropriate recommendation because all the information is selected and recommended based on the content. Moreover, the content based filtering leads to overspecialization i.e. it recommends all the related items instead of the particular item liked by the user. The collaborative-filtering (Ulrike & Daniel, 2006) aims to identify users who have relevant interests and preferences by calculating similarities and dissimilarities between their profiles. The idea behind this method is that to one’s search the information collected by consulting the behavior of other users who shares similar interests and whose opinions can be trusted may be beneficial. The different techniques have been proposed for collaborative recommendation; such as correlation based method, semantic indexing etc. The collaborative filtering overcomes some of the limitations of the content based filtering. The system can suggest items to the user, based on the rating of items, instead of the content of the items which can improve the quality of recommendations. However, collaborative filtering has some drawbacks. The first drawback is that the coverage of rating could be very sparse thereby resulting in poor quality recommendation. In the case of the addition of new items into database, the system would not be able to recommend until that item is served to a substantial number of users known as cold-start. Secondly, when new users are added, the system must learn the user preferences from the rating of users, in order to make accurate recommendations. Moreover, these recommendation algorithms seem to be very extensive and grow non-linearly when the number of users and items in a database increase. The hybrid recommendation systems (Adomavicius & Tuzhilin, 2005; Shinde & Kulkarni, 2008; Shinde & Kulkarni, xxxx; Yu, Schwaighofer, Tresp, Xu, & Kriegel, 2004) combine content and collaborative based filtering to overcome these limitations. As stated below, there are different ways of combining content and collaborative based filtering (Cheung & Tsui, 2004). i. Implementing these approaches separately and combining them for prediction. ii. Incorporating some content based characteristics into collaborative approach and vice versa. iii. Constructing a general unified model that incorporates both content and collaborative based characteristics. The hybrid approach proposed in this paper extracts user’s current browsing patterns using web usage mining, and forms a cluster of items with similar psychology to obtain implicit users rating for the recommended item. 4. Proposed CBBCHPRS We have developed and tested the CBBCHPRS for Jester dataset available on website of California University, Berkeley. The system architecture has been partitioned into two main phases; offline and online. The Fig. 1 depicts the architecture of CBBCHPRS with its essential components. The phase I is offline. It does the preprocessing and clustering. In this phase background data in the form of user-item rating matrix is collected and clustered using the proposed approach which is described in section 4.1.2. Once the clusters are obtained the cluster data along with their centroids are stored for future recommendations. The phase II is online in which the recommendation takes place for the active user. Here, similarity and density of clusters are calculated for choosing best clusters for making recommendations. The rating quality of each item unrated by active user is computed in the chosen clusters. To generate the recommendations, clusters are further selected based on rating quality of an item. The recommendations are then made by computing the weighted average of the rating of items in the selected clusters. The working of CBBCHPRS is described below in detail with the Jester dataset. 4.1. Preprocessing phase 4.1.1. Normalization of data User-item rating taken from Jester dataset rated in the scale of 10 to +10 is normalized in the scale of 0 to 1, where 0 indicates that item is not rated by corresponding user. To facilitate the discussion, running example shown in the Table 2 is used, where U1-U10 are the users and J1-J10 are the items (jokes) rated or unrated by users. The last row of Table 2 gives ratings of the active user. 4.1.2. Centering-bunching based clustering In the K-means, and new K-medodis (Hae-Sang & Chi-Hyuck Jun, 2009) clustering algorithm centroids are initially selected by Table 1 Taxonomy of input data. 1 Demographic data name, age, gender, profession, birth date, telephone, address, hobbies, salary, education, experience and so on 2 Rating data rating scores such as discrete multilevel and continuous ratings; and based on latest comments such as best, good, bad, worse and so on. 3 Behavior pattern data duration of browsing, click times, the links of webs; save, print, scroll, delete, open, close, refresh of webs; selection, edition, search, copy, paste, and so on. 4 Transaction data purchasing date, purchase quantity, price, discounting and so on 5 Production data for movies, jokes or music, actor or singer, topic, release time, price, brand and so on. 2 S.K. Shinde, U. Kulkarni / Expert Systems with Applications xxx (2011) xxx–xxx Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
ARTICLE N PRESS S K Shinde, U Kulkarni/Expert Systems with Applications xxx(2011)xxx-xox Preprocessing Phase User-Item tating Matrix Normalization Rating Matrix formalized ng Matri Based Clustering Recommendation phasc Navigation Query Hybrid Filtering Content+Collaborative Filtering) y Recommended St Fig. 1. System architecture of CBBCHPRS Running example of rating matrix from jester data set after normalization in the range of o to 0.13 27 0.35 cluster boundaries are governed by the value of bunchi this manual selection of centroids. Whereas, the proposed cluster- factor ing algorithm initially calculates centroids appropriately, this re- (iii) Removal of the bunched patterns in a cluster ults in the proper creation of the clusters. The patterns included by created cluster in the previous step are The proposed clustering algorithm consists of three steps. eliminated. Thus, the next pass uses unclustered pattern set consisting of remaining patterns for clustering. These three terns. These three steps are described below in detail. The number teps are repeated till all the patterns are clustered of clusters constructed depends on the user defined parameters a and B, called as centering and bunching factors, respectively and Let Rp r and ro represent set of patterns used in the current the values of these parameters are problem dependent. Assume pass, set of patterns clustered in the current pass and set of pat- REIRnh=1, 2,. P) where Rh=(hl, Ih2,., Thn)is the n-dimen- terns that will be used in the next pass, respectively. Then Ra can sional hth pattern belonging to the set R containing P patterns Rn=Rp-R=(RnRn ERp and Rn FR h (i Determining the centroid The Rn calculated in the current pass becomes R, for the next pass To determine the centroid of the cluster, all the patterns are The steps described above are repeated until all the patterns are applied to each of the pattern and the patterns having Euclidian distance less than or equal to a are counted for all the patterns clustered and the process stops when Rn becomes empt If Rn is the pattern with the maximum count then it is selected as the centroid of the cluster 4.1.3. Computing centorid of each cluster ii Bunching The proposed CBBC is used for clustering of the jester data set. The patterns which are falling around the centroid and having The clustering resulted in the three clusters with the Euclidian distance less than or equal to B are bunched in a details of the clusters created and users in each cluster are shown cluster. The centroid of the cluster is recalculated by calculating in the table 3. After bunching as stated in the CBBc algorithm the average of all the patterns bunched in a cluster. Thus the knowing the members of each group we have recomputed new Please cite this article in press as: Shinde, S, K,& Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algo- rithm. Expert Systems with Applications(2011). doi: 10.1016/jeswa2011.08.020
the user. Therefore, performance of these algorithms depends on this manual selection of centroids. Whereas, the proposed clustering algorithm initially calculates centroids appropriately, this results in the proper creation of the clusters. The proposed clustering algorithm consists of three steps, determining the centroids, bunching and removing bunched patterns. These three steps are described below in detail. The number of clusters constructed depends on the user defined parameters a and b, called as centering and bunching factors, respectively and the values of these parameters are problem dependent. Assume R 2 {Rhjh = 1,2,...,P} where Rh = (rh1,rh2,...,rhn) is the n-dimensional hth pattern belonging to the set R containing P patterns to be clustered. (i) Determining the centroid: To determine the centroid of the cluster, all the patterns are applied to each of the pattern and the patterns having Euclidian distance less than or equal to a are counted for all the patterns. If Rh is the pattern with the maximum count then it is selected as the centroid of the cluster. (ii) Bunching: The patterns which are falling around the centroid and having the Euclidian distance less than or equal to b are bunched in a cluster. The centroid of the cluster is recalculated by calculating the average of all the patterns bunched in a cluster. Thus the cluster boundaries are governed by the value of bunching factor. (iii) Removal of the bunched patterns in a cluster: The patterns included by created cluster in the previous step are eliminated. Thus, the next pass uses unclustered pattern set consisting of remaining patterns for clustering. These three steps are repeated till all the patterns are clustered. Let Rp, Rc and Rn represent set of patterns used in the current pass, set of patterns clustered in the current pass and set of patterns that will be used in the next pass, respectively. Then Rn can be described as, Rn ¼ Rp Rc ¼ fRnjRn 2 Rp and Rn R Rcg ð1Þ The Rn calculated in the current pass becomes Rp for the next pass. The steps described above are repeated until all the patterns are clustered and the process stops when Rn becomes empty. 4.1.3. Computing centorid of each cluster The proposed CBBC is used for clustering of the Jester data set. The clustering resulted in the three clusters with a = b = 0.3. The details of the clusters created and users in each cluster are shown in the Table 3. After bunching as stated in the CBBC algorithm, knowing the members of each group, we have recomputed new Fig. 1. System architecture of CBBCHPRS. Table 2 Running example of rating matrix from Jester data set after normalization in the range of 0 to 1. Users J1 J2 J3 J4 J5 J6 J7 J8 J9 J10 U1 0.15 0.94 0.06 0.13 0.16 0.11 0.05 0.72 0.09 0.29 U2 0.71 0.51 0.82 0.73 0.41 0.06 0.48 0.26 0.94 0.96 U3 0.00 0.00 0.00 0.00 0.95 0.96 0.95 0.96 0.00 0.00 U4 0.00 0.92 0.00 0.00 0.60 0.91 0.38 0.81 0.00 0.61 U5 0.92 0.74 0.32 0.26 0.58 0.60 0.85 0.74 0.50 0.79 U6 0.23 0.35 0.54 0.11 0.18 0.31 0.11 0.48 0.20 0.43 U7 0.00 0.00 0.00 0.00 0.93 0.05 0.89 0.94 0.00 0.00 U8 0.84 0.67 0.96 0.22 0.13 0.44 0.96 0.59 0.27 0.31 U9 0.34 0.35 0.07 0.19 0.10 0.51 0.27 0.09 0.14 0.44 U10 0.66 0.76 0.76 0.66 0.82 0.76 0.94 0.64 0.66 0.91 Active User 0.38 0.71 0.00 0.00 0.25 0.00 0.64 0.27 0.00 0.59 0 Indicates not rated. S.K. Shinde, U. Kulkarni / Expert Systems with Applications xxx (2011) xxx–xxx 3 Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
ARTICLE N PRESS S.K. Shinde, U. Kulkarmi/ Expert Systems with Applications Table Users in each cluster with the centroid Process of choosing clusters. hilarity measure(sim) Probability function(P 0.3227 0.3941 0.2831 The table 4 shows the density value associated with each clus members, thus the centroid is the average of all corresponding ter at time t, similarity of active user profile with centroid of each coordinates of the two members cluster and computed probability P(t). for i= 1,.... The clusters chosen are 1 and 2, since the probability P(t) lies in the range. C1={(0.00+0.00)/2,(0.00+0.00)/2,(0.00+0.00) (0.3941-0.1)<=P(t)<=0.3941) (000+0.00)/2,(060+0.932,(0.91+0.05/2) (000+000)/2.(0.00+0.00)/2,(0.00+0.00)/2 4.2.2. Computing the rating quality of the item in each chosen cluster (0.00+000/2)} The rating quality depends on the number of users in the cluster who has rated the items, the individual ratings for the item in the Similarly, we have calculated the centroids of the cluster 1 and 2. given rating matrix and how close the rating provided by the users is, to each other. The rating quality of the item, Q is computed as, 4. 2. Recommendation process for the active user Q max -rating + avg-rating (5) 4.2.1. Choosing the appr The cluster(s)to b pa depends upon two factors viz, den- where max rating is equal to the highest rating of given item and sity of the cluster and similarity with active user profile. the prob- avg rating is equal to the average rating of the item in the chosen ability P(t) that the cluster i is chosen for generating cluster recommendations at time t is expressed as The rating quality of item close to 1, indicates that user has pro- vided good quality rating and vice versa. The table 5 shows com- Pi(t 6()·sim puted rating quality for the jokes 3, 4, 6 and 9 whic are un by the active user in the chosen clusters 1 and 2. here sim, is the value of similarity function to measure the similar- 4.2.3. Ratings of items ty between the active user profile and the centroid of the ith clus- ter, o()is the density of ith cluster at time t and k is the total computed in the chosen clusters clusters in which o for each item lies in the interval ((highest Q-0. 1)<Qs highest Q) are further se- The density of the cluster is determined by Eq (3). If the num- lected for computing rating instead of only the cluster containing bers of users in a cluster are more, the density is more and vice highest Q The rating of each item is then computed from the se- lected clusters by computing the weighted average of the ratings 8(t-Number of users in cluster i (3) using the following equation, The similarity measure of the active user profile is calculated with ating=2i=1(Q: x avg _rating) each cluster in order to find clusters which has user with similar preferences. There are number of possible measures for computing where Q is a quality of the item in the selected cluster, n is number the similarity, for example the Euclidean distance metric, cosine of clusters selected, and avg rating is an average rating of the item similarity and the Pearson correlations metric. We have used in the selected clusters Euclidean distance measure. the distance between the active user The computed ratings are shown in the Table 6. If the number of profile and the centroid of the cluster can be computed using Eq. clusters selected is more than one, then the average rating Table 5 sin( Cent, U={∑cent-U2 (4) Rating Jokes unrated by active Rati Rating quality for cluster where d is dimension of data i. e No. of attributes, Cent is the cen- trod of the cluster i, U is active user profile, Centy is jth attribute of 03532 the cenroid profile in cluster i, and U is the jth attribute of the activeS ser prof 0.1034 03502 The clusters whose probability value lies in the range((highe probability-0. 1)<= probability <=(highest probability )) are chosen for generating recommendations for the active users instead of only the cluster with highest probability. This overcomes the lim- Recommendation generated for active user. itation of Collaborative Filtering recommender system where rec- ommendations are provided based only on the opinion of the ser with most similar preferences. The rating given by the active er for the jokes /i to Jio is normalized in the range of 0 to 1 as lown in the Table 2. The rating 0 indicates that the active user 1.2 has not rated jokes 3, 4, 6 and 9 Please cite this article in press as: Shinde, S K.& Kulkarni, U Hybrid personalized recommender system using centering-bunching based clustering alg rithm. Expert Systems with Applications(2011). doi: 10. 1016/j eswa. 2011.08.020
centroids of each cluster. As an example the cluster 3 has two members, thus the centroid is the average of all corresponding coordinates of the two members: C1 ¼ fð0:00 þ 0:00Þ=2;ð0:00 þ 0:00Þ=2;ð0:00 þ 0:00Þ=2; ð0:00 þ 0:00Þ=2;ð0:60 þ 0:93=2Þ;ð0:91 þ 0:05=2Þ; ð0:00 þ 0:00Þ=2;ð0:00 þ 0:00Þ=2;ð0:00 þ 0:00Þ=2; ð0:00 þ 0:00=2Þg: Similarly, we have calculated the centroids of the cluster 1 and 2. 4.2. Recommendation process for the active user 4.2.1. Choosing the appropriate clusters The cluster (s) to be chosen depends upon two factors viz., density of the cluster and similarity with active user profile. The probability Pi(t) that the cluster i is chosen for generating recommendations at time t is expressed as, PiðtÞ ¼ diðtÞ simi Pk j¼1djðtÞ simj ð2Þ where simi is the value of similarity function to measure the similarity between the active user profile and the centroid of the ith cluster, di(t) is the density of ith cluster at time t and k is the total number of clusters. The density of the cluster is determined by Eq. (3). If the numbers of users in a cluster are more, the density is more and vice versa. diðtÞ ¼ Number of users in cluster i Total number of users ð3Þ The similarity measure of the active user profile is calculated with each cluster in order to find clusters which has user with similar preferences. There are number of possible measures for computing the similarity, for example the Euclidean distance metric, cosine similarity and the Pearson correlations metric. We have used Euclidean distance measure. The distance between the active user profile and the centroid of the cluster can be computed using Eq. (4), simiðCenti; UÞ ¼ Xd j¼1 jCenti;j Ujj 2 ( )1=2 ð4Þ where d is dimension of data i.e. No. of attributes, Centi is the centroid of the cluster i, U is active user profile, Centi,j is jth attribute of the cenroid profile in cluster i, and Uj is the jth attribute of the active user profile. The clusters whose probability value lies in the range {(highest probability-0.1) <= probability <= (highest probability)} are chosen for generating recommendations for the active users instead of only the cluster with highest probability. This overcomes the limitation of Collaborative Filtering recommender system where recommendations are provided based only on the opinion of the user with most similar preferences. The rating given by the active user for the jokes J1 to J10 is normalized in the range of 0 to 1 as shown in the Table 2. The rating 0 indicates that the active user has not rated jokes 3, 4, 6 and 9. The Table 4 shows the density value associated with each cluster at time t, similarity of active user profile with centroid of each cluster and computed probability Pi(t), for i = 1,. . .. . .k. The clusters chosen are 1 and 2, since the probability Pi(t) lies in the range, (0.3941–0.1) <= Pi((t) <= 0.3941). 4.2.2. Computing the rating quality of the item in each chosen cluster The rating quality depends on the number of users in the cluster who has rated the items, the individual ratings for the item in the given rating matrix and how close the rating provided by the users is, to each other. The rating quality of the item, Q is computed as, Q ¼ max rating þ avg rating 2 max rating ; ð5Þ where max_rating is equal to the highest rating of given item and avg_rating is equal to the average rating of the item in the chosen cluster. The rating quality of item close to 1, indicates that user has provided good quality rating and vice versa. The Table 5 shows computed rating quality for the jokes 3, 4, 6 and 9 which are unrated by the active user in the chosen clusters 1 and 2. 4.2.3. Ratings of items Once the quality of each item which is unrated by active user is computed in the chosen clusters, clusters in which Q for each item lies in the interval {(highest Q 0.1) 6Q6 highest Q} are further selected for computing rating instead of only the cluster containing highest Q. The rating of each item is then computed from the selected clusters by computing the weighted average of the ratings using the following equation, Rating ¼ Pn i¼1ðQi avg ratingÞ Pn i¼1Qi ; ð6Þ where Qi is a quality of the item in the selected cluster, n is number of clusters selected, and avg_rating is an average rating of the item in the selected clusters. The computed ratings are shown in the Table 6. If the number of clusters selected is more than one, then the average rating is Table 3 Users in each cluster with the centroid. Cluster Users Centroid 1 U1, U4, U6, U8, U9 C1 2 U2, U5, U10 C2 3 U3, U7 C3 Table 4 Process of choosing clusters. Cluster 1 Cluster 2 Cluster 3 Density value (d) 0.5 0.3 0.2 Similarity measure (sim) 0.57 1.16 1.25 Probability function (P) 0.3227 0.3941 0.2831 Table 5 Rating quality of computed jokes. Jokes unrated by active user Rating quality for cluster 1 Rating quality for cluster 2 J3 0.7113 0.0245 J4 0.1620 0.3532 J6 0.1689 0.1523 J9 0.1034 0.3502 Table 6 Recommendation generated for active user. Jokes Cluster chosen Computed rating J3 1 0.91 J4 2 0.53 J6 1, 2 0.65 J9 2 0.53 4 S.K. Shinde, U. Kulkarni / Expert Systems with Applications xxx (2011) xxx–xxx Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
ARTICLE N PRESS S K Shinde, U Kulkarni/Expert Systems with Applications xxx(2011)xxx-xox For joke 3, rating is computed from cluster 1, for jokes 4 able 7 and 9, rating is computed from the cluster 2 and for joke 6, rating is Cluster result of Iris data by the proposed and traditional methods. from clusters 1 and 2 Versicolor 4.2.4. Provide recommendations to active user Once the quality rating of each item is calculated, the recom- Proposed mendation to the active user is provided e g, joke 3 rating up to 0.91 will be recommended and so on 5.1.2. Performance evaluation of recommender system Thejesterdatasetisavailableonlineonthesitehttp://ww ieor. berkeley. edu/ goldberg/jester-data. The Jester is a www based joke recommender system, developed by University of We have conducted a set of experiments to examine the effec- California, Berkeley. This data has 73421 user entered numeric rat tiveness of our proposed recommender system in terms of accu- ing for 100 jokes, ranging on real value scale from-10 to 10.The quality. In particular, we addressed the following issues(Huang of user-item rating matrix of size 10(users)x10 (jokes)as shown Chung 2004: Jin Zhou, 2005: Kim Atluri, 2004: Kim Li 2006: Somlo Howel, 2001: Vucetic et al., 2005) The precision and recall are most popular metrics for evaluating information retrieval system. For the evaluation of recommender i. How does the confidence parameter affect the performance system, they have been used by various researchers(Billsus &Paz of the prediction? In this paper, we have conducted few zani, 1998: Basu et al, 1998; Sarwar et al., 2000a; Sarwar et al experiments to show the accuracy of the prediction for dif- 2000b). i.How does the neighbour-selection method affect the effi- of completeness. The precision score of 100% indicates that every iency of prediction? Experiments are conducted to examine recommendation retrieved was relevant. The recall score of 100% the accuracy of CBBC algorithm for neighbour-selection indicates that all relevant recommendations were retrieved sev- ii.How do the clusters formed influence the prediction accu- eral ways to evaluate precision and recall exists(Herlocker, Kon racy? Experiments are conducted to examine the impact of tan, Terveen, Riedl, 2004) clustering methods on the final performance of item or user In our work to evaluate CBBCHPRS, users'ratings are taken and content based collaborative filtering. ivided into a training and test set. here the training set consists of iv. The performance CBBCHPRS is evaluated and compared with the user-item rating matrix consisting of 10 users and 10 jokes and ARS using precision, recall, and mean absolute error( MAE). the test data consists of one active user as shown in the Table 2 The algorithm is then trained on the training set and top N- he proposed CBBCHPRS is implemented in MATLAB version items are predicted from that users' test set. The items that both sets, becomes members of the special set which is called as PC with 512 MB memory, running Microsoft Windows XP the hit set Profession a The recall is global measure for whole dataset. When referring to Recommender Systems, it can be defined as the hit set over the test set size 5.1. Simulation results and performance evaluation Recall Size of hit set test n top-NI 5.1.1. Performance evaluation of clustering In order to check the performance of the proposed clustering algorithm, we have first applied the algorithm to real data set. 'Iris data whose true classes are known the iris data set is avail- leinUcirepository(ftp://www.ics.uciedu/pub/machinelearning databases/), which includes 150 objects(50 in each of three classes length,'sepal width, 'petal length, and'petal widthl S(sepal The performance was measured by the accuracy, which is the proportion of objects that are correctly grouped together against the true classes. To investigate the performance more objectively. a simulation study was carried out by generating artificial data sets repetitively and calculating the average performance of the We have applied the proposed method, K-means, and new 号 Recall of ARs medodis to create three clusters using this data without the class Recall of CBbChPrs information. When implementing K-means, the initial centroids Precision of ars re chosen randomly although many other alternatives are avail- Precision of CBBCHPRs able including Al-Daoud and roberts(1996)and Khan and ahmad (2004), etc. The class of an object cannot be predicted by a cluster g algorithm but it may be estimated by examining the cluster result for the class-labeled data. the table 7 shows the result obtained using existing and proposed clustering method 15 The Table 7 shows that the proposed clustering algorithm works better than the traditional algorithms because the algorithm calculates centroids properly instead of selecting randomly Fig. 2. Precision and Recall for 10 clusters. Please cite this article in press as: Shinde, S.K.& Kulkarni, U Hybrid personalized recommender system using centering-bunching based clustering algo- rithm. Expert Systems with Applications(2011). doi: 10.1016/jeswa2011.08.020
computed. For joke 3, rating is computed from cluster 1, for jokes 4 and 9, rating is computed from the cluster 2 and for joke 6, rating is computed from clusters 1 and 2. 4.2.4. Provide recommendations to active user Once the quality rating of each item is calculated, the recommendation to the active user is provided, e.g., joke 3 rating up to 0.91 will be recommended and so on. 5. Experiments We have conducted a set of experiments to examine the effectiveness of our proposed recommender system in terms of accuracy of neighbour-selection, cold start and recommendation quality. In particular, we addressed the following issues (Huang & Chung, 2004; Jin & Zhou, 2005; Kim & Atluri, 2004; Kim & Li, 2006; Somlo & Howel, 2001; Vucetic et al., 2005). i. How does the confidence parameter affect the performance of the prediction? In this paper, we have conducted few experiments to show the accuracy of the prediction for different settings of the parameter values. ii. How does the neighbour-selection method affect the effi- ciency of prediction? Experiments are conducted to examine the accuracy of CBBC algorithm for neighbour-selection. iii. How do the clusters formed influence the prediction accuracy? Experiments are conducted to examine the impact of clustering methods on the final performance of item or user content based collaborative filtering. iv. The performance CBBCHPRS is evaluated and compared with ARS using precision, recall, and mean absolute error (MAE). The proposed CBBCHPRS is implemented in MATLAB version 7.2. The experiments are conducted on a 2.0 GHz, Intel Pentium 4 PC with 512 MB memory, running Microsoft Windows XP Professional. 5.1. Simulation results and performance evaluation 5.1.1. Performance evaluation of clustering In order to check the performance of the proposed clustering algorithm, we have first applied the algorithm to real data set, ‘Iris’ data, whose true classes are known. The Iris data set is available in UCI repository (ftp://www.ics.uci.edu/pub/machinelearning databases/), which includes 150 objects (50 in each of three classes – ‘Setosa’, ‘Versicolor’, and ‘Virginica’) having four variables (‘sepal length’, ‘sepal width’, ‘petal length’, and ‘petal width’). The performance was measured by the accuracy, which is the proportion of objects that are correctly grouped together against the true classes. To investigate the performance more objectively, a simulation study was carried out by generating artificial data sets repetitively and calculating the average performance of the method. We have applied the proposed method, K-means, and new Kmedodis to create three clusters using this data without the class information. When implementing K-means, the initial centroids were chosen randomly although many other alternatives are available including AI-Daoud and Roberts (1996) and Khan and Ahmad (2004), etc. The class of an object cannot be predicted by a clustering algorithm but it may be estimated by examining the cluster result for the class-labeled data. The Table 7 shows the result obtained using existing and proposed clustering method. The Table 7 shows that the proposed clustering algorithm works better than the traditional algorithms because the algorithm calculates centroids properly instead of selecting randomly. 5.1.2. Performance evaluation of recommender system The Jester dataset is available online on the site http://www. ieor.berkeley.edu/goldberg/jester-data. The Jester is a WWW based joke recommender system, developed by University of California, Berkeley. This data has 73421 user entered numeric rating for 100 jokes, ranging on real value scale from 10 to 10. The experiments are performed on the small Jester dataset consisting of user-item rating matrix of size 10 (users) 10 (jokes) as shown in the Table 2. The precision and recall are most popular metrics for evaluating information retrieval system. For the evaluation of recommender system, they have been used by various researchers (Billsus & Pazzani, 1998; Basu et al., 1998; Sarwar et al., 2000a; Sarwar et al., 2000b). The precision is a measure of exactness and recall is a measure of completeness. The precision score of 100% indicates that every recommendation retrieved was relevant. The recall score of 100% indicates that all relevant recommendations were retrieved. Several ways to evaluate precision and recall exists (Herlocker, Konstan, Terveen, & Riedl, 2004). In our work to evaluate CBBCHPRS, users’ ratings are taken and divided into a training and test set. Here the training set consists of the user-item rating matrix consisting of 10 users and 10 jokes and the test data consists of one active user as shown in the Table 2. The algorithm is then trained on the training set and top Nitems are predicted from that users’ test set. The items that appear in both sets, becomes members of the special set which is called as the hit set. The recall is global measure for whole dataset. When referring to Recommender Systems, it can be defined as the hit set over the test set size. Recall ¼ Size of hit set Size of test set ¼ jtest \ top-Nj jtestj ð8Þ Table 7 Cluster result of Iris data by the proposed and traditional methods. Algorithms Setosa Versicolor Virginica K-means 50 24 76 K-medodis 50 41 59 Proposed 50 44 56 5 10 15 20 25 30 20 30 40 50 60 70 80 90 Precision, Recall Recommendation set Recall of ARS Recall of CBBCHPRS Precision of ARS Precision of CBBCHPRS Fig. 2. Precision and Recall for 10 clusters. S.K. Shinde, U. Kulkarni / Expert Systems with Applications xxx (2011) xxx–xxx 5 Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
ARTICLE N PRESS S.K. Shinde, U. Kulkarmi/ Expert Systems with Applications xx(2011)xXx-xxx e aver ge quality of an in where Ru(t) is the rating given to the item ty by the user u, Ru(t)is Precision Size of hit set test n top- (9) the predicted value of the user u on the item t,, Tis the test set and IT] is the size of the test set. In addition to CBBCHPRS, ARS (Bedi et al 2009)is also imple We varied the number of rated items provided to the active user mented to compare the performance with our proposed system. from 3, 5, and 10. In order to compare the performance of the pro- The recommendation list is evaluated using precision and recall posed system with ARS, we empirically analyze how MAE evolves for different number of clusters as shown in the Figs. 2 and 3. with the density of rating matrix. The results obtained are shown To evaluate prediction quality metric, we have used the mean in the Fig. 4 absolute error(MAE), a statistical accuracy metric, (Herlocke In this experiment, we have randomly selected 5%, 10%, 15% et al, 2004; Zhao Bing, 2005), which is computed as 20%, 25%, 30%, 35% and 40% of whole rating data from the database to represent different degrees of density of the rating matrix. the number of nearest neighbours is set to 4 while the number of rated items for active user is set to 3. The results show that the density has a great effect on the performance of these algorithms. When the rating matrix becomes crowded all the algorithms result in higher MAE. As seen from Fig 4, the mae curve of our algorithm is below that of the other algorithm, which means that the sparse- ness has the less impact on our proposed algorithm 6. Conclusions This paper describes a novel personalized recommender system that utilizes clustering of user-item rating matrix through pro- sed CBBC and provides the recommendations for the active user with good quality rating using similarity measures. The result from various simulations using Iris data set shows that the proposed Recall of ARs clustering algorithm performs better than K-means and new Recall of CBBCHPRs K-medoid clustering, which helps to improve the quality of rating. Precision of ARs In traditional recommender system similarity is normally the Precision of CBBCHPRS only heuristic used in recommendation process where as in the clusters. This helps in the exploration of other clusters which have hilarity closer to the active user and provide him/ her with good set of recommendations emendation set ion and recall is clusters created in CBBCHPRS and ARS. It is found that the pro Fig. 3. Precision and Recall for 20 clusters. posed CBBCHPRS performs better than ARS and the sparseness has less impact on the proposed algorithm. References Adomavicius, G, Tuzhilin, A (2005) Toward the next ge on of recommender Transactions Knowledge Database Engineering. 17(6). 734-774. Al-Daoud, M. B,& Roberts, s.A(1996- w methods for the initialization User models: theory, method and practice. Intemational Joumal 0.78 V CBBCHPRS mender system based collaborative ants. Journal 0.74 Elaborative on. IEEE Transactions ont Systems, Man and Cybernetics-Part A: ystems and Humans, 34(1). 143-148. Chun Zeng. et al.(2002). Personalized services for digital library. In proce Jun(2009) A simple and fast algorithm for K- clustering Expert Systems with Applications, 3336-3341 Riedl, ]- T.(2004). filtering recommender systems. ACM Transactions on information ystems(TOIS), 22(1)5-53 Fig 4. MAE on each Item for different algorithms (A small value means a better conference on research and development in information retrieval, (pp 145-153). Please cite this article in press as: Shinde, S K.& Kulkarni, U Hybrid personalized recommender system using centering-bunching based clustering alg rithm. Expert Systems with Applications(2011). doi: 10. 1016/j eswa. 2011.08.020
The precision when referring to recommender systems can be de- fined as the ratio of hit set size over the top-N set size. It gives the average quality of an individual recommendation. Precision ¼ Size of hit set Size of top-N set ¼ jtest \ top-Nj N ð9Þ In addition to CBBCHPRS, ARS (Bedi et al., 2009) is also implemented to compare the performance with our proposed system. The recommendation list is evaluated using precision and recall for different number of clusters as shown in the Figs. 2 and 3. To evaluate prediction quality metric, we have used the mean absolute error (MAE), a statistical accuracy metric, (Herlocker et al., 2004; Zhao & Bing, 2005), which is computed as, MAE ¼ P u2T jRuðtjÞ e RuðtjÞj jTj ð10Þ where Ru(tj) is the rating given to the item tj by the user u; e RuðtjÞ is the predicted value of the user u on the item tj, T is the test set and jTj is the size of the test set. We varied the number of rated items provided to the active user from 3, 5, and 10. In order to compare the performance of the proposed system with ARS, we empirically analyze how MAE evolves with the density of rating matrix. The results obtained are shown in the Fig. 4. In this experiment, we have randomly selected 5%, 10%, 15%, 20%, 25%, 30%, 35% and 40% of whole rating data from the database to represent different degrees of density of the rating matrix. The number of nearest neighbours is set to 4 while the number of rated items for active user is set to 3. The results show that the density has a great effect on the performance of these algorithms. When the rating matrix becomes crowded, all the algorithms result in higher MAE. As seen from Fig. 4, the MAE curve of our algorithm is below that of the other algorithm, which means that the sparseness has the less impact on our proposed algorithm. 6. Conclusions This paper describes a novel personalized recommender system that utilizes clustering of user-item rating matrix through proposed CBBC and provides the recommendations for the active user with good quality rating using similarity measures. The result from various simulations using Iris data set shows that the proposed clustering algorithm performs better than K-means and new K-medoid clustering, which helps to improve the quality of rating. In traditional recommender system similarity is normally the only heuristic used in recommendation process where as in the proposed CBBCHPRS, similarity is combined with density of the clusters. This helps in the exploration of other clusters which have similarity closer to the active user and provide him/her with good set of recommendations. The precision and recall is compared by varying the number of clusters created in CBBCHPRS and ARS. It is found that the proposed CBBCHPRS performs better than ARS and the sparseness has less impact on the proposed algorithm. References Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions Knowledge Database Engineering, 17(6), 734–774. AI-Daoud, M. B., & Roberts, S. A. (1996). New methods for the initialization of clusters. Pattern Recognition Letters, 17, 451–455. Allen, R. B. (1990). User models: theory, method and practice. International Journal of Man-Machine Studies, 43(11), 27–52. Balabanovic, M., & Sholam, Y. (1997). Combining content-based and collaborative recommendation. Communications of ACM, 40(3), 46–61. Basu. C. H., et al. (1998). Recommendation as classification: Using social and content–based information in recommendation. In Proceedings 15th international conference on artificial intelligence, (pp. 714–720). Bedi, P. et al. (2009). Recommender system based collaborative ants. Journal on Artificial Intelligence, 2(2), 40–55. Billsus, D., & Pazzani, M., (1998). Learning collaborative information filter. In Proceedings 5th international conference on machine learning, (pp. 46–54). Cheung, K. W., & Tsui, K. Ch. (2004). Extended latent class models for collaborative recommendation. IEEE Transactions on Systems, Man and Cybernetics-Part A: Systems and Humans, 34(1), 143–148. Chun Zeng, et al. (2002). Personalized services for digital library. In Proceedings of 5th international conference on asian digital libraries, (pp. 252–253). Hae-Sang & Chi-Hyuck Jun (2009). A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 3336–3341. Herlocker, J. L., Konstan, J. A., Terveen, L. G., & Riedl, J. T. (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems (TOIS), 22(1), 5–53. Hofmann, T., (2003). Collaborative filtering via Gaussian probabilistic latent semantic analysis. In Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, (pp. 145–153). 5 10 15 20 25 30 10 20 30 40 50 60 70 80 Precision, Recall Recommendation set Recall of ARS Recall of CBBCHPRS Precision of ARS Precision of CBBCHPRS Fig. 3. Precision and Recall for 20 clusters. 10 20 30 40 50 60 0.7 0.72 0.74 0.76 0.78 0.8 0.82 MAE Size of Neighbour set ARS CBBCHPRS Fig. 4. MAE on each Item for different algorithms. (A small value means a better performance). 6 S.K. Shinde, U. Kulkarni / Expert Systems with Applications xxx (2011) xxx–xxx Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
ARTICLE N PRESS S K Shinde, U Kulkarni/Expert Systems with Applications xxx(2011)xxx-xox Chung, w.(2004 A graph model for E-Commerce recommender Sarwar, S, and Karypis et al. (2000b. Item based Collaborative algorithm. In Journal of the American Society for information Science and Technolo Proceedings of 10th int Jin, x, and Zhou, Y(2005). A maximum entropy web recommendatio Shinde, S K, and Kulkarni, U. V(2008). A new approach for on line recommender ative and content features. In Proceedings 11th ACM GKDD intemational conference on knowledge discovery in data mining.(pp. Shinde, sK, Kulkarni, U V (2010). The hybrid web personalized recommendatio nal Joumal of Data Mining, Modeling Kim, B. M.,& Li, Q(2006). A new appl weight text filtering. In Proceedings collaborati intelligent data analysis. (pp. 53-59). Kim, D. H. and Atlur, V.(2004) A click stream-based collabo ng Ulrike, G, Daniel, F R(2006). Persuasion in recommender systems. Intemational Jourmal of Electronic Commerce, 11(2), 81-100. ACM international workshop on web information and data management, (pp 88- Vucetic, S et al. (2005 ). Collaborative filtering using a regression-based approach. Khan, S.S.& Ahmad, A (2004). Cluster center initialization algorithm for K-means Yu, K Schwaig resp, v.Xu, x,& Kriegel, H. P.(2004). Probabilistic clustering Pattern Recognition Letters, 25, 1293-1302 memory-based co tive filtering. IEEE Transactions Knowledge and Data Koren, Y, Bell, R,& Chris, V.(2009). Matrix factorization techniques for eedings 2nd ACM conference on electroics commerce, press, Minneapolis, USA, (pp. 158-176) Please cite this article in press as: Shinde, S, K,& Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algo- rithm. Expert Systems with Applications(2011). doi: 10.1016/jeswa2011.08.020
Huang, Z., & Chung, W. (2004). A graph model for E-Commerce recommender systems. Journal of the American Society for Information Science and Technology, 55(3), 259–274. Jin, X., and Zhou, Y. (2005). A maximum entropy web recommendation system: combining collaborative and content features. In Proceedings 11th ACM SIGKDD international conference on knowledge discovery in data mining, (pp. 612–617). Kalles, D., Papagelis, A., & Zaliagis, C. (2003). Algorithmic Aspects of Web Intelligent Systems. Web Intelligence. Berlin: Springer, pp. 323–345. Kim, B. M., & Li, Q. (2006). A new approach for combining content-based and collaborative filters. Journal of Intelligent Information Systems, 27, 79–91. Kim, D. H. and Atluri, V. (2004). A click stream-based collaborative filtering personalization model: towards a better performance. In Proceedings 6th annual ACM international workshop on web information and data management, (pp. 88– 95). Khan, S. S., & Ahmad, A. (2004). Cluster center initialization algorithm for K-means clustering. Pattern Recognition Letters, 25, 1293–1302. Koren, Y., Bell, R., & Chris, V. (2009). Matrix factorization techniques for recommender systems. IEEE Computer Society, 42–49. Sarwar, S., and Karypis et al. (2000a). Analysis of recommendation algorithm for ecommerce. In Proceedings 2nd ACM conference on electronics commerce, ACM press, Minneapolis, USA, (pp. 158–176). Sarwar, S., and Karypis et al. (2000b). Item based Collaborative algorithm. In Proceedings of 10th international world wide conference, Hong Kong, (pp. 285– 295). Shinde, S. K., and Kulkarni, U. V. (2008). A new approach for on line recommender system in web usage mining. In Proceedings international conference on advanced computer theory and engineering, (pp. 312–317). Shinde, S. K., & Kulkarni, U. V. (2010). The hybrid web personalized recommendation based on web usage mining. International Journal of Data Mining, Modeling and Management, 2(4), 315–333. Somlo, G.L. and Howel, A. (2001). Adaptive lightweight text filtering. In Proceedings 4th international symposium intelligent data analysis, (pp. 53–59). Ulrike, G., & Daniel, F. R. (2006). Persuasion in recommender systems. International Journal of Electronic Commerce, 11(2), 81–100. Vucetic, S. et al. (2005). Collaborative filtering using a regression-based approach. Journal of Knowledge and Information Systems, 7(1), 22–35. Yu, K., Schwaighofer, A., Tresp, V., Xu, X., & Kriegel, H. P. (2004). Probabilistic memory-based collaborative filtering. IEEE Transactions Knowledge and Data Engineering, 16(1), 56–69. Zhao, Z., & Bing, S. (2005). An adaptive algorithm for personal recommendation. Journal of Changchun University, 15, 22–29. S.K. Shinde, U. Kulkarni / Expert Systems with Applications xxx (2011) xxx–xxx 7 Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020