ARTICLE IN PRESS
S.K. Shinde, U. Kulkarni / Expert Systems with Applications xxx (2011) xxx–xxx
section also gives a performance evaluation against the existing algorithms. Finally, Section 6 concludes the paper.

2. Input to the recommendation systems

Recommendation systems are widely used in applications such as Amazon.com and Netflix.com to suggest products, services, and information to potential consumers. At the heart of recommendation technologies are the algorithms that make recommendations based on various types of input data. In e-commerce, most recommendation algorithms take three types of data as input: product attributes, consumer attributes, and previous interactions between consumers and products (e.g., buying, rating, and catalog browsing). The input data types are summarized in Table 1.

3. Personalized recommendation techniques

In recent years, web personalization has undergone tremendous changes.
Content-based filtering (Allen, 1990; Kalles, Papagelis, & Zaliagis, 2003), collaborative filtering (Hofmann, 2003) and hybrid filtering (Balabanovic & Sholam, 1997) are the three basic approaches used to design recommendation systems.

Content-based filtering (Chun Zeng et al., 2002) relies on the content of items that the user has experienced before. Content-based information filtering has proven effective in locating text items that are relevant to a topic, using techniques such as Boolean queries and vector-space queries. However, content-based filtering has some limitations. It is difficult to provide appropriate recommendations because all information is selected and recommended on the basis of content alone. Moreover, content-based filtering leads to overspecialization, i.e., it recommends all the related items instead of the particular item liked by the user. Collaborative filtering (Ulrike & Daniel, 2006) aims to identify users who have relevant interests and preferences by calculating similarities and dissimilarities between their profiles. The idea behind this method is that one's search for information may benefit from consulting the behavior of other users who share similar interests and whose opinions can be trusted. Different techniques have been proposed for collaborative recommendation, such as correlation-based methods and semantic indexing. Collaborative filtering overcomes some of the limitations of content-based filtering: the system can suggest items to the user based on the ratings of items rather than their content, which can improve the quality of recommendations. However, collaborative filtering has some drawbacks. The first drawback is that the coverage of ratings can be very sparse, resulting in poor-quality recommendations.
When new items are added to the database, the system cannot recommend them until they have been served to a substantial number of users; this is known as the cold-start problem. Secondly, when new users are added, the system must learn their preferences from their ratings in order to make accurate recommendations. Moreover, these recommendation algorithms tend to be computationally expensive, and their cost grows non-linearly as the number of users and items in the database increases. Hybrid recommendation systems (Adomavicius & Tuzhilin, 2005; Shinde & Kulkarni, 2008; Shinde & Kulkarni, xxxx; Yu, Schwaighofer, Tresp, Xu, & Kriegel, 2004) combine content-based and collaborative filtering to overcome these limitations. As listed below, there are different ways of combining the two (Cheung & Tsui, 2004).

i. Implementing these approaches separately and combining them for prediction.
ii. Incorporating some content-based characteristics into the collaborative approach, and vice versa.
iii. Constructing a general unified model that incorporates both content-based and collaborative characteristics.

The hybrid approach proposed in this paper extracts the user's current browsing patterns using web usage mining and forms a cluster of items with similar psychology to obtain the user's implicit rating for the recommended item.

4. Proposed CBBCHPRS

We have developed and tested CBBCHPRS on the Jester dataset, available on the website of the University of California, Berkeley. The system architecture is partitioned into two main phases: offline and online. Fig. 1 depicts the architecture of CBBCHPRS with its essential components.

Phase I is offline and performs preprocessing and clustering. In this phase, background data in the form of a user-item rating matrix is collected and clustered using the proposed approach, which is described in Section 4.1.2. Once the clusters are obtained, the cluster data, along with their centroids, are stored for future recommendations.
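As a rough illustration of the hand-off between the offline and online phases, the following Python sketch groups rating rows by cluster, computes one centroid per cluster, and serializes the result. It assumes that cluster labels have already been produced by some clustering step, that a centroid is the column-wise mean of its members' rating rows, and that JSON is the storage format; none of these details is specified in the text above, and the name `build_cluster_store` is hypothetical.

```python
import json
from statistics import fmean


def build_cluster_store(ratings, labels):
    """Group user rating rows by cluster label and attach a centroid.

    ratings: dict user_id -> list of normalized ratings (one per item)
    labels:  dict user_id -> cluster id (from any clustering step)
    """
    clusters = {}
    for user, row in ratings.items():
        clusters.setdefault(labels[user], {})[user] = row

    store = {}
    for cid, members in clusters.items():
        rows = list(members.values())
        # Assumption: the centroid is the column-wise mean of member rows.
        centroid = [fmean(col) for col in zip(*rows)]
        # JSON object keys must be strings, so the cluster id is converted.
        store[str(cid)] = {"centroid": centroid, "members": members}
    return store


# Toy data (not taken from the paper's Table 2).
ratings = {
    "U1": [0.5, 1.0, 0.0],
    "U2": [0.5, 0.0, 1.0],
    "U3": [1.0, 1.0, 1.0],
}
labels = {"U1": 0, "U2": 0, "U3": 1}

store = build_cluster_store(ratings, labels)
serialized = json.dumps(store)      # what an offline phase could persist
restored = json.loads(serialized)   # what an online phase could load back
```

Keeping the member rows next to each centroid means an online phase could compare the active user against centroids first and only then look inside the chosen clusters.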
Phase II is online; in this phase the recommendation takes place for the active user. Here, the similarity and density of clusters are calculated to choose the best clusters for making recommendations. The rating quality of each item unrated by the active user is computed in the chosen clusters. To generate the recommendations, clusters are further selected based on the rating quality of an item. The recommendations are then made by computing the weighted average of the ratings of items in the selected clusters.

The working of CBBCHPRS is described below in detail using the Jester dataset.

4.1. Preprocessing phase

4.1.1. Normalization of data

User-item ratings taken from the Jester dataset, rated on a scale of -10 to +10, are normalized to a scale of 0 to 1, where 0 indicates that the item is not rated by the corresponding user. To facilitate the discussion, the running example shown in Table 2 is used, where U1–U10 are the users and J1–J10 are the items (jokes) rated or unrated by users. The last row of Table 2 gives the ratings of the active user.

4.1.2. Centering-bunching based clustering

In the K-means and new K-medoids (Hae-Sang & Chi-Hyuck Jun, 2009) clustering algorithms, centroids are initially selected by

Table 1
Taxonomy of input data.

No.  Data type              Examples
1    Demographic data       name, age, gender, profession, birth date, telephone, address, hobbies, salary, education, experience and so on.
2    Rating data            rating scores such as discrete multilevel and continuous ratings; and based on latest comments such as best, good, bad, worse and so on.
3    Behavior pattern data  duration of browsing, click times, the links of webs; save, print, scroll, delete, open, close, refresh of webs; selection, edition, search, copy, paste, and so on.
4    Transaction data       purchasing date, purchase quantity, price, discounting and so on.
5    Production data        for movies, jokes or music, actor or singer, topic, release time, price, brand and so on.

Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
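The normalization step of Section 4.1.1 could be sketched in Python as follows. The paper does not give the mapping, so a linear min-max rescaling from [-10, +10] to [0, 1] is assumed here. The raw sentinel value 99 for an unrated joke is likewise an assumption about the input file, and `normalize_rating` and `normalize_matrix` are hypothetical names. Under this assumed mapping a genuine rating of exactly -10 also becomes 0 and is then indistinguishable from "not rated"; the actual system may handle that case differently.

```python
UNRATED_RAW = 99.0  # assumed sentinel for "not rated" in the raw data


def normalize_rating(raw, lo=-10.0, hi=10.0):
    """Map a raw rating in [lo, hi] to [0, 1]; unrated entries become 0.0."""
    if raw == UNRATED_RAW:
        return 0.0
    if not lo <= raw <= hi:
        raise ValueError(f"rating {raw} is outside [{lo}, {hi}]")
    # Assumed linear min-max rescaling.
    return (raw - lo) / (hi - lo)


def normalize_matrix(rows):
    """Normalize every entry of a user-item rating matrix (list of rows)."""
    return [[normalize_rating(r) for r in row] for row in rows]


# Toy raw ratings (not taken from the paper's Table 2).
raw = [
    [-10.0, 0.0, 10.0, 99.0],
    [5.0, 99.0, -5.0, 0.0],
]
normalized = normalize_matrix(raw)
```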