computed. For joke 3, the rating is computed from cluster 1; for jokes 4 and 9, the rating is computed from cluster 2; and for joke 6, the rating is computed from clusters 1 and 2.

4.2.4. Provide recommendations to active user

Once the quality rating of each item is calculated, the recommendation to the active user is provided, e.g., joke 3, with a rating up to 0.91, will be recommended, and so on.

5. Experiments

We have conducted a set of experiments to examine the effectiveness of our proposed recommender system in terms of accuracy of neighbour-selection, cold start and recommendation quality.
In particular, we addressed the following issues (Huang & Chung, 2004; Jin & Zhou, 2005; Kim & Atluri, 2004; Kim & Li, 2006; Somlo & Howel, 2001; Vucetic et al., 2005):

i. How does the confidence parameter affect the performance of the prediction? In this paper, we have conducted a few experiments to show the accuracy of the prediction for different settings of the parameter values.
ii. How does the neighbour-selection method affect the efficiency of prediction? Experiments are conducted to examine the accuracy of the CBBC algorithm for neighbour-selection.
iii. How do the clusters formed influence the prediction accuracy? Experiments are conducted to examine the impact of clustering methods on the final performance of item or user content-based collaborative filtering.
iv. The performance of CBBCHPRS is evaluated and compared with ARS using precision, recall, and mean absolute error (MAE).

The proposed CBBCHPRS is implemented in MATLAB version 7.2. The experiments are conducted on a 2.0 GHz Intel Pentium 4 PC with 512 MB memory, running Microsoft Windows XP Professional.

5.1. Simulation results and performance evaluation

5.1.1. Performance evaluation of clustering

In order to check the performance of the proposed clustering algorithm, we first applied the algorithm to a real data set, the 'Iris' data, whose true classes are known. The Iris data set is available in the UCI repository (ftp://www.ics.uci.edu/pub/machinelearning databases/) and includes 150 objects (50 in each of three classes: 'Setosa', 'Versicolor', and 'Virginica') having four variables ('sepal length', 'sepal width', 'petal length', and 'petal width'). The performance was measured by the accuracy, which is the proportion of objects that are correctly grouped together against the true classes. To investigate the performance more objectively, a simulation study was carried out by repeatedly generating artificial data sets and calculating the average performance of the method.
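The accuracy measure described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' MATLAB code: the text does not say how clusters are matched to classes, so the sketch assumes each cluster is assigned the majority class among its members, and the function name `clustering_accuracy` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def clustering_accuracy(true_labels, cluster_ids):
    """Proportion of objects whose true class equals the majority class
    of the cluster they were assigned to (assumed matching rule)."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have the same length")
    if len(true_labels) == 0:
        raise ValueError("inputs must not be empty")

    # Collect the true class labels of the members of each cluster.
    members = defaultdict(list)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster].append(label)

    # In each cluster, the objects belonging to the majority class count as correct.
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in members.values())
    return correct / len(true_labels)


# Toy example with made-up assignments (not the Iris results of the paper).
true_labels = ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"]
cluster_ids = [0, 0, 1, 2, 2, 2]
accuracy = clustering_accuracy(true_labels, cluster_ids)
print(f"accuracy = {accuracy:.3f}")
```

In the toy example, one 'Versicolor' object falls into the cluster dominated by 'Virginica', so five of the six objects are counted as correctly grouped. Averaging this value over repeatedly generated data sets would correspond to the simulation study mentioned above.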
We applied the proposed method, K-means, and the new K-medoids to create three clusters using this data without the class information. When implementing K-means, the initial centroids were chosen randomly, although many other alternatives are available, including Al-Daoud and Roberts (1996) and Khan and Ahmad (2004). The class of an object cannot be predicted by a clustering algorithm, but it may be estimated by examining the cluster result for the class-labeled data. Table 7 shows the results obtained using the existing and proposed clustering methods. It shows that the proposed clustering algorithm works better than the traditional algorithms because it calculates the centroids properly instead of selecting them randomly.

5.1.2. Performance evaluation of recommender system

The Jester dataset is available online at http://www.ieor.berkeley.edu/goldberg/jester-data. Jester is a WWW-based joke recommender system developed by the University of California, Berkeley. This data set has numeric ratings entered by 73421 users for 100 jokes, on a real-valued scale from -10 to 10. The experiments are performed on a small Jester dataset consisting of a user-item rating matrix of size 10 (users) × 10 (jokes), as shown in Table 2.

Precision and recall are the most popular metrics for evaluating information retrieval systems. For the evaluation of recommender systems, they have been used by various researchers (Billsus & Pazzani, 1998; Basu et al., 1998; Sarwar et al., 2000a; Sarwar et al., 2000b). Precision is a measure of exactness and recall is a measure of completeness. A precision score of 100% indicates that every recommendation retrieved was relevant. A recall score of 100% indicates that all relevant recommendations were retrieved. Several ways to evaluate precision and recall exist (Herlocker, Konstan, Terveen, & Riedl, 2004).

In our work to evaluate CBBCHPRS, users' ratings are taken and divided into a training set and a test set.
Here, the training set consists of the user-item rating matrix of 10 users and 10 jokes, and the test data consists of one active user, as shown in Table 2. The algorithm is then trained on the training set and the top-N items are predicted from that user's test set. The items that appear in both sets become members of a special set called the hit set. Recall is a global measure for the whole dataset. When referring to recommender systems, it can be defined as the size of the hit set over the size of the test set:

\[
\text{Recall} = \frac{\text{Size of hit set}}{\text{Size of test set}} = \frac{|\text{test} \cap \text{top-}N|}{|\text{test}|} \tag{8}
\]

Table 7
Cluster result of Iris data by the proposed and traditional methods.

Algorithms    Setosa    Versicolor    Virginica
K-means       50        24            76
K-medoids     50        41            59
Proposed      50        44            56

[Figure: axis labels "Recommendation set" and "Precision, Recall"; series: Recall of ARS, Recall of CBBCHPRS, Precision of ARS, Precision of CBBCHPRS.]
Fig. 2. Precision and Recall for 10 clusters.

S.K. Shinde, U. Kulkarni / Expert Systems with Applications xxx (2011) xxx–xxx

Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
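The hit set and the recall definition (hit set size over test set size) can be sketched in Python as follows. This is a minimal illustration rather than the authors' MATLAB implementation: the function names and the joke identifiers are hypothetical. The `precision` function uses the standard definition, hits divided by the number of recommended items, which is an assumption here because this section describes precision only in words.

```python
def hit_set(test_items, top_n_items):
    """Items that appear in both the test set and the top-N list."""
    return set(test_items) & set(top_n_items)


def recall(test_items, top_n_items):
    """Recall = |test ∩ top-N| / |test|."""
    test = set(test_items)
    if not test:
        raise ValueError("test set must not be empty")
    return len(hit_set(test, top_n_items)) / len(test)


def precision(test_items, top_n_items):
    """Precision = |test ∩ top-N| / |top-N| (standard definition, assumed here)."""
    top_n = set(top_n_items)
    if not top_n:
        raise ValueError("top-N list must not be empty")
    return len(hit_set(test_items, top_n)) / len(top_n)


# Hypothetical joke identifiers for one active user (not data from the paper).
test_jokes = [3, 4, 6, 9]          # jokes held out in the user's test set
top_n_jokes = [3, 6, 7, 8, 10]     # top-N jokes predicted for that user

hits = hit_set(test_jokes, top_n_jokes)
r = recall(test_jokes, top_n_jokes)
p = precision(test_jokes, top_n_jokes)
print(sorted(hits), r, p)
```

In this made-up example two of the four test jokes appear among the five recommended jokes, so recall is 2/4 and precision is 2/5. Increasing N can only add hits, so recall does not decrease as the recommendation set grows.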