ARTICLE IN PRESS
S.K. Shinde, U. Kulkarni / Expert Systems with Applications

centroids of each cluster. As an example, cluster 3 has two members, so its centroid is the average of the corresponding coordinates of the two members:

$$
\begin{aligned}
C_3 = \{ & (0.00+0.00)/2,\ (0.00+0.00)/2,\ (0.00+0.00)/2, \\
& (0.00+0.00)/2,\ (0.60+0.93)/2,\ (0.91+0.05)/2, \\
& (0.00+0.00)/2,\ (0.00+0.00)/2,\ (0.00+0.00)/2, \\
& (0.00+0.00)/2 \}.
\end{aligned}
$$

Similarly, we have calculated the centroids of clusters 1 and 2.

4.2. Recommendation process for the active user

4.2.1. Choosing the appropriate clusters

The cluster(s) to be chosen depend on two factors, viz. the density of the cluster and its similarity with the active user profile. The probability $P_i(t)$ that cluster $i$ is chosen for generating recommendations at time $t$ is expressed as

$$
P_i(t) = \frac{d_i(t)\,\mathrm{sim}_i}{\sum_{j=1}^{k} d_j(t)\,\mathrm{sim}_j} \tag{2}
$$

where $\mathrm{sim}_i$ is the value of the similarity function measuring the similarity between the active user profile and the centroid of the $i$th cluster, $d_i(t)$ is the density of the $i$th cluster at time $t$, and $k$ is the total number of clusters.

The density of the cluster is determined by Eq. (3). The more users a cluster contains, the higher its density, and vice versa.
$$
d_i(t) = \frac{\text{Number of users in cluster } i}{\text{Total number of users}} \tag{3}
$$

The similarity of the active user profile to each cluster is calculated in order to find the clusters whose users have similar preferences. There are a number of possible measures for computing the similarity, for example the Euclidean distance metric, cosine similarity and the Pearson correlation metric. We have used the Euclidean distance measure. The distance between the active user profile and the centroid of the cluster can be computed using Eq. (4),

$$
\mathrm{sim}_i(Cent_i, U) = \left\{ \sum_{j=1}^{d} \left| Cent_{i,j} - U_j \right|^2 \right\}^{1/2} \tag{4}
$$

where $d$ is the dimension of the data, i.e. the number of attributes, $Cent_i$ is the centroid of cluster $i$, $U$ is the active user profile, $Cent_{i,j}$ is the $j$th attribute of the centroid profile in cluster $i$, and $U_j$ is the $j$th attribute of the active user profile.

The clusters whose probability value lies in the range {(highest probability − 0.1) ≤ probability ≤ highest probability} are chosen for generating recommendations for the active user, instead of only the cluster with the highest probability. This overcomes the limitation of Collaborative Filtering recommender systems, where recommendations are provided based only on the opinion of the user with the most similar preferences.

The ratings given by the active user for jokes J1 to J10 are normalized to the range 0 to 1, as shown in Table 2. A rating of 0 marks an unrated joke: the active user has not rated jokes 3, 4, 6 and 9.

Table 4 shows the density value associated with each cluster at time $t$, the similarity of the active user profile with the centroid of each cluster, and the computed probability $P_i(t)$, for $i = 1, \ldots, k$. Clusters 1 and 2 are chosen, since their probabilities lie in the range (0.3941 − 0.1) ≤ $P_i(t)$ ≤ 0.3941.

4.2.2. Computing the rating quality of the item in each chosen cluster

The rating quality depends on the number of users in the cluster who have rated the item, the individual ratings for the item in the given rating matrix, and how close the ratings provided by the users are to each other. The rating quality of the item, $Q$, is computed as

$$
Q = \frac{\text{max\_rating} + \text{avg\_rating}}{2 \times \text{max\_rating}} \tag{5}
$$

where max_rating is the highest rating of the given item and avg_rating is the average rating of the item in the chosen cluster.

A rating quality close to 1 indicates that the users have provided good-quality ratings, and vice versa. Table 5 shows the computed rating quality, in the chosen clusters 1 and 2, for jokes 3, 4, 6 and 9, which are unrated by the active user.

4.2.3. Ratings of items

Once the quality of each item unrated by the active user has been computed in the chosen clusters, the clusters in which $Q$ for each item lies in the interval {(highest Q − 0.1) ≤ Q ≤ highest Q} are further selected for computing the rating, instead of only the cluster containing the highest $Q$. The rating of each item is then computed from the selected clusters as the weighted average of the ratings, using the following equation,

$$
\text{Rating} = \frac{\sum_{i=1}^{n} \left( Q_i \times \text{avg\_rating} \right)}{\sum_{i=1}^{n} Q_i} \tag{6}
$$

where $Q_i$ is the quality of the item in the selected cluster, $n$ is the number of clusters selected, and avg_rating is the average rating of the item in the selected clusters.

The computed ratings are shown in Table 6. If the number of clusters selected is more than one, then the average rating is

Table 3
Users in each cluster with the centroid.

| Cluster | Users | Centroid |
|---|---|---|
| 1 | U1, U4, U6, U8, U9 | C1 |
| 2 | U2, U5, U10 | C2 |
| 3 | U3, U7 | C3 |

Table 4
Process of choosing clusters.

| | Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|---|
| Density value (d) | 0.5 | 0.3 | 0.2 |
| Similarity measure (sim) | 0.57 | 1.16 | 1.25 |
| Probability function (P) | 0.3227 | 0.3941 | 0.2831 |

Table 5
Rating quality of computed jokes.

| Jokes unrated by active user | Rating quality for cluster 1 | Rating quality for cluster 2 |
|---|---|---|
| J3 | 0.7113 | 0.0245 |
| J4 | 0.1620 | 0.3532 |
| J6 | 0.1689 | 0.1523 |
| J9 | 0.1034 | 0.3502 |

Table 6
Recommendation generated for active user.

| Jokes | Cluster chosen | Computed rating |
|---|---|---|
| J3 | 1 | 0.91 |
| J4 | 2 | 0.53 |
| J6 | 1, 2 | 0.65 |
| J9 | 2 | 0.53 |

Please cite this article in press as: Shinde, S. K., & Kulkarni, U. Hybrid personalized recommender system using centering-bunching based clustering algorithm. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.08.020
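The selection steps of Sections 4.2.1 to 4.2.3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`euclidean`, `cluster_probabilities`, `within_band`, `rating_quality`, `weighted_rating`) are hypothetical, the inputs are the values printed in Tables 4 and 5, and the sim values are used exactly as listed in Table 4. The per-cluster average ratings needed by Eq. (6) are not listed in this section, so `weighted_rating` is defined but not applied to the joke data.

```python
import math


def euclidean(centroid, user):
    """Eq. (4): Euclidean distance between a cluster centroid and the active user profile."""
    return math.sqrt(sum((c - u) ** 2 for c, u in zip(centroid, user)))


def cluster_probabilities(density, sim):
    """Eq. (2): P_i = d_i * sim_i / sum_j (d_j * sim_j)."""
    weights = [d * s for d, s in zip(density, sim)]
    total = sum(weights)
    return [w / total for w in weights]


def within_band(values, width=0.1):
    """1-based positions whose value lies in [max(values) - width, max(values)]."""
    top = max(values)
    return [i + 1 for i, v in enumerate(values) if v >= top - width]


def rating_quality(max_rating, avg_rating):
    """Eq. (5): Q = (max_rating + avg_rating) / (2 * max_rating)."""
    return (max_rating + avg_rating) / (2 * max_rating)


def weighted_rating(qualities, avg_ratings):
    """Eq. (6): quality-weighted average of the per-cluster average ratings."""
    return sum(q * r for q, r in zip(qualities, avg_ratings)) / sum(qualities)


# Density and similarity values as printed in Table 4 (clusters 1, 2, 3).
density = [0.5, 0.3, 0.2]
sim = [0.57, 1.16, 1.25]

probs = cluster_probabilities(density, sim)
chosen_clusters = within_band(probs)  # clusters within 0.1 of the highest probability

# Rating quality per joke in chosen clusters 1 and 2, as printed in Table 5.
quality = {
    "J3": [0.7113, 0.0245],
    "J4": [0.1620, 0.3532],
    "J6": [0.1689, 0.1523],
    "J9": [0.1034, 0.3502],
}
# For each joke, keep the clusters whose Q is within 0.1 of the highest Q.
selected = {joke: within_band(q) for joke, q in quality.items()}

print(probs)
print(chosen_clusters)
print(selected)
```

With the Table 4 and Table 5 inputs, the same band rule (`within_band`) serves both the cluster choice of Section 4.2.1 and the per-item cluster selection of Section 4.2.3; only the list of values passed in changes.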