Therefore, we can conclude that the CBA algorithm is more appropriate than CMAR to be applied in recommender systems. Results have shown that CMAR is less effective on sparser datasets (more distinct values or fewer records). This may be explained by the data structure it employs, an FP-Tree (Frequent Pattern Tree). Such a structure stores frequent itemsets in a compact way in which common relations between itemsets are explored, so items stored in the FP-Tree were not frequent enough to generate rules.

Besides, usually every new associative classifier paper uses CBA as a reference to make comparative studies and to validate its algorithms. The main contribution of recent associative classifiers is related to memory usage and processing time because, in general, accuracy is higher only on some datasets and such an increment is not very substantial.

Subsequently, we consider, by means of the occurrence of false positives, further advantages of employing associative classifiers in recommender systems.

5.5. Analyzing false positive occurrence

According to what was described in Section 4, designers of recommender methods should have a major concern on avoiding false positive errors.
In this context, in this subsection we describe some experiments we made in order to compare the false positive rates of associative classifiers and traditional classifiers. To do so, we chose three algorithms: Bayes Net, C4.5 and CBA. The first two represent two different classes of machine learning methods largely employed in recommender systems, probabilistic classification and rule-based classification, respectively. The third one, CBA, represents associative classifiers and was chosen basically because it presented considerably greater precision than the other two associative classifiers when tested on recommender systems data, especially on the BookCrossing datasets.

In order to calculate the false positive rate of a given class c1, we employed the same approach defined by Fawcett (2003), which is stated as follows:

FP rate = negatives incorrectly classified / total negatives

The false positive rate is also called the "false alarm rate", because it counts negative instances (samples not belonging to the class c1 being analyzed) that were incorrectly classified. On the other hand, the true positives represent the instances belonging to c1 that were correctly classified (also called hits). An ideal "conservative classifier" is one which classifies positive instances (the ones belonging to c1) only with strong evidence; as a consequence, it makes few false positive errors, but its true positive rate is reduced as well, and therefore the precision is also reduced. Nevertheless, conservative classifiers are appropriate for a recommender system scenario, in which false positives need to be avoided.

It should be noted that, generally, associative classifiers do not consider a default rule to classify an instance that matches no generated rule, as would be done in other traditional classifiers. Such a default rule would indeed lead to recommending an item that does not match the user's needs.
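The two ideas above, Fawcett's false-positive-rate formula and a rule-based classifier that abstains instead of applying a default rule, can be sketched as follows. This is an illustrative Python sketch, not code from the paper; the rule representation and attribute names are our own assumptions.

```python
# Illustrative sketch (not from the paper): an associative-style classifier
# that abstains when no rule fires, plus Fawcett's false positive rate.

def classify(sample, rules):
    """Return the class of the first matching rule, or None (abstain).

    Each rule is (antecedent_dict, predicted_class); a rule fires when every
    attribute/value pair in its antecedent occurs in the sample. Unlike
    C4.5 or Bayes Net, no default class is assigned when nothing matches.
    """
    for antecedent, predicted in rules:
        if all(sample.get(attr) == val for attr, val in antecedent.items()):
            return predicted
    return None  # no rule matched: do not recommend anything


def fp_rate(actual, predicted, positive_class):
    """FP rate = negatives incorrectly classified / total negatives."""
    negatives = [(a, p) for a, p in zip(actual, predicted)
                 if a != positive_class]
    false_pos = sum(1 for _, p in negatives if p == positive_class)
    return false_pos / len(negatives)


# Toy rules over hypothetical user attributes; "like" is the positive class.
rules = [({"age": "young", "country": "USA"}, "like"),
         ({"country": "UK"}, "dislike")]
users = [{"age": "young", "country": "USA"},   # matches the first rule
         {"country": "UK"},                    # matches the second rule
         {"age": "old", "country": "Spain"}]   # no rule fires -> abstain
preds = [classify(u, rules) for u in users]
# preds == ["like", "dislike", None]

# One of the two true negatives is classified as positive -> FP rate 0.5.
actual = ["dislike", "dislike", "like", "like"]
pred   = ["like",    "dislike", "like", "like"]
print(fp_rate(actual, pred, "like"))  # 0.5
```

The abstention in `classify` is what keeps this family of classifiers conservative: a user whose data fires no rule simply receives no recommendation, rather than a default one that could become a false positive.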
This way, an associative classifier does not classify a user whose data matches no generated rule. Conversely, other classifiers always classify the active user, as they classify every sample provided as input.

Table 10 shows the false positive rates obtained by Bayes Net, C4.5 and CBA on the same datasets employed in the previous subsection, where we also set the same confidence and support threshold values.

Table 10 shows that Bayes Net presented the greatest false positive rates among all algorithms studied. This suggests that Bayes Net is more susceptible to the occurrence of false positive errors than rule-based classifiers. The CBA algorithm obtained, on average, a false positive rate about 10% lower than that of C4.5, which in turn outperformed Bayes Net. In addition, the false positive rate obtained by CBA on the "BCrossing USA" dataset was 21.12 percentage points lower than C4.5's and 27.02 points lower than Bayes Net's. This scenario shows that associative classifiers are very appropriate for avoiding the occurrence of false positives in recommender systems, given that false positives were dramatically reduced using the CBA algorithm.

6. Conclusion

In this work we focused on revealing how the most typical and critical drawbacks of recommender systems may be avoided or alleviated. To do so, we applied several machine learning methods, especially classification based on association, to diverse types of data. We also used data gathered from real recommender systems, including BookCrossing, a database concerning a domain that had never been used before in case studies on recommender systems. According to the results obtained in the first steps of the case study developed in Section 5, we directed this work toward evaluating classification based on association.

There are several advantages of employing classification based on association algorithms in recommender systems.
First, it is an off-line model; consequently, processing time and scalability are not a major concern in recommender systems employing associative classifiers. Likewise, sparsity may be alleviated as well, because their features proved to be (according to the experiments made in Sections 5.2 and 5.3) less susceptible to a radical loss of precision on sparse datasets.

Since a recommender model using associative classification has general rules considering several users' attributes, the gray sheep problem is also likely to be alleviated. In addition, the analyst may set up the input parameters of the algorithm in order to generate more rules that would likely be suitable for more users. Furthermore, the number of false positives was reduced dramatically when we tested the CBA algorithm (Section 5.5).

By means of the experiments described in Section 5.4, we argue that CBA is more effective than CMAR on sparser datasets. Thus,

Table 8
Results of CBA.

Dataset             Accuracy (%)   Num. rules   Support (%)   Density
BCrossing USA       80.24          15.9         5             0.18
BCrossing USA 10    80.56          12.3         10            0.7
BCrossing World     79.43          17.7         5             0.062
BCrossing World 10  79.26          10.9         10            0.36
MovieLens           72.77          10.3         15            0.86

Table 9
Results of CMAR.

Dataset             Accuracy (%)   Num. rules   Support (%)   Density
BCrossing USA       71.26          18.7         3             0.18
BCrossing USA 10    69.58          16           5             0.7
BCrossing World     61.7           17.1         2             0.062
BCrossing World 10  42.34          11.3         3             0.36
MovieLens           75.63          11.7         10            0.86

Table 10
False positive rates.

Dataset             Bayes Net (%)   C4.5 (%)   CBA (%)
MovieLens           47.4            42.6       33.22
BCrossing World     45.15           39.9       23.03
BCrossing World 10  40.3            42.45      33.83
BCrossing USA       47.45           41.55      20.43
BCrossing USA 10    48.25           44         34.66

Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136