‘‘states of USA’’, we noticed a slight loss of precision, in most scenarios, on the dataset of world countries. Such a loss is reasonable given the diversity of characteristics that readers from different world countries (from four continents) may have, which is probably much more substantial than among people from just one country. Comparing datasets having 40 distinct values (on the Country/State and Author attributes) with datasets having 10 distinct values, we noticed no significant difference in precision between them with the confidence and support thresholds set.
Finally, the CPAR algorithm also presented acceptable results, even though its precision was slightly lower than that of the other classifiers. This algorithm is more effective in scenarios with very large datasets where processing time may be a critical issue, because the classifier construction and the rule induction are performed in a single processing step. However, in the context of recommender systems, the response time to the user is not a critical issue, because the recommender model is built off-line.

Through these experiments we were able to conclude that classification based on association methods can be employed as effectively in recommender systems as other methods, because they can reach precision similar to, or even greater than, that of traditional classifiers, and also because the rules obtained by the classification model are reliable due to the high confidence values they present. Since the estimation model is built off-line, recommendations may be provided in real time without significant processing effort, and scalability, which is usually a challenging requirement for recommender systems, is not a major design concern as it is for on-line recommender methods. Moreover, as rule-based classifiers (especially associative classifiers) are composed of simple classification rules, the estimation model is easy to interpret, which allows the analyst to inspect, and intervene in, the classification model generated. Another advantage of a rule-based recommender model is that it is not static, since there is no need to build a model for every user (there are general rules for all users). In this way, the gray sheep problem would also be less likely to occur using associative classifiers.

Subsequently, we verify, by means of several datasets with different features and from different domains, how vulnerable rule-based classifiers are to sparsity.
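To make the interpretability point above concrete, here is a minimal sketch of ours (not the paper's implementation; all rules, item names and confidence values are hypothetical) of how a small list of general classification rules could drive a recommendation:

```python
# Illustrative only: the rules, item names and confidences below are
# hypothetical, not taken from the paper.
RULES = [
    # (antecedent items, recommended class, rule confidence)
    ({"fiction", "history"}, "biography", 0.92),
    ({"fiction"}, "drama", 0.85),
    ({"science"}, "technology", 0.80),
]

def recommend(profile):
    """Fire the highest-confidence rule whose antecedent is contained in
    the user's profile. The rules are general (shared by all users), so no
    per-user model is built, which is why atypical 'gray sheep' users can
    still be matched by whichever general rules they happen to satisfy."""
    matches = [(label, conf) for antecedent, label, conf in RULES
               if antecedent <= profile]
    return max(matches, key=lambda m: m[1])[0] if matches else None

print(recommend({"fiction", "history", "poetry"}))  # biography
print(recommend({"cooking"}))                       # None
```

Because each rule is a readable condition-consequent pair, an analyst can audit or remove individual rules without retraining a per-user model.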
In this way, we will be able to verify whether there are even more advantages, in terms of alleviating typical shortcomings, to employing rule-based classifiers in recommender systems.

5.3. Analyzing sparsity on rule-based classifiers

In this section we describe a case study that evaluates how sparsity affects the accuracy of some rule-based classifiers. To that end, we took into account the case studies carried out in the CBA (Liu et al., 1998), CPAR (Yin & Han, 2003) and CMAR (Li et al., 2001) papers. Those papers compared the accuracy obtained by their algorithms against the Ripper (a traditional rule-based classifier) and C4.5 algorithms. To perform the experiments running these algorithms, they employed twenty-six datasets containing general data from different domains, gathered from the UCI Machine Learning Repository. In our analysis, for evaluating associative classifiers we took into account CBA and CMAR, and for evaluating rule-based algorithms we considered the same algorithms (C4.5 and Ripper) analyzed in the case studies from the CBA, CPAR and CMAR papers. We did not consider the CPAR algorithm for evaluating sparsity because, as highlighted in the previous subsection, the other two associative classifiers (CBA and CMAR) are more convenient to employ in recommender systems.

In order to account for sparsity, in this analysis we considered a measure based on the relationship between the number of records and the number of attributes. This correlation was built following the approach shown in Rittman (2005), which was described in detail in Section 4.1. A high value of this correlation suggests a dense (non-sparse) dataset (a high number of records compared with the number of distinct values of its attributes), because it is easier to find frequent itemsets when more records are available.

In Table 2 we analyze the performance of C4.5, Ripper, CBA and CMAR on sparse datasets.
To do so, we selected, among the 26 datasets analyzed, the four sparsest datasets (Sonar, Labor, Zoo and Hepatic) for running the algorithms. The first four lines of the table depict the accuracy obtained by the classifiers, and the last line depicts the correlation between the number of records and attributes.

Table 2 shows that the associative classifiers, CBA and CMAR, obtained better results than the other rule-based classifiers (C4.5 and Ripper): on the four datasets they reached the highest accuracy (in bold) and, except on the ‘‘Sonar’’ dataset, both reached greater accuracy than C4.5 and Ripper.

In order to compare the associative classifiers with the other algorithms, we show, in every subsequent table, the four datasets corresponding to the best results reached by each algorithm. For each of the four algorithms analyzed, we show the four datasets corresponding to its best results when compared with the other algorithms. In this way, we can see which algorithms are more susceptible to sparsity, because we can analyze the density of the datasets on which each algorithm obtained its best results.

Firstly, we verified the best results reached by CBA in comparison with the non-associative classifiers (C4.5 and Ripper). This comparison was made through a variation metric combining the difference in accuracy between CBA and C4.5 plus the difference in accuracy between CBA and Ripper. The general formula for this metric is:

Metric_i = (accuracy of algorithm_1 - accuracy of algorithm_2)
         + (accuracy of algorithm_1 - accuracy of algorithm_3)

Taking the first column of Table 3 as an example, we can check that the accuracy of CBA is 96.8%, that of C4.5 is 92.2% and that of Ripper is 88.1%. Consequently, the variation metric in this case is 13.3% (4.6% + 8.7%).
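The computation just illustrated can be sketched directly from the formula (our sketch; the accuracy values are those reported in Table 3):

```python
def variation_metric(acc_assoc: float, acc_c45: float, acc_ripper: float) -> float:
    """Variation metric defined above: the accuracy gain of the associative
    classifier (algorithm 1) over C4.5 (algorithm 2) plus its gain over
    Ripper (algorithm 3)."""
    return (acc_assoc - acc_c45) + (acc_assoc - acc_ripper)

# First column of Table 3 (the Zoo dataset): CBA vs. C4.5 and Ripper.
metric = variation_metric(96.8, 92.2, 88.1)
print(round(metric, 1))  # 13.3  (= 4.6 + 8.7)
```

A positive value means the associative classifier outperformed both baselines on that dataset; the larger the value, the larger its combined advantage.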
Table 3 shows the best results of CBA, where the first three lines depict the accuracy obtained by the classifiers and the last two lines depict, respectively, the variation metric defined previously and the correlation between the number of records and attributes.

Table 2
Algorithms' precision on the sparsest datasets.

                     Sonar           Labor          Zoo             Hepatic
C4.5                 70.2%           79.3%          92.2%           80.5%
Ripper               78.4%           84.0%          88.1%           76.7%
CBA                  77.5%           86.3%          96.8%           81.8%
CMAR                 79.4%           89.7%          97.1%           80.6%
Records/attributes   208/60 = 3.47   57/16 = 3.56   101/16 = 6.31   155/19 = 8.16

Table 3
CBA's best results.

                     Zoo             Glass           Labor          Sonar
CBA                  96.8%           73.9%           86.3%          77.5%
C4.5                 92.2%           68.7%           79.3%          70.2%
Ripper               88.1%           69.1%           84.0%          78.4%
Metric_1             13.3%           10.0%           9.3%           6.4%
Records/attributes   101/16 = 6.31   214/9 = 23.78   57/16 = 3.56   208/60 = 3.47

8 J. Pinho Lucas et al. / Expert Systems with Applications xxx (2011) xxx–xxx

Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136