Through the results shown in Table 3, we can conclude that the non-associative classifiers are much more susceptible to sparsity than CBA, because the four datasets in which metric1 presented the greatest values are quite sparse (the records/attributes ratio is, excepting the Glass dataset, lower than 10).

Analogously, we also analyzed the four datasets in which CMAR presented the greatest variation of accuracy (i.e., metric2) when compared to C4.5 and Ripper. Table 4 shows such results. Results revealed that the non-associative classifiers are also much more susceptible to sparsity than CMAR, because the first three datasets are quite sparse. However, CBA appeared to be less susceptible to sparsity than CMAR, because the records/attributes ratios are greater in CMAR's best results (the Waveform dataset, for example, is quite dense).

Conversely, we also compared C4.5 and Ripper to the associative classifiers. Table 5 shows the best results of C4.5, where metric3 is the difference of accuracy between C4.5 and CBA plus the difference of accuracy between C4.5 and CMAR. Results shown in Table 5 revealed that the datasets in which C4.5 reached its greatest accuracies are very dense (excepting the Auto dataset). Analogously, the results in Table 6 (Ripper's best results) show that Ripper presented results similar to those of C4.5.

Thus, the results suggest that non-associative classifiers are much more susceptible to sparsity than associative classifiers. Moreover, the values of the variation metric in C4.5's and Ripper's best results are notably lower than the ones of CBA and CMAR, which reveals that the margin by which the non-associative classifiers outperform the associative ones, even on their best datasets, is smaller.
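To make the variation metrics concrete, the following minimal Python sketch (our illustration, not code from the paper; the function name is ours) shows how the values in Tables 4-6 are obtained: the metric is the accuracy gap between the classifier under analysis and each of the other classifiers, summed.

    def variation_metric(best_acc: float, other_accs: list[float]) -> float:
        """Sum of the accuracy gaps between the classifier under analysis
        and each competing classifier (all values in percentage points)."""
        return sum(best_acc - acc for acc in other_accs)

    # Worked check against the Vehicle column of Table 5 (C4.5's best results):
    # metric3 = (72.6 - 68.7) + (72.6 - 68.8) = 3.9 + 3.8 = 7.7
    print(round(variation_metric(72.6, [68.7, 68.8]), 1))  # 7.7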
5.4. Analyzing sparsity on recommender systems data

In this subsection, we continue analyzing the effects caused by sparsity. However, at this point we are concerned specifically with recommender systems, because we want to investigate methods capable of alleviating the effects of sparsity in such systems. Furthermore, taking into account the results of the experiments made in the previous subsection, in the experiments described subsequently we considered specifically the associative classifiers (CBA and CMAR). In this way, we are able to depict a more accurate and less redundant case study.

Before performing the experiments, we analyzed the sparsity features of the five datasets acquired from MovieLens and BookCrossing, which were described in Section 5.1. To do so, we considered the approach taken in Rittman (2005), which was described in detail in Section 4.1. At this point, we also take into account the number of distinct values of the attributes, because the investigated datasets have similar features.

Hence, we first take the MovieLens dataset (with 14,587 records) and calculate the product of the distinct values of its attributes: 2 × 2 × 7 × 21 × 29 = 17,052. For BookCrossing, the product of distinct values for both the "Country" and "State" datasets is 144,000 (2 × 40 × 5 × 40 × 9), and the product of distinct values for the reduced datasets is 9000 (2 × 10 × 5 × 10 × 9). Table 7 shows the density (computed with the approach taken in Rittman (2005)) of the datasets used within the experiments made in this subsection.

As shown in Table 7, "BCrossing World" and "BCrossing USA" are the sparsest datasets. Not surprisingly, the two datasets with a reduced number of distinct values are less sparse. Conversely, MovieLens is the densest dataset used in this study (0.86). For this reason, the MovieLens dataset is the one used in most recommender systems works: its density makes it easier to develop trustworthy case studies.

The datasets shown in Table 7 were provided as input to the CBA and CMAR algorithms, for which we set a confidence threshold value of 75%. The support threshold values were set empirically, according to each dataset and algorithm, in such a way that we could obtain at least 10 rules in the classification model. In Table 8 we show the results obtained by CBA, where we also depict the number of rules built for the classification model. Analogously, in Table 9 we show the results obtained by the CMAR classifier.
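The paper does not spell out the tuning procedure beyond these constraints, so the sketch below is one plausible reading of it: fix confidence at 75% and lower the support threshold until the induced model contains at least 10 rules. The function train_classifier, its signature, and the search parameters are our assumptions, standing in for an actual CBA or CMAR implementation.

    MIN_RULES = 10      # constraint stated in the text
    CONFIDENCE = 0.75   # confidence threshold fixed at 75%

    def tune_support(dataset, train_classifier, start=0.5, step=0.05, floor=0.01):
        """Lower the support threshold until the classification model holds
        at least MIN_RULES rules; train_classifier is a hypothetical stand-in."""
        support = start
        while support >= floor:
            model = train_classifier(dataset, min_support=support,
                                     min_confidence=CONFIDENCE)
            if len(model.rules) >= MIN_RULES:
                return support, model
            support -= step
        raise ValueError("no support threshold produced at least 10 rules")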
Results revealed that both associative classifiers reached similar accuracy for MovieLens (the densest dataset), where CMAR presented the better accuracy (75.63%) and the greatest support threshold within its results. However, for the sparser datasets, CMAR presented worse results: as shown in Table 7, the datasets of countries are sparser than the datasets of states of the USA and, as shown in Table 9, there was a great loss of accuracy for "BCrossing World" compared with "BCrossing USA" and for "BCrossing World 10" compared with "BCrossing USA 10". On the other hand, as shown in Table 8, CBA did not present a great loss of accuracy (less than 1%) for the datasets of countries.

A similar scenario occurred when comparing the datasets with 40 distinct values to the ones with 10 distinct values, on which CMAR presented a loss of accuracy. Such reduced datasets contain substantially fewer records (around 75% fewer) than the ones with 40 distinct values, which means they are less likely to contain frequent itemsets and, hence, to reveal relationships for building rules. Conversely, CBA did not lose accuracy on these datasets.

Table 4
CMAR's best results.

                     Labor          Zoo            Lymph          Waveform
CMAR                 89.7%          97.1%          83.1%          79.4%
C4.5                 79.3%          92.2%          73.5%          70.2%
Ripper               84.0%          88.1%          79.0%          78.4%
metric2              16.1%          13.9%          13.7%          12.3%
Records/attributes   57/16 = 3.56   101/16 = 6.31  148/18 = 8.22  5000/21 = 238.1

Table 5
C4.5's best results.

                     Vehicle        Auto           Pima           Led7
C4.5                 72.6%          80.1%          75.5%          73.5%
CBA                  68.7%          78.3%          72.9%          71.9%
CMAR                 68.8%          78.1%          75.1%          72.5%
metric3              7.7%           3.8%           3.0%           2.6%
Records/attributes   846/18 = 47    205/25 = 8.2   768/8 = 96     3200/7 = 457.1

Table 6
Ripper's best results.

                     Horse          Austral        Sick            Hypo
Ripper               84.8%          87.3%          97.7%           98.9%
CBA                  82.1%          84.9%          97.0%           98.9%
CMAR                 82.6%          86.1%          97.5%           98.4%
Variation            4.9%           3.6%           0.9%            0.5%
Records/attributes   368/22 = 16.7  690/14 = 49.3  2800/29 = 96.5  3163/25 = 126.5

Table 7
Dataset's density.

Dataset              Number of records/product of distinct values
BCrossing USA        25,523/144,000 = 0.18
BCrossing USA 10     6270/9000 = 0.7
BCrossing World      8926/144,000 = 0.062
BCrossing World 10   3228/9000 = 0.36
MovieLens            14,587/17,052 = 0.86
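For reference, the density measure of Table 7 (the number of records divided by the product of the attributes' distinct value counts, following Rittman (2005)) can be sketched in a few lines of Python. The code below is our illustration, and the data layout (a list of attribute-value dictionaries) is an assumption, not the paper's representation.

    from math import prod

    def density(rows: list[dict]) -> float:
        """Number of records divided by the product of the number of
        distinct values observed for each attribute (Rittman-style density)."""
        attributes = rows[0].keys()
        distinct_counts = [len({row[a] for row in rows}) for a in attributes]
        return len(rows) / prod(distinct_counts)

    # Reproducing the MovieLens entry of Table 7 from the figures given
    # in the text: 14,587 records and a distinct-value product of 17,052.
    print(round(14_587 / 17_052, 2))  # 0.86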