problem is also referred to as the “Cold-Start problem” (Guo, 1997) in the literature. As an extreme case of the early-rater problem, when a collaborative filtering system first starts, every user suffers from the early-rater problem for every item (Claypool et al., 1999).

4.4. Gray sheep problem

Another drawback that occurs only in collaborative filtering methods is the gray sheep problem (Claypool et al., 1999). It affects users whose preferences do not match those of any group of users; as a consequence, these users do not receive any recommendations. In fact, a user who remains in the cold-start condition for a long time may also be considered a gray sheep, because such a user has not shown interest in the system's items. Thus, conceptually, the gray sheep problem can be viewed as a special instance of the early-rater problem.

Given that content-based methods do not take the preferences of other users into account when building recommendations, the gray sheep problem does not occur in this class of methods. Likewise, the early-rater problem does not occur in these methods either, because they can provide recommendations based solely on the features of an item.

On the other hand, since content-based methods do not consider the social context of the user, the system may only be capable of recommending items that score highly against a user profile (Condliff et al., 1999); therefore, the user will only get to see items that are similar to those he or she has rated positively. As a result, many false negatives may occur, because such methods are not able to distinguish between low- and high-quality information related to the same domain. Moreover, in some domains (such as movies, music and restaurants) the items are not amenable to any useful feature extraction method with current technology (Balabanović & Shoham, 1997). In web pages, for instance, only some features of the content can be extracted, because current information retrieval techniques ignore multimedia features such as text embedded in images (Balabanović & Shoham, 1997).

5. Case study

In this section we describe a case study that investigates algorithms and design choices that may help to alleviate the most common and challenging drawbacks inherent to recommender systems. To do so, we analyze the behavior of some classification algorithms on general data and also on real data gathered from recommender systems. To evaluate these algorithms we consider, besides the classic precision rate metric, two metrics that are important in the recommender system context: the false positive rate and the number of rules considered for classification. Hill, Stead, Rosenstein, and Furnas (1995) suggested that algorithmic improvements in collaborative filtering methods may come from directions other than continued improvements in mean absolute error. Moreover, Herlocker et al. (2004) found that, although new algorithms often appear to do better than the older algorithms they are compared to, when each algorithm is tuned to its optimum they all produce similar measures of quality. For this reason, in addition to numerical metrics, we consider some implicit, non-numerical features for which we rely on empirical knowledge. Consequently, we are able to analyze recommender techniques from different perspectives, rather than being restricted to evaluating methods simply by comparing classic accuracy metrics.

To build the experiments performed in this case study, we considered the 26 datasets tested in the case studies of Liu et al. (1998) and Li et al. (2001), and we also employed five datasets obtained from real recommender systems. For all datasets, the WEKA (Waikato Environment for Knowledge Analysis) tool was used to perform data transformation and pre-processing, and the 10-fold cross-validation method was applied to estimate the algorithms' classification metrics in all experiments.

In the next subsection we describe all datasets employed in this case study, as well as how they were obtained.
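The evaluation protocol described above (10-fold cross-validation, with the false positive rate reported alongside a precision-style metric) can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than WEKA's implementation: the function names, the round-robin stratified fold assignment and the majority-class placeholder classifier are assumptions, and the classifiers under study would take the place of `majority_class_trainer`. Because the text does not formally define its "precision rate", the sketch reports both accuracy and precision; "Recommended" is assumed to be the positive class, so a false positive is an item recommended to a user who is not interested in it.

```python
import random
from collections import Counter

# Assumption: "Recommended" is the positive class, so a false positive is an
# item recommended to a user who is not interested in it.
POSITIVE = "Recommended"


def stratified_folds(labels, k=10, seed=1):
    """Split record indices into k folds with roughly equal class proportions.

    Hypothetical helper, not WEKA's API.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for i in indices:  # deal records round-robin so folds stay balanced
            folds[position % k].append(i)
            position += 1
    return folds


def majority_class_trainer(xs, ys):
    """Placeholder classifier: always predicts the most frequent training label."""
    majority = Counter(ys).most_common(1)[0][0]
    return lambda x: majority


def cross_validate(xs, ys, trainer, k=10, seed=1):
    """Pool confusion counts over the k held-out folds and derive the metrics."""
    tp = fp = tn = fn = 0
    for test_indices in stratified_folds(ys, k, seed):
        held_out = set(test_indices)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = trainer(train_x, train_y)
        for i in test_indices:
            predicted = model(xs[i]) == POSITIVE
            actual = ys[i] == POSITIVE
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Toy data: 20 "Recommended" and 10 "Not recommended" records (features unused here).
labels = ["Recommended"] * 20 + ["Not recommended"] * 10
features = [{} for _ in labels]
metrics = cross_validate(features, labels, majority_class_trainer)
```

With the majority-class placeholder every record is predicted "Recommended", so the false positive rate is 1.0 while accuracy and precision are both about 0.67; a single accuracy-style figure would hide that every uninteresting item is being recommended.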
Section 5.2 then compares some typical rule-based classifiers against some general classifiers, taking into account their behavior with respect to the drawbacks described in Section 4. In Sections 5.3 and 5.4 we analyze how rule-based algorithms and recommender system data are related to sparsity (the most critical drawback of recommender systems). Finally, in Section 5.5 we investigate the occurrence of false positive errors in three different groups of classifiers.

5.1. Employed datasets

To analyze sparsity in rule-based classifiers (Section 5.3), we considered the 26 datasets employed by the works proposing the CBA (Liu et al., 1998) and CMAR (Li et al., 2001) algorithms, which are general-purpose datasets from different domains gathered from the UCI Machine Learning Repository. At this point, however, we only compared the results reported by the authors for each dataset, considering the number of attributes, the number of records and the accuracy obtained by the classifiers.

For the remaining experiments in this case study, we employed real data obtained from two recommender systems: MovieLens and BookCrossing. MovieLens is a recommender system based on the GroupLens technology; its data, which consist of movie ratings made by users in 2000, are freely available for research purposes. Initially, the MovieLens dataset contained approximately 100,000 ratings for 1682 movies made by 943 users; we integrated the data related to users and movies into a single file, which was the input provided to the algorithms analyzed in this case study.

Before supplying this input, however, we changed the rating attribute so that it has only two values: “Not recommended” (score 1 or 2) and “Recommended” (score 3, 4 or 5). The second value refers to an item the user may be interested in, and the first refers to the opposite case.
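The two-valued class attribute described above amounts to a threshold on the 1 to 5 score. In the case study this transformation was carried out with WEKA; the Python function below only illustrates the mapping, and its name (`binarize_rating`) and the sample rows are hypothetical.

```python
def binarize_rating(score: int) -> str:
    """Map a MovieLens score (1-5) to the two-valued class attribute.

    Hypothetical helper: scores 1 and 2 become "Not recommended",
    scores 3, 4 and 5 become "Recommended".
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"MovieLens scores range from 1 to 5, got {score!r}")
    return "Recommended" if score >= 3 else "Not recommended"


# Hypothetical (user, movie, score) rows, only to show the mapping in use.
ratings = [(1, 10, 5), (1, 11, 2), (2, 10, 3), (2, 12, 1)]
labelled = [(user, movie, binarize_rating(score)) for user, movie, score in ratings]
```

Keeping the threshold in one function makes it easy to try a different cut-off (for example, treating 3 as "Not recommended") without touching the rest of the pre-processing.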
Such changes were performed to simplify classification, because the main aim in a recommendation task is to determine whether an item should be offered to the user or not. From the users' data, we used the following attributes: gender, age and occupation. The age attribute was discretized into five age ranges. The user's occupation attribute is a nominal variable with 21 distinct values. As for the movies' data, the file provided by MovieLens originally contained 19 binary attributes related to movie genres, where a value of 1 expressed that the movie belongs to a specific genre and 0 otherwise. The consistency of the association model would be compromised if 19 of the 23 attributes in the dataset were binary. Thus, these 19 binary attributes were reduced to a single attribute representing the name of the movie's genre. However, since some movies belong to several genres, we only used the records containing ratings of movies with exactly one genre. After data pre-processing and transformation, 14,587 records remained in the input file for the algorithms used in this study.

The BookCrossing database, in turn, consists of book ratings gathered by Ziegler, McNee, Konstan, and Lausen (2005) from the BookCrossing community, whose users exchange books and experiences all around the world. Initially, the BookCrossing data contained 433,671 explicit ratings (an assigned mark from 1 to 10) about 185,832 books provided by 77,797 users. This database has 2.33 ratings per item, so it may be considered much sparser than MovieLens (which has 59.45 ratings per item). This assumption makes the use of the

6 J. Pinho Lucas et al. / Expert Systems with Applications xxx (2011) xxx–xxx

Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems.
Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136