recommendation may be presented as a false negative or a false positive. The first one consists of products that were not recommended, though the consumer would have liked them. The second one consists of recommended products, though the consumer does not like them. According to Sarwar et al. (2000), the false positives are more critical because they will lead to angry consumers. Moreover, even early researchers recognized that when recommender approaches are used to support decisions, it can be more valuable to measure how often the system leads its users to wrong choices (Herlocker et al., 2004). Consequently, recommender methods should focus mainly on avoiding false positives.

However, Claypool et al. (1999) have suggested that recommender methods, unlike humans, have difficulty in distinguishing between high-quality and low-quality information relating to the same topic. Therefore, it might not be effective to provide recommendations when evaluations acquired from users are not taken into account. On the other hand, the use of information from user evaluations probably produces more false positives. In order to reduce false positives, methods employed in recommender systems must avoid the occurrence of some typical drawbacks. The next subsections describe the four most critical drawbacks that may occur in these systems.

4.1. Data sparsity

Probably the biggest challenge recommender systems face nowadays is the data sparsity problem, which stems from the huge amount of data available in current recommender systems. Fundamentally, sparsity occurs because the number of ratings needed to build a prediction model is greater than the number of ratings obtained from users. Moreover, most recommender techniques require users to express their personal preferences for items explicitly.
Nevertheless, methods for obtaining ratings implicitly have been developed in order to add more ratings and reduce sparsity. However, even with the use of up-to-date methods (including data mining methods), sparsity still remains a critical drawback for recommender systems due to the extensive number of items available. This is a significant problem because, in practice, it is usually costly and difficult to collect sufficient data for all users (Ahn, Kang, & Lee, 2010). According to Sarwar et al. (2001), active users may have purchased less than 1% of the items available in a system. This means that in a movie recommender system with 500,000 items, for example, an active user would be able to rate 5000 movies (1% of the catalog); nevertheless, we cannot expect all users of the system to watch 5000 movies and provide ratings for all of them. In addition, rating schemes can only be applied to homogeneous domains, so the number of ratings eligible to be used by the system is even more restricted.

Model-based methods reduce the shortcomings derived from sparsity; however, it is still necessary to have a certain minimum number of ratings in order to build an estimation model of ratings. In any case, having ratings for just 1% of the system's items may, depending on the technique used, be too few to build a reliable model.

In this work we also analyze how the performance of classifiers may be affected by sparsity. Therefore, we address some questions related to sparsity: how we can determine whether a dataset is sparse or not, and whether it is possible to measure the degree of sparsity of a dataset. For practical reasons, industry and academia sometimes evaluate sparsity by considering the number of NULL/NA values present in a given dataset. Hence, sparsity may be seen in terms of density, which reflects both the overall size of the recommender's item space and the degree to which users have explored it (Herlocker et al., 2004).
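
The density view of sparsity described above can be made concrete with a short sketch. It assumes that ratings are stored as a dictionary keyed by (user, item) pairs, so that every absent key corresponds to a NULL/NA cell. The function name `rating_density` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Tuple


def rating_density(ratings: Dict[Tuple[Hashable, Hashable], float],
                   n_users: int, n_items: int) -> float:
    """Return the fraction of user-item cells that hold a rating."""
    potential_cells = n_users * n_items
    if potential_cells == 0:
        raise ValueError("the user-item space must not be empty")
    return len(ratings) / potential_cells


# Toy data (hypothetical): 4 users, 5 items, 3 observed ratings.
toy_ratings = {("u1", "i1"): 4.0, ("u1", "i3"): 2.0, ("u3", "i5"): 5.0}
density = rating_density(toy_ratings, n_users=4, n_items=5)
print(f"density = {density:.2%}, missing = {1 - density:.2%}")
```

Here 3 of the 20 possible cells are filled, so the density is 15% and the remaining 85% of the cells are missing values. Reporting both numbers avoids ambiguity about whether a quoted percentage refers to filled or empty cells.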

In this context, Rittman (2005) describes an example of a dataset with four attributes in a retail market scenario. The attributes are: store, week in a year, customer and product. If there are 1000 stores, 52 weeks in a year, 500,000 customers and 10,000 products, the dataset has 1000 × 52 × 10,000 × 500,000 = 260,000,000,000,000 potential cells. However, it might have only 936,000,000 populated cells (450,000 × 26 × 40 × 2), because there are 450,000 customers shopping on average 26 times a year, buying 40 products at just 2 stores. So, only 0.00036% of the cells are populated ((936,000,000 / 260,000,000,000,000) × 100), which is the density of the dataset.

4.2. Scalability

Scalability is another drawback of recommender systems, resulting from the huge number of items available in them. Scalability in recommender systems includes both very large problem sizes and real-time latency requirements (Schafer et al., 2001). One example of such requirements is a scenario in which a recommender system connected to a Web site needs to provide recommendations within a few milliseconds and, at the same time, serve thousands of users simultaneously. Thus, a major challenge in recommender systems nowadays, according to Schafer et al. (2001), is to adapt data mining techniques to meet low latency and high throughput (amount of data flowing in a system) requirements simultaneously, in order to attract and serve a huge number of users. In the recommender system context, throughput may be measured by the number of users and items that the system is able to support without affecting efficiency.

Efficiency is a key feature for recommender systems, because they need to supply fast feedback to their users. Generally, shortcomings resulting from scalability do not occur in model-based methods because, unlike other classes of methods, they usually do not perform the data processing at run time.
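
The split between offline model building and run-time lookup can be sketched as follows. This is only a minimal illustration under stated assumptions: the "model" is a simple item co-occurrence table, used as a stand-in rather than the associative classifier discussed in the paper, and the names `build_model`, `recommend` and the toy baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Set


def build_model(baskets: Iterable[Set[str]], top_n: int = 2) -> Dict[str, List[str]]:
    """Offline step: precompute, for each item, the items most often seen with it."""
    co_counts: Dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        # Sort so that ties are broken deterministically by item name.
        for a, b in permutations(sorted(basket), 2):
            co_counts[a][b] += 1
    return {item: [other for other, _ in counts.most_common(top_n)]
            for item, counts in co_counts.items()}


def recommend(model: Dict[str, List[str]], item: str) -> List[str]:
    """Run-time step: one dictionary lookup, with no pass over the raw data."""
    return model.get(item, [])


# Toy purchase histories (hypothetical data).
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b"}]
model = build_model(baskets)   # done ahead of time
print(recommend(model, "a"))   # answered at request time
```

All passes over the purchase data happen in `build_model`, so the cost of answering a request does not grow with the number of users. The trade-off is that the table must be rebuilt or updated when new data arrives.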

Scalability may turn into a major concern for the efficiency of a system, because some techniques, such as nearest-neighbor search, may be infeasible in systems with a huge database. A typical web-based recommender system running only the nearest neighbor algorithm will probably suffer serious scalability problems (Sarwar et al., 2001).

4.3. Early rater problem

Although the drawbacks described above may be minimized by using model-based methods, there are other shortcomings that may occur along with these methods. The Early Rater (or First-Rater) Problem (Claypool et al., 1999; Condliff et al., 1999) is an example of a drawback associated with every class of collaborative filtering methods. This drawback arises when it is impossible to offer recommendations about an item that was just incorporated into the system and, therefore, has few (or even no) evaluations from users. In fact, the early rater problem is directly linked with sparsity, because when a system has a high number of items, probably most of these items have never received any evaluation. Conceptually, the early-rater problem can be viewed as a special instance of the sparsity problem (Huang, Chen, & Zeng, 2004).

Sarwar et al. (2001) affirm that current recommender systems depend on the altruism of a set of users who are willing to rate many items without receiving many recommendations. Economists have speculated that even if rating required no effort at all, many users would choose to delay considering items to wait for their neighbors to provide them with recommendations (Avery & Zeckhauser, 1997). Thus, it is necessary to find a way to encourage users to evaluate the items available in the system.

Analogously, this drawback also occurs with a new user joining the system: since there is no information about him, it would be impossible to determine his behavior in order to provide him with recommendations.
Actually, this scenario of the early rater

J. Pinho Lucas et al. / Expert Systems with Applications xxx (2011) xxx–xxx

Please cite this article in press as: Pinho Lucas, J., et al. Making use of associative classifiers in order to alleviate typical drawbacks in recommender systems. Expert Systems with Applications (2011), doi:10.1016/j.eswa.2011.07.136