Regardless of the method used in the CF stage, the technical aim generally pursued is to minimize the prediction errors, making the accuracy (Fuyuki, Quan, & Shinichi, 2006; Giaglis & Lekakos, 2006; Li & Yamada, 2004; Manolopoulus, Nanopoulus, Papadopoulus, & Symeonidis, 2007; Su & Khoshgoftaar, 2009) of the RS as high as possible. Nevertheless, there are other purposes that need to be taken into account: avoiding overspecialization, finding good items, trust in recommendations, novelty, precision and recall measures, sparsity, cold-start issues, etc. The framework proposed in this paper gives special importance to the quality of the predictions and the recommendations, as well as to the novelty and trust results. Whilst the quality of predictions and recommendations has been studied in detail since the start of RS, the quality of the novelty and trust results provided by the different methods and metrics used in CF has not been evaluated in depth. Measuring the quality of the trust results in recommendations is even more complicated, as this is a particularly subjective field: each user can grant more or less importance to the various aspects selected as relevant to gaining their trust in the recommendations offered (recommendation of recent elements, such as film premieres, introduction of novel elements, etc.). An additional problem is the number of nuances that can be taken into account, together with the lack of consensus on how to define them; thus we find studies on trust, reputation, credibility, importance, expertise, competence, reliability, etc., which sometimes pursue the same objective and other times do not.
Buhwan, Jaewook, and Hyunbo (2009) present some novel memory-based methods that use the level of a user's credit instead of the similarity between users. Kwiseok, Jinhyung, and Yongtae (2009) employ a multidimensional credibility model, source credibility from consumer psychology, and provide a credible neighbor selection method, although the equations involved require a great number of parameters whose adjustment is difficult or arbitrary. O'Donovan and Smyth (2005) present two computational models of trust and show how they can be readily incorporated into CF frameworks. Kitisin and Neuman (2006) propose an approach that includes social factors, e.g. a user's past behaviors and reputation, together as an element of trust that can be incorporated into the RS. Zhang (2008) and Hijikata et al. (2009) tackle the novelty issue: the first paper proposes a novel topic diversity metric which explores hierarchical domain knowledge, whilst the second infers items that a user does not know by calculating the similarity of users or items based on information about which items users already know. An aspect related to the trust measures is the capacity to provide justifications for the recommendations made; Symeonidis et al. (2008) propose an approach that attains both accurate and justifiable recommendations by constructing a feature profile for the users to reveal their favorite features. To date, various publications have tackled the way RS are evaluated. Among the most significant is Herlocker, Konstan, Riedl, and Terveen (2004), which reviews the key decisions in evaluating CF RS: the user tasks, the type of analysis and datasets being used, the ways in which prediction quality is measured and the user-based evaluation of the system as a whole.
Hernández and Gaudioso (2008) is a current study which proposes a recommendation filtering process based on the distinction between interactive and non-interactive subsystems. General publications and reviews also exist which include the most commonly accepted metrics, aggregation approaches and evaluation measures: mean absolute error, coverage, precision, recall and derivatives of these (mean squared error, normalized mean absolute error, ROC and fallout). Goldberg, Roeder, Gupta, and Perkins (2001) focus on the aspects not related to the evaluation, and Breese, Heckerman, and Kadie (1998) compare the predictive accuracy of various methods in a set of representative problem domains. Candillier, Meyer, and Boullé (2007) and Schafer, Frankowski, Herlocker, and Sen (2007) review the main CF methods proposed in the literature.

Among the most significant papers that propose a CF framework is Herlocker, Konstan, Borchers, and Riedl (1999), which evaluates the following: similarity weight, significance weighting, variance weighting, selecting neighborhood and rating normalization. Hernández and Gaudioso (2008) propose a framework in which any RS is formed by two different subsystems, one of them to guide the user and the other to provide useful/interesting items. Koutrika, Bercovitz, and Garcia (2009) is a recent and very interesting framework which introduces levels of abstraction in the CF process, making modifications to the RS more flexible.

The RS frameworks proposed until now present two deficiencies which we aim to tackle in this paper. The first of these is the lack of formalization in the evaluation methods: although the quality metrics are well defined, there are a variety of details in the implementation of the methods which, if they are not specified, can lead to different results in similar experiments.
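The sensitivity of a well-defined metric to unspecified implementation details can be illustrated with a minimal sketch. It is not the framework's formalization: the toy data, the function names `predict` and `evaluate`, and the plain unweighted average used as aggregation are all assumptions made for illustration. Two details are varied: whether an item that no neighbor has rated is skipped or predicted with the item's average rating, and whether the prediction is rounded before the error is computed.

```python
# Illustrative sketch only: toy data and helper names are hypothetical,
# and the aggregation is a plain unweighted average of neighbor ratings.

# Each case: (real rating, ratings given by the neighbors, item average rating)
CASES = [
    (4, [5, 4, 4], 3.0),
    (2, [3, 3, 4], 3.0),
    (5, [], 3.6),  # no neighbor has rated this item
]


def predict(neighbor_ratings, item_average, use_fallback):
    """Average the neighbors' ratings; with no ratings, fall back or give up."""
    if neighbor_ratings:
        return sum(neighbor_ratings) / len(neighbor_ratings)
    return item_average if use_fallback else None


def evaluate(cases, use_fallback, round_prediction):
    """Return (MAE, fraction of cases for which a prediction was made)."""
    errors = []
    for real, neighbor_ratings, item_average in cases:
        prediction = predict(neighbor_ratings, item_average, use_fallback)
        if prediction is None:
            continue  # unpredicted cases do not contribute to the MAE
        if round_prediction:
            prediction = round(prediction)
        errors.append(abs(prediction - real))
    return sum(errors) / len(errors), len(errors) / len(cases)


if __name__ == "__main__":
    for use_fallback in (False, True):
        for round_prediction in (False, True):
            mae, predicted = evaluate(CASES, use_fallback, round_prediction)
            print(f"fallback={use_fallback} round={round_prediction} "
                  f"MAE={mae:.3f} predicted={predicted:.2f}")
```

On this toy data the four combinations yield four different MAE values, and the fraction of predicted cases changes with the fallback choice, even though the metric itself is identical in every run.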
The second deficiency is the absence of quality measures of the results in aspects such as novelty and trust of the recommendations.

The following section of this paper develops a complete series of mathematical formalizations based on set theory, backed by a running example which aids understanding and by case studies which show clarifying results for the aspects and alternatives presented; in this section, we also obtain the combination of metric, aggregation approach and standardization method which provides the best results, enabling it to be used as a reference to evaluate metrics designed by the scientific community. In Section 3 we specify the evaluation measures proposed in the framework, which include the quality analysis of the following aspects: predictions (estimations), recommendations, novelty and trust; this same section shows the results obtained by using MovieLens 1M and Netflix. Finally, we set out our most relevant conclusions.

2. Framework specifications

This section provides both the equations on which the prediction/recommendation process in the CF stage is based and the equations that support the quality evaluation process offered in the proposed framework; among the latter are the traditional MAE, coverage, precision and recall, and those developed specifically to complete the framework: novelty-precision, novelty-recall, trust-precision and trust-recall.

The objective of formalizing the prediction, recommendation and evaluation processes is to ensure that the experiments carried out by different researchers can be reproduced and are not altered by differing decisions about implementation details: e.g.
deciding how to act when none of the k-neighbors has voted for a specific item (we could choose not to predict, or to predict with the average of all users' votes on that item), whether we apply a standardization process to the input data or to the weightings of the aggregation approach, whether, when computing the error of a prediction, we take the decimal value of the prediction or round it to the nearest whole value, etc.

The formalization presented here is fundamental when specifying a framework, where the same experiments carried out by different researchers must give the same results, in order to be able to compare the metrics and methods developed over time at different research centers.

Throughout the section, a running example is provided to help to understand and follow the underlying ideas in each group of

14610 J. Bobadilla et al. / Expert Systems with Applications 38 (2011) 14609–14623