Recommender Systems

Multicriteria User Modeling in Recommender Systems

Kleanthi Lakiotaki and Nikolaos F. Matsatsinis, Technical University of Crete
Alexis Tsoukiàs, Université Paris Dauphine

IEEE Intelligent Systems, March/April 2011. 1541-1672/11/$26.00 © 2011 IEEE. Published by the IEEE Computer Society.

A hybrid recommender systems framework creates user-profile groups before applying a collaborative-filtering algorithm by incorporating techniques from the multiple-criteria decision-analysis (MCDA) field.

Recommender systems are software applications that attempt to reduce information overload by recommending items of interest to end users based on their preferences, possibly giving movie, song, or other product suggestions. In this sense, an accurate recommender system ideally will be able to act on the user's behalf.1 To achieve this goal, it must gain knowledge of the user's value system and decision policy. Most existing recommender systems use a collaborative-filtering approach, some are based on a content-based approach, and many have attempted to combine these two methods into hybrid frameworks. (See related research for a representative review of recommender systems.2) Multiple-criteria decision analysis (MCDA) is a well-established field of decision science that aims at analyzing and modeling decision makers' value systems to support them in the decision-making process.

This article presents a hybrid methodological framework that combines techniques from the MCDA field, and more specifically from the disaggregation-aggregation approach, to model users' preferences, together with the collaborative-filtering technique to identify the most-preferred unknown items for every user. We form these profiles with a specially designed disaggregation-aggregation approach that uses the UTA* (Utilités Additives) algorithm. We demonstrate this proposed methodology as a movie recommender system and test its performance with real user data. Furthermore, through a comparison study with other single- and multiple-criteria collaborative-filtering methodologies, we clearly prove that creating user-profile groups according to the proposed methodology is an integral part of a recommendation process.

Personalization

Personalization and customization are considered increasingly important elements of marketing applications.3 By personalization and customization, we usually refer to exploiting information about a user (who might be a customer, individual, or group) to better design products and services targeted
to that user. Recommender systems that assist users in discovering the items closest to their preferences constitute an intrinsically important tool of personalization technologies. Nevertheless, the effectiveness of personalized services highly depends on complete and accurate user profiles, and thus, user modeling has become a key area in the development of such technologies.

So far, the majority of existing recommender systems obtain an overall numerical rating rui as input information for the recommendation algorithm. This overall rating depends only on a single criterion that usually represents the overall preference of user u on item i. However, previous work has argued that recommender systems researchers should focus more

Sidebar: Related Work in Decision Science

In the 18th century, Nicolas de Condorcet (1743–1794) divided the decision process into three stages:

• the first discussion phase, where the principles that would serve as the basis for the decision are discussed;
• the second discussion phase, in which the question is clarified, with opinions approached and combined with each other into a small number of more general opinions, and alternatives are determined; and
• the third phase, which consists of the actual choice between these alternatives.

Later, Herbert A. Simon adjusted the existent approaches to make them suitable for decisions in organizations, by placing them into the three intelligence, design, and choice phases.1 In a more recent paper, Alexis Tsoukiàs introduced a descriptive model of the decision-aiding process that involves a set of activities occurring between a decision maker and an analyst, who develops a formal model to help the decision maker face a problematic situation.2 This model considers the decision-aiding process as a cognition process, introducing schematically the cognitive artifacts aimed at supporting the decision maker's decision process. Within such a model, a recommendation results from the construction of an evaluation model resulting from a problem formulation that formally represents a specific problem situation.

Multiple-criteria decision analysis (MCDA), a well-established field of decision science, comes in a variety of theories, methodologies, and techniques.3 MCDA aims at assisting a decision maker in dealing with the ubiquitous difficulties in seeking compromise or consensus between conflicting interests and goals, represented by multiple criteria. A common approach states that MCDA is a methodology enabling the construction of a reliable and convincing model when several alternatives need to be assessed against multiple attributes under different problem statements (such as choosing, ranking, or classifying).

User modeling is a cross-disciplinary research field that attempts to construct models of human behavior within a specific computer environment. Some approaches to modeling user preferences have already been applied to recommender systems and mainly adopt techniques and methodologies from the larger fields of AI, knowledge engineering, or data mining, such as ontological user profiling,4 or from statistics.5 Yet new user-modeling ideas and approaches appear throughout the literature, indicating that user modeling has emerged as an important functional tool to enhance recommender system performance.6 However, to our knowledge, we are the first to propose an approach that constructs user profiles by exploiting preference information that is integrated into a user value system and uses these profiles to identify recommendation patterns.

In the past, researchers have attempted to detect cross-links between MCDA methodological frameworks and other scientific fields and disciplines, such as AI and machine learning, to enhance the preference-modeling capabilities of MCDA approaches and to improve their overall performance and efficiency. However, careful consideration is necessary to incorporate multicriteria ratings in an existing recommendation process or to design new recommendation techniques to achieve maximum accuracy.7 Yin Zhang and his colleagues recently introduced multicriteria rating in recommender systems from a statistical machine-learning perspective.8

The UTARec system, a predecessor of our proposed system, is an initial demonstration of applying multicriteria decision-analysis techniques in recommender systems.9 However, UTARec constituted only an experimental proof of the multicriteria algorithm's efficiency in predicting real user ratings and served as a stepping stone for the integrated hybrid multicriteria recommender system we present here.

References

1. H.A. Simon, The New Science of Management Decision, Prentice Hall, 1977.
2. A. Tsoukiàs, "On the Concept of Decision Aiding Process: An Operational Perspective," Annals of Operations Research, vol. 154, no. 1, 2007, pp. 3–27.
3. J. Figueira, S. Greco, and M. Ehrgott, eds., Multiple Criteria Decision Analysis: State of the Art Surveys, Springer, 2005.
4. S.E. Middleton, N.R. Shadbolt, and D.C.D. Roure, "Ontological User Profiling in Recommender Systems," ACM Trans. Information Systems, vol. 22, no. 1, 2004, pp. 54–88.
5. I. Zukerman and D.W. Albrecht, "Predictive Statistical Models for User Modeling," User Modeling and User-Adapted Interaction, vol. 11, nos. 1–2, 2001.
6. S. Berkovsky, T. Kuflik, and F. Ricci, "Mediation of User Models for Enhanced Personalization in Recommender Systems," User Modeling and User-Adapted Interaction, vol. 18, no. 3, 2008, pp. 245–286.
7. G. Adomavicius and Y.O. Kwon, "New Recommendation Techniques for Multicriteria Rating Systems," IEEE Intelligent Systems, vol. 22, no. 3, 2007, pp. 48–55.
8. Y. Zhang et al., "Applying Probabilistic Latent Semantic Analysis to Multi-criteria Recommender System," AI Comm., vol. 22, no. 2, 2009, pp. 97–107.
9. K. Lakiotaki, S. Tsafarakis, and N. Matsatsinis, "UTA-Rec: A Recommender System Based on Multiple Criteria Analysis," Proc. ACM Conf. Recommender Systems, ACM Press, 2008, pp. 219–226.
on the user-oriented perspective, indicating that people are not satisfied by existing recommender systems.4 The key to more effective personalization services is the ability to develop a system able to understand not only what people like, but why they like it. In other words, an accurate modeling of a user's value system and an effective preference-representation schema will potentially lead to the design of a recommendation algorithm with increased performance. Such a system could understand how users think about items by considering knowledge about the underlying attributes that attract users to choose particular items and hence recognize preferences, not just patterns, ensuring a more sophisticated understanding of the user.

In decision theory, the MCDA field emerged from the fact that real-world decision-making problems are intrinsically multidimensional.5 MCDA aims at aiding the decision maker in the decision-making process concerning a set of objects, actions, alternatives, and items evaluated on multiple points of view, which are roughly referred to as criteria (attributes, features, variables, and so on). By "aiding," we mean that MCDA supports decision makers but does not substitute for them during the decision process. (See the "Related Work in Decision Science" sidebar for previous research.)

Under such a perspective, we treat the recommendation process as a decision problem and exploit techniques from decision theory and the MCDA field to accurately build a model representing the user's preferences. Following this framework, a potential user in a recommender system corresponds to the decision maker in a decision process. We can thus consider recommender systems as decision-support systems that need to learn the decision maker's preferences.
Both decision-support and recommender systems try to assist the decision maker and user, respectively, throughout the decision-making process. This decision might vary from a simple purchase of an item to more sophisticated managerial matters.

Methodological Framework

To begin, we discuss the overall framework of our proposed approach together with the system's individual components. Figure 1 summarizes our system's overall process structure, and the following subsections outline the steps involved.

First Phase: Data Acquisition

For the data-acquisition procedure, we must first gather two types of data attained from user statements. The first type is preference data given as numerical ratings, and the second deals with preference statements in the form of a ranking order, or weak preference order.

To acquire user-preference information, every user ut ∈ U, where t = 1, 2, …, n, and n is the total number of users, is asked to evaluate a set of items Ai ∈ AR, named the reference set AR. For every alternative Ai ∈ AR, i = 1, 2, …, m, where m is the length of AR, the user provides a rating for every criterion cj, j = 1, 2, …, k, where k is the total number of criteria, following a predefined measurement scale (that is, from one to five). In addition to these individual evaluations, we ask users to rank all the alternatives that belong to the reference set in descending order and thus provide a weak preference order. Indifference relations are acceptable in the ranking order and are considered accordingly during the multicriteria user-modeling phase. With the completion of this step, we form a data matrix that acts as input for the second phase.

Second Phase: Multicriteria User Modeling

The multicriteria input data matrix is next analyzed and processed throughout the system's second phase, leading to the formation of a single k-dimensional vector for every user, which we call the significance weight vector or merely the weight vector.
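To make the two phases concrete, the sketch below organizes the first-phase inputs for a single user and then scores items with the kind of additive model the second phase infers. Everything here is invented for illustration: the item names, criteria, ratings, the flat linear marginal value function, and the weight vector are hypothetical choices, not the paper's data structures (UTA* actually estimates piecewise-linear marginal value functions and the weights via linear programming).

```python
# Hypothetical first-phase data for one user: the reference set A_R,
# per-criterion ratings on a 1-5 scale, and a weak preference order.
reference_set = ["item_1", "item_2", "item_3"]      # A_R, so m = 3
criteria = ["acting", "story", "visuals"]           # c_1..c_k, so k = 3

ratings = {              # one row per reference item, one column per criterion
    "item_1": [5, 4, 3],
    "item_2": [3, 3, 4],
    "item_3": [2, 5, 2],
}

# Weak preference order over A_R: earlier groups are preferred, and items
# sharing a group are indifferent (ties are allowed, as the text notes).
weak_order = [["item_1"], ["item_2", "item_3"]]     # item_1 > item_2 ~ item_3

# The multicriteria data matrix handed to the second phase.
data_matrix = [ratings[a] for a in reference_set]

# The second phase condenses this into one k-dimensional weight vector per
# user. With an additive model U(a) = sum_j w_j * u_j(g_j(a)) and, for this
# sketch only, the same linear marginal value function u_j on every criterion:
def u_linear(g):
    return (g - 1) / 4.0        # maps the 1-5 scale onto [0, 1]

weights = [0.5, 0.3, 0.2]       # invented significance weight vector, sums to 1

def global_value(performances):
    return sum(w * u_linear(g) for g, w in zip(performances, weights))

scores = {a: global_value(ratings[a]) for a in reference_set}
```

In the actual framework it is this weight vector, rather than the raw ratings, that represents the user's value system in the phases that follow.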
During this second phase, we apply the UTA* algorithm,6 one of the most representative and widely applied disaggregation-aggregation framework algorithms, to analyze the user's cognitive decision policy. The UTA* algorithm adopts the preference-disaggregation principle, the philosophy of which is to assess and infer preference models from given preferential structures.

Figure 2 shows the complete series of steps in the disaggregation-aggregation approach. The first step determines the problem statement. Among the various problem statements in decision-aiding theory,7 three are most appropriate for this domain:

• choosing one or more potential actions from a set of actions (alternatives) A,
• ranking those alternatives in descending order, and
• sorting them into predefined ordered categories.

There are various ways to present recommendations to the end user. We can offer users the best item (choosing), present the top N items as a recommendation list (ranking), or classify the items into categories, such as highly recommended, fairly recommended, and not recommended (sorting). Accordingly, a recommendation problem can equivalently belong
to one of the first three problem statements, depending on its design architecture. Although the UTA method performed in this step using the UTA* algorithm belongs to the ranking problem statement, this does not imply that the recommendation problem should also belong to the same problem statement. To elucidate the inconsistency that seems to emerge at this point: in the user-modeling phase, the goal is to model the user's value system using the UTA method, but ultimately we are trying to predict ratings for unknown items.

[Figure 1. Proposed system's build-up architecture. We follow a four-step sequential procedure to form groups of users with similar preferences and apply collaborative filtering inside each group to predict user ratings.]

Following the disaggregation-aggregation methodological schema, the modeling process of level two must conclude with a consistent family of criteria {g1, g2, …, gk}. (More details on the criterion family requirements are available elsewhere.5) Each criterion must be a nondecreasing, real-valued function defined on A as follows:

gj : A → [gj∗, gj^∗] ⊂ ℝ, a ↦ gj(a) ∈ ℝ, (1)

where [gj∗, gj^∗] is the criterion evaluation scale; gj∗ and gj^∗ are the worst and the best levels of the jth criterion,
respectively; g_j(a) is the evaluation or performance of action a on the jth criterion; and g(a) is the vector of performances of action a on the k criteria.

The multicriteria data input matrix is processed by the UTA* algorithm through an iterative ordinal regression procedure. (Analytical details and an illustrative example of the UTA* algorithm are available elsewhere.6)

The UTA* algorithm considers as input a weak-order preference structure on a set of actions, together with the performances of the alternatives on all attributes, and returns as output a set of additive value functions based on multiple criteria. It does this in such a way that the resulting structure will be as consistent as possible with the initial structure given by the user. This is accomplished using special linear programming techniques. The UTA* algorithm follows four basic steps, during which all the necessary parameters to estimate global value functions for each item and user are calculated. (See "The UTA* Algorithm" sidebar for more details.)

Thus, a value is assessed for each alternative that belongs to the reference set, quantifying its value to each user and ensuring consistency with the user's value system. UTA*'s output involves the value functions associated with each criterion, approximated by linear segments, as well as the criteria significance weights (trade-offs among the criteria values). The latter, expressed as a weight vector, serves as each user's value-system information-representation schema and provides the required user-modeling data to proceed to the next phase.

Third Phase: Clustering
Generally, a clustering algorithm divides the original data set into disjoint groups. Clustering is an unsupervised process that aims to group data objects based only on information found in the data that describes the objects and their relationships.
The objects within a group should be as similar (or related) to one another as possible and should be different from (or unrelated to) the objects in other groups. Most existing clustering algorithms are sensitive to initial parameters, such as the number of clusters and initial centroid positions. To limit these shortcomings, we use global k-means,8 a deterministic variant of the traditional k-means clustering algorithm, in this third phase. Global k-means does not depend on any initial parameter values and employs the k-means algorithm as a local search procedure. Instead of randomly selecting initial values for all cluster centers, this algorithm acts in an incremental way, optimally adding one new cluster center at each stage: the one that minimizes a certain clustering criterion.

Suppose we are given a data set {x_1, x_2, ..., x_n}, x_i \in R^d. The k-clustering problem divides this data set into k disjoint groups called clusters C_1, C_2, ..., C_k by optimizing a certain clustering criterion. We have adopted the most widely used clustering criterion: the sum of squared error (SSE) between each data point x_i (i = 1, 2, ..., n) and the centroid m_j (j = 1, 2, ..., k) of the subset C_j that contains x_i. This clustering criterion depends on the cluster centers m_1, m_2, ..., m_k and is shown in Equation 2:

SSE(m_1, m_2, \ldots, m_k) = \sum_{j=1}^{k} \sum_{i=1}^{n} I(x_i \in C_j) \, \|x_i - m_j\|^2    (2)

where I(\cdot) is the indicator function. Applied to the set of user weight vectors, global k-means labels every user to a specific group.

Fourth Phase: Recommendation
Following the formation of user groups with similar preferences (user-profile clusters), we can provide users with accurate item recommendations by implementing the collaborative-filtering philosophy inside each user group. The multidimensional MRCF (MRCF-dim) approach we apply here is based on multidimensional distance metrics.
First, it calculates the distance between two users, u and u', for the same item i, according to Equation 3:

d_{uu'}(i) = \sqrt{ \sum_{n=1}^{k+1} ( r_{un} - r_{u'n} )^2 }    (3)

where r_u is the rating vector of user u and r_{u'} is the rating vector of user u' (both for the same item i).

Figure 2. The disaggregation-aggregation approach. According to this approach, the decision maker's value system is first decomposed into decision variables that are then additively composed to build a decision model, which is as consistent as possible with the actual decision maker's value system.
By rating vector, we mean the set of ratings that user u provided for item i, including the overall rating. Here, k + 1 represents the dimension of this rating vector, as a result of the k criteria and the overall rating that altogether define the user vector's dimensionality.

Second, we determine the overall distance between users u and u':

dist(u, u') = \frac{1}{|U(u, u')|} \sum_{i \in U(u, u')} d_{uu'}(i)    (4)

where U(u, u') denotes the set of items that both u and u' have rated. This means that the overall distance between the two users, u and u', is the average distance between their ratings for all their common items.

Finally, the users' similarity, which is inversely related to their distance, is given by

sim(u, u') = \frac{1}{1 + dist(u, u')}    (5)

This notion of similarity ensures that the similarity will approach zero as the distance between two users becomes larger, and that it will be one if two users rated all their common items evenly.9

Note that sim(u, u') is calculated if and only if u' belongs to the same group as u and their U(u, u') is not empty. (Henceforth, we refer to these users as mates.) Therefore, the computational effort is minimized compared

Sidebar: The UTA* Algorithm

The UTA* algorithm aims at estimating additive utilities of the form

U(g) = \sum_{i=1}^{n} u_i(g_i)    (A)

subject to the normalization constraints

u_i(g_{i*}) = 0 \quad \forall i = 1, 2, \ldots, n
u_1(g_1^*) + u_2(g_2^*) + \cdots + u_n(g_n^*) = 1    (B)

where g_{i*} and g_i^* denote the worst and best levels of criterion i, and the u_i(g_i), i = 1, ..., n, are nondecreasing real-valued functions called marginal utility functions.

We can summarize the UTA* algorithm in four steps.

Step 1. Express the global value of the reference actions u[g(a_k)], k = 1, 2, ..., m, first in terms of the marginal values u_i(g_i), and then in terms of the variables w_ij according to Equation C.
The transformation of the global value of the reference actions into an expression in the weight variables is made according to Equations C and D:

w_{ij} = u_i(g_i^{j+1}) - u_i(g_i^j) \geq 0, \quad \forall i = 1, 2, \ldots, n \text{ and } j = 1, 2, \ldots, \alpha_i - 1    (C)

u_i(g_i^1) = 0, \quad \forall i = 1, 2, \ldots, n
u_i(g_i^j) = \sum_{t=1}^{j-1} w_{it}, \quad \forall i = 1, 2, \ldots, n \text{ and } j = 2, 3, \ldots, \alpha_i    (D)

Step 2. Introduce two error functions \sigma^+ and \sigma^- on A_R (the reference set of alternatives) by writing, for each pair of successive actions in the given ranking, Equation E:

\Delta(a_k, a_{k+1}) = u[g(a_k)] - \sigma^+(a_k) + \sigma^-(a_k) - u[g(a_{k+1})] + \sigma^+(a_{k+1}) - \sigma^-(a_{k+1})    (E)

Step 3. Solve the linear program (LP):

[min] z = \sum_{k=1}^{m} [\sigma^+(a_k) + \sigma^-(a_k)]
subject to
\Delta(a_k, a_{k+1}) \geq \delta if a_k \succ a_{k+1}
\Delta(a_k, a_{k+1}) = 0 if a_k \sim a_{k+1}
\sum_{i=1}^{n} \sum_{j=1}^{\alpha_i - 1} w_{ij} = 1
w_{ij} \geq 0, \sigma^+(a_k) \geq 0, \sigma^-(a_k) \geq 0, \quad \forall i, j, and k    (F)

Step 4 (stability analysis). Check the existence of multiple or near-optimal solutions of the linear program (Equation F). In case of nonuniqueness, find the mean additive value function of those (near-)optimal solutions that maximize the objective functions of Equation G on the polyhedron of the constraints of the LP (Equation F), bounded by the constraint of Equation H, where z* is the optimal value of the LP in step 3 and \varepsilon a very small positive number:

[max] u_i(g_i^*) = \sum_{j=1}^{\alpha_i - 1} w_{ij}, \quad \forall i = 1, 2, \ldots, n    (G)

\sum_{k=1}^{m} [\sigma^+(a_k) + \sigma^-(a_k)] \leq z^* + \varepsilon    (H)

By applying the UTA* algorithm, all the necessary parameters to estimate the global utility functions U(g(a)) for each alternative are calculated. Thus, a value is assessed quantifying each alternative's utility to each user and ensuring consistency with his or her value system.
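As a concrete illustration of the additive model in Equation A, the sketch below evaluates a user's global utility for a multicriteria-rated item from piecewise-linear marginal utility functions. The breakpoints and the w_ij increments are hypothetical values chosen so that they satisfy the normalization constraints (Equations B and F); in the actual method they would come out of the UTA* linear program.

```python
# Illustrative piecewise-linear marginal utilities on the 1-13 rating scale.
# The breakpoints g_i^1 < ... < g_i^{a_i} and the increments w_ij below are
# hypothetical; UTA* would estimate the w_ij by solving its LP.
BREAKPOINTS = [1.0, 5.0, 9.0, 13.0]
W = {  # w_ij >= 0; summed over all i and j, they equal 1 (normalization in F)
    "acting":    [0.05, 0.10, 0.15],
    "story":     [0.05, 0.15, 0.20],
    "direction": [0.02, 0.08, 0.10],
    "visuals":   [0.02, 0.03, 0.05],
}

def marginal_utility(criterion, g):
    """u_i(g): Equation D at the breakpoints, linear interpolation between them."""
    u = [0.0]
    for w_ij in W[criterion]:        # u_i(g_i^j) = sum of increments w_it
        u.append(u[-1] + w_ij)
    for j in range(len(BREAKPOINTS) - 1):
        lo, hi = BREAKPOINTS[j], BREAKPOINTS[j + 1]
        if lo <= g <= hi:
            return u[j] + (u[j + 1] - u[j]) * (g - lo) / (hi - lo)
    raise ValueError("rating outside the 1-13 scale")

def global_utility(ratings):
    """U(g) = sum_i u_i(g_i), the additive value function of Equation A."""
    return sum(marginal_utility(c, g) for c, g in ratings.items())

print(global_utility({"acting": 13, "story": 12, "direction": 13, "visuals": 11}))
```

With these illustrative numbers, an item rated 13, 12, 13, and 11 on acting, story, direction, and visuals receives a global utility of 0.925, and an item at the top level of every criterion receives exactly 1.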
to traditional nonclustering approaches that compute sim(u, u') for all possible user combinations.

After calculating a similarity index for mate users, we provide a potential rating R(u, i) for any unexplored item i:

R(u, i) = \frac{\sum_{u' \in C(u)} sim(u, u') \cdot R(u', i)}{\sum_{u' \in C(u)} sim(u, u')}    (6)

This is in fact a similarity-weighted sum of known ratings, and C(u) defines the user's neighborhood, meaning the cluster to which u belongs.

Feedback Mechanism
The system's feedback mechanism is activated by users when they are willing to provide a rating for an item they explored after following the system's preceding recommendations. If a user disagrees with the recommendation given and provides a rating for the specific item, the system processes this information by triggering the feedback-correction algorithm. According to this algorithm, the new user-supplied value is compared to the system's past value in terms of absolute difference. If this difference is greater than the mean absolute difference stored for this user, then this alternative is included in the reference set and the UTA* algorithm runs again to calculate a new significance weight vector for this user. The weight vector will indicate whether this particular user should belong to a different group. To decide this, the feedback-correction algorithm calculates the squared Euclidean (SE) distance of the user's weight vector from every centroid of the formed groups. This user will now belong to the group where the user's SE is less than or equal to the group's maximum SE. Depending on the feedback function's results, the system might update the user's profile by changing this user's group.

Here, we have provided a simplified version of this approach. We plan to investigate integrating more sophisticated alternatives (such as relevance feedback techniques) into the system.
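Equations 3 through 6 can be sketched end to end as follows. The data layout (each user mapped to a dict from item ID to a (k+1)-dimensional rating tuple whose last component is the overall rating) is an assumption made for this illustration, not the article's implementation.

```python
import math

def item_distance(r_u, r_v):
    # Equation 3: Euclidean distance between the two (k+1)-dimensional
    # rating vectors (k criteria plus the overall rating) for one item.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r_u, r_v)))

def user_distance(ratings_u, ratings_v):
    # Equation 4: average per-item distance over the common items U(u, u').
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return None  # sim(u, u') is only defined for mates with common items
    return sum(item_distance(ratings_u[i], ratings_v[i]) for i in common) / len(common)

def similarity(ratings_u, ratings_v):
    # Equation 5: similarity inversely related to distance, in (0, 1].
    d = user_distance(ratings_u, ratings_v)
    return None if d is None else 1.0 / (1.0 + d)

def predict(u, item, cluster_ratings):
    # Equation 6: similarity-weighted average of the mates' known overall
    # ratings (the last component of each rating tuple).
    num = den = 0.0
    for v, ratings_v in cluster_ratings.items():
        if v == u or item not in ratings_v:
            continue
        s = similarity(cluster_ratings[u], ratings_v)
        if s is None:
            continue
        num += s * ratings_v[item][-1]
        den += s
    return num / den if den else None

cluster = {  # one user-profile cluster; ratings are (c1, c2, c3, c4, overall)
    "u1": {1: (13, 12, 13, 11, 13), 2: (9, 9, 9, 9, 9)},
    "u2": {1: (12, 12, 12, 11, 12), 2: (10, 9, 9, 9, 10)},
    "u3": {2: (8, 8, 8, 8, 8)},
}
print(predict("u3", 1, cluster))
```

The predicted overall rating for user u3 on item 1 lands between the two mates' overall ratings (12 and 13), pulled toward the mate whose multicriteria ratings lie closer to u3's.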
Other Single- and Multicriteria-Rating Collaborative-Filtering Approaches
We believe that multiple criteria should be considered to better understand user decision policy. Our goal is to construct a decision model and be able to recommend items of interest by exploiting this model. We also claim that the user-modeling phase can be more effective if more sophisticated methods, specially developed to treat multiple-criteria decisions, are used to build user profiles. It is important to understand how users come to their decisions and not only consider their past actions or other people's similar decisions. It is likely that two users who gave an item the same overall grade passed through dissimilar decision routes to reach the same point. Finally, we believe that different users also have different knowledge, interests, abilities, learning styles, and preferences; however, this does not preclude the existence of discrete patterns among users, or user-profile groups.

To verify the effectiveness of our approach, we compared it to popular collaborative-filtering approaches.9 More specifically, we compared it to a single-rating collaborative-filtering approach (SR-CF) that uses the item-weighted cosine-based similarity to calculate similarities, a multiple-rating collaborative-filtering technique that uses average similarity to aggregate the item-weighted cosine-based similarities from individual criteria (MRCF-av), and an MRCF technique that uses worst-case similarity to aggregate item-weighted cosine-based similarities from individual criteria (MRCF-min).

Single-Rating Collaborative Filtering (SR-CF)
By a single rating, we mean that just the overall rating is considered in all calculations. In the SR-CF approach, a similarity index sim(u, u') is calculated for all possible u-u' combinations according to the item-weighted cosine similarity function, which uses the common notion of cosine similarity.
However, even though the cosine similarity measure has been extensively used in recommender systems, it fails to compute a meaningful rating in the case of a single common item. If there is only one common item, cosine similarity will equal one regardless of the differences in the individual ratings. Furthermore, because cosine similarity does not consider the size of U(u, u'), we used a weighted variant of this measure:

sim(u, u') = A \cdot \frac{\sum_{i \in U(u,u')} R(u, i) \cdot R(u', i)}{\sqrt{\sum_{i \in U(u,u')} R(u, i)^2} \cdot \sqrt{\sum_{i \in U(u,u')} R(u', i)^2}}    (7)

A = \frac{|U(u, u')|}{|U(u)|}    (8)

In Equation 7, the traditional cosine similarity term is multiplied by A, which is the number of common items U(u, u') that both u and u' have rated, taken as a percentage of U(u), the total number of items that u has rated. Obviously, this favors
similarities of pairs of users with a large common item set.

Multirating Collaborative Filtering (MRCF)
Researchers have proposed two basic approaches that include multirating information in similarity calculations.9 The first considers individual similarities on different attributes, which are traditionally calculated by cosine similarity metrics, and the second calculates similarities based on multidimensional distance metrics. We adapted the latter in the recommendation phase of the proposed methodology.

Following the first approach, various techniques are employed to aggregate individual similarities. We used two different aggregation methods: average and worst-case similarity. Both calculate cosine similarities on all criteria as well as on the overall values. Their only difference is that in the average-similarity approach these individual similarities are averaged, while in the worst-case-similarity scenario the minimum of all attribute and overall similarities is chosen to represent the users' similarity. However, even though we used these two approaches, the cosine similarities in this work use the item-weighted variation of cosine similarity given in Equation 7.

Assessment
In general, recommender systems have been evaluated in many, often incomparable, ways. (A review of such collaborative-filtering recommender system evaluations is available elsewhere.10) This work focuses on the ways to measure prediction quality. To assess prediction quality, we used three different metrics: statistical accuracy, classification accuracy, and a rank correlation coefficient.

Statistical Accuracy Metrics
Statistical accuracy metrics measure how close the numerical value r'_{ui} (which is generated by the recommender system and represents the expected rating of user u on item i) is to the actual numerical rating r_{ui}, as provided by the same user for the same item.
The mean absolute error (MAE) is the most commonly used statistical accuracy metric. Because MAE measures the deviation of the predictions generated by the recommender system from the true rating values, as specified by the user, it is measured only for the items on which user u has expressed an opinion. If user u has expressed an opinion on n items, then MAE_u is formally given by Equation 9:

MAE_u = \frac{1}{n} \sum_{i=1}^{n} | r_{ui} - r'_{ui} |    (9)

We can calculate the average MAE for an entire data set by averaging the MAEs of all users, MAE_u for u = 1, 2, ..., m, over the total number of available users m, and thereby obtain an overall estimation of a model's performance.

Another popular statistical accuracy metric is the root mean squared error (RMSE). In MAE, all the individual differences are weighted equally in the average, while in RMSE, because the errors are squared before being averaged, relatively high weight is given to large errors. We calculate RMSE as follows:

RMSE_u = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} ( r_{ui} - r'_{ui} )^2 }    (10)

Similar to the average MAE, the average RMSE offers a global estimation of a model's prediction accuracy.

Classification Accuracy Metrics
Classification accuracy metrics determine the success of a prediction algorithm in correctly classifying items. In recommender systems, a rational classification of items would be as highly recommended and not recommended. The system is likely to propose items in the first class, while items that belong to the second category will never be shown to the user. Precision is the number of items correctly labeled as belonging to the highly recommended class (the true positives) divided by the total number of items labeled as belonging to that class. Recall is the number of true positives divided by the total number of items that actually belong to the highly recommended class. Because there is a trade-off between precision and recall, the F-measure (the harmonic mean of precision and recall, which weights them equally) is often used.
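Reference implementations of these evaluation metrics are short. The sketch below covers MAE and RMSE (Equations 9 and 10), precision, recall, and F-measure, as well as the tie-corrected Kendall's tau of Equation 11, which the next subsection defines; the sample values are invented for illustration.

```python
import math

def mae(true, pred):
    # Equation 9: mean absolute error over the n items user u has rated.
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)

def rmse(true, pred):
    # Equation 10: squaring before averaging gives large errors more weight.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))

def precision_recall_f(recommended, relevant):
    # "Highly recommended" is treated as the positive class.
    tp = len(recommended & relevant)  # true positives
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

def kendall_tau(user_rank, model_rank):
    # Equation 11: tau corrected for ties. M is the total number of pairs;
    # I_Y and I_Yhat count tied pairs in the user's and model's orders.
    items = list(user_rank)
    n = len(items)
    m = n * (n - 1) // 2
    mc = md = i_y = i_yhat = 0
    for a in range(n):
        for b in range(a + 1, n):
            dy = user_rank[items[a]] - user_rank[items[b]]
            dyh = model_rank[items[a]] - model_rank[items[b]]
            if dy == 0:
                i_y += 1
            if dyh == 0:
                i_yhat += 1
            if dy * dyh > 0:
                mc += 1  # concordant: same relative order
            elif dy * dyh < 0:
                md += 1  # discordant: opposite order
    return (mc - md) / math.sqrt((m - i_y) * (m - i_yhat))

print(mae([4, 3, 5, 2], [3.5, 3.0, 4.0, 3.0]))
print(kendall_tau({1: 1, 2: 2, 3: 3}, {1: 3, 2: 2, 3: 1}))
```

A completely reversed model ranking yields a tau of -1, and identical rankings yield 1, matching the range described below.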
Rank Correlation Coefficient
Kendall's tau is a measure of correlation between two ordinal-level variables. To calculate Kendall's tau for a sample of n items, there are n(n - 1)/2 possible comparisons of the points (x_i, y_i) and (x_j, y_j). Suppose M_C is the number of pairs that are concordant, M_D is the number of discordant pairs, and M is the total number of pairs. By a concordant pair, we mean that for the specific pair of items, both the user and the model ranked them in the same relative order. The formula for Kendall's tau is

\tau = \frac{M_C - M_D}{\sqrt{(M - I_Y)(M - I_{\hat{Y}})}}    (11)

where I_Y is the number of equivalent (tied) pairs regarding the ranking order Y (the user's ranking order), and I_{\hat{Y}} is the number of equivalent pairs regarding
ranking order Ŷ (the model's ranking order). Kendall's tau varies between −1 and 1, with 1 indicating total agreement between the orders.

A Multicriteria Movie Recommender System
As an empirical study, we applied the four phases of our proposed methodology to a movie recommender system.

Data Set Description
We retrieved our experimental data set from Yahoo! Movies (http://movies.yahoo.com), where users provided preference information on movies based on four different criteria. The four attributes that constituted the criteria family were acting (c1), story (c2), direction (c3), and visuals (c4). All values were measured on a 13-fold qualitative scale, with F denoting the worst evaluation grade and A+ the most preferred value. For processing purposes, we replaced letters with numbers, so that one corresponded to the worst value (formerly the F rating) and 13 to the best value (formerly A+). In addition to the individual criteria ratings, users were asked to provide an overall grade that reflected their global preference for each movie.

Table 1 shows a typical raw data form, and Table 2 shows the same data in final form. Data cleaning followed the data-acquisition phase to remove any case with one or more missing values, which shrank our data set by 18 percent. We then applied a subsequent filter to cut out users with fewer than five rated movies, to ensure an adequate set of evaluated movies for every user. The resulting experimental data set included 6,078 different users and 976 different movies. We ended up with 62,156 ratings, and every user had rated an average of 10 movies. The average evaluation grade was 9.6, 9.9, 9.5, 10.5, and 9.6 for the acting, story, direction, visuals, and overall categories, respectively. Table 3 shows the data set's Pearson correlation matrix.
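The letter-to-number conversion described above can be sketched as follows. The mapping for A+ through B- and the C grades is confirmed by comparing Tables 1 and 2; the exact rungs below C are an assumption consistent with the 13-fold scale:

```python
# 13-fold qualitative scale, worst (F) to best (A+), mapped to 1..13.
# Rungs below C are inferred, not shown in the sample tables.
GRADES = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]
GRADE_TO_NUM = {g: i + 1 for i, g in enumerate(GRADES)}

def prepare_row(letter_ratings):
    """Replace a row of letter grades with their numeric equivalents."""
    return [GRADE_TO_NUM[g] for g in letter_ratings]

# First row of Table 1 (acting, story, direction, visuals for movie 1):
# ["A+", "A", "A+", "A-"] -> [13, 12, 13, 11], matching Table 2.
```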
Table 1. Sample multicriteria data input matrix before preparation.

User ID | Overall grade | Acting (c1) | Story (c2) | Direction (c3) | Visuals (c4) | Movie ID
1       | A+            | A+          | A          | A+             | A-           | 1
        | B+            | B+          | A+         | B              | A+           | 4
        | B             | B           | A-         | B              | A+           | 25
        | B-            | B+          | B+         | B              | B            | 23
        | C+            | C           | B          | C+             | A+           | 9
2       | A             | A+          | A-         | A-             | A+           | 9
        | B+            | B+          | B          | B              | B            | 18
        | B+            | A-          | A-         | A+             | B            | 2
…       | …             | …           | …          | …              | …            | …

Table 2. Sample multicriteria data input matrix after preparation.

User ID | Ranking order | Acting (c1) | Story (c2) | Direction (c3) | Visuals (c4) | Movie ID
1       | 1             | 13          | 12         | 13             | 11           | 1
        | 2             | 10          | 13         | 9              | 13           | 4
        | 3             | 9           | 11         | 9              | 13           | 25
        | 4             | 10          | 10         | 9              | 9            | 23
        | 5             | 6           | 9          | 7              | 13           | 9
2       | 1             | 13          | 11         | 11             | 13           | 9
        | 2             | 10          | 9          | 9              | 9            | 18
        | 2             | 11          | 11         | 13             | 9            | 2
…       | …             | …           | …          | …              | …            | …

Table 3. Pearson correlation matrix of the data set.*

Criteria  | Acting | Story | Direction | Visuals | Overall
Acting    | 1.000  | 0.834 | 0.857     | 0.786   | 0.865
Story     | 0.834  | 1.000 | 0.871     | 0.782   | 0.905
Direction | 0.857  | 0.871 | 1.000     | 0.835   | 0.911
Visuals   | 0.786  | 0.782 | 0.835     | 1.000   | 0.834
Overall   | 0.865  | 0.905 | 0.911     | 0.834   | 1.000

*Correlation is significant at the 0.01 level.

Although recommender systems researchers are accustomed to very large data sets (such as the Netflix data set, which consists of approximately 480,000 users), these data sets consist of a single overall rating (usually on a scale of one to five stars) and do not provide any information on individual criteria. Generally, it is not easy to come across data sets with preference information on several attributes, because it is commonly believed that people are unwilling to provide a lot of information. We advocate that preference information on individual
March/April 2011

criteria offers valuable knowledge for the design and effectiveness of recommender systems, because it can be processed to build a user's value system and decision policy. Thus, asking the user to provide this information can lead to a significant improvement in the system's recommendation accuracy, which helps justify requesting a user's additional time and effort.

According to the methodological requirements of the disaggregation-aggregation approach we discussed earlier, a weak preference order of the alternatives is required to apply ordinal regression. The users provided that information, together with the performances on all four criteria for every movie in the reference set. However, because they expressed their global preferences on a qualitative scale from one to 13, we transformed all global preference values into a weak preference order for every user. For example, a sequence of numerical values such as ri = [13, 12, 12, 6, 1], when transformed into a ranking order, will appear as r′i = [1, 2, 2, 3, 4]. Eventually, the multicriteria data matrix, which acted as input for the UTA* algorithm, consisted of the actual user ratings on all four criteria for the items belonging to the reference set, as well as a weak preference order for these items. Table 2 gives an example of the input multicriteria matrix.

User Modeling Phase
The UTA* algorithm processed the multicriteria data matrix to calculate a significance weight vector wu for every user u. A 6,078 × 4 matrix was formed, which included the weight vectors of all users. All weights were normalized to a range from zero to one.

Clustering Phase
The global k-means algorithm divided the 6,078 weight vectors that resulted from the user-modeling phase into separate clusters. As we already stated, global k-means ensures optimality at each clustering step. This means that the sum of squared errors (SSE) will continuously decrease as the number of clusters grows.
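The transformation of global grades into a weak preference order, illustrated earlier with ri and r′i, amounts to a dense ranking in which tied grades share a rank; a minimal sketch:

```python
def weak_preference_order(grades):
    """Convert global grades (higher = better) into a dense ranking:
    rank 1 for the best grade, equal grades share a rank, and the
    rank increases by one at each distinct lower grade."""
    distinct = sorted(set(grades), reverse=True)        # best grade first
    rank_of = {g: i + 1 for i, g in enumerate(distinct)}
    return [rank_of[g] for g in grades]

# The article's example: grades [13, 12, 12, 6, 1] -> ranking [1, 2, 2, 3, 4].
```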
Figure 3 shows a plot of SSEs for different numbers of clusters. Although we can get a rough estimate of the clustering tendency from Figure 3, because the SSE constantly decreases with the number of clusters, further investigation is necessary to identify the optimal number of clusters and is, most of the time, application dependent. In the movie recommender system application, the MAE of every user (MAEu), averaged over all clusters and plotted against the number of clusters, can provide a rough estimate of the optimal number of clusters. Even though this plot also decreases continuously with the number of clusters, Figure 4 shows that the curve's slope drops off as the number of clusters exceeds 20. Depending on the application, however, any other meaningful metric or feature, or even various combinations of them, can prove insightful for deciding how many groups to keep.

The final outcome of the third phase is a collection of disjoint groups of users with similar preferences. These groups constitute the user-profile clusters that the system's final step exploits to provide item recommendations. As we explained earlier, these groups can be updated when required.

Figure 3. Sum of squared errors (SSE) versus the number of clusters. SSE continuously decreases over the number of clusters.

Figure 4. Average user mean absolute error (MAE) versus the number of clusters. The curve's slope decreases as the number of clusters exceeds 20.
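The elbow-style reading of Figure 4 can be sketched as a simple heuristic: stop adding clusters once the relative improvement per added cluster falls below a threshold. This is an illustration only, not the authors' exact procedure, and the threshold value is an assumption:

```python
def pick_num_clusters(sse, min_gain=0.05):
    """Pick the smallest k whose relative SSE (or MAE) improvement over
    the previous k falls below `min_gain` -- a simple elbow heuristic.

    `sse` maps each candidate number of clusters k to its error value.
    """
    ks = sorted(sse)
    for prev_k, k in zip(ks, ks[1:]):
        gain = (sse[prev_k] - sse[k]) / sse[prev_k]
        if gain < min_gain:
            return prev_k
    return ks[-1]

# Synthetic, steadily decreasing error curve that flattens after k = 20,
# loosely shaped like the curves in Figures 3 and 4 (not the real data).
curve = {k: 14.0 / k if k <= 20 else 0.7 - 0.001 * (k - 20)
         for k in range(5, 31)}
```

On this synthetic curve the heuristic returns k = 20, mirroring the knee the article identifies in Figure 4.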