Hybrid Recommender Systems: Survey and Experiments

ROBIN BURKE
Department of Information Systems and Decision Sciences, California State University, Fullerton, CA 92834, USA

User Modeling and User-Adapted Interaction; Nov 2002; 12, 4; ABI/INFORM Global, pg. 331. © 2002 Kluwer Academic Publishers.

(Received 23 January 2000; accepted in revised form 24 September 2001)

Abstract. Recommender systems represent user preferences for the purpose of suggesting items to purchase or examine. They have become fundamental applications in electronic commerce and information access, providing suggestions that effectively prune large information spaces so that users are directed toward those items that best meet their needs and preferences. A variety of techniques have been proposed for performing recommendation, including content-based, collaborative, knowledge-based and other techniques. To improve performance, these methods have sometimes been combined in hybrid recommenders. This paper surveys the landscape of actual and possible hybrid recommenders, and introduces a novel hybrid, EntreeC, a system that combines knowledge-based recommendation and collaborative filtering to recommend restaurants. Further, we show that semantic ratings obtained from the knowledge-based part of the system improve the hybrid's performance.

Key words: case-based reasoning, collaborative filtering, electronic commerce

1. Introduction

Recommender systems were originally defined as ones in which 'people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients' (Resnick & Varian, 1997). The term now has a broader connotation, describing any system that produces individualized recommendations as output or has the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options.
Such systems have an obvious appeal in an environment where the amount of on-line information vastly outstrips any individual's capability to survey it. Recommender systems are now an integral part of some e-commerce sites such as Amazon.com and CDNow (Schafer, Konstan & Riedl, 1999).

It is the criteria of 'individualized' and 'interesting and useful' that separate the recommender system from information retrieval systems or search engines. The semantics of a search engine are 'matching': the system is supposed to return all those items that match the query, ranked by degree of match.

1 The managing editor for this paper was Ingrid Zukerman.

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

Techniques such
as relevance feedback enable a search engine to refine its representation of the user's query, and represent a simple form of recommendation. The next-generation search engine Google blurs this distinction, incorporating 'authoritativeness' criteria into its ranking (defined recursively as the sum of the authoritativeness of pages linking to a given page) in order to return more useful results (Brin & Page, 1998).

One common thread in recommender systems research is the need to combine recommendation techniques to achieve peak performance. All of the known recommendation techniques have strengths and weaknesses, and many researchers have chosen to combine techniques in different ways. This article surveys the different recommendation techniques being researched, analyzing them in terms of the data that supports the recommendations and the algorithms that operate on that data, and examines the range of hybridization techniques that have been proposed. This analysis points to a number of possible hybrids that have yet to be explored. Finally, we discuss how adding a hybrid with collaborative filtering improved the performance of our knowledge-based recommender system Entree. In addition, we show that semantic ratings made available by the knowledge-based portion of the system provide an additional boost to the hybrid's performance.

1.1. RECOMMENDATION TECHNIQUES

Recommendation techniques have a number of possible classifications (Resnick & Varian, 1997; Schafer, Konstan & Riedl, 1999; Terveen & Hill, 2001). Of interest in this discussion is not the type of interface or the properties of the user's interaction with the recommender, but rather the sources of data on which recommendation is based and the use to which that data is put.
Specifically, recommender systems have (i) background data, the information that the system has before the recommendation process begins, (ii) input data, the information that the user must communicate to the system in order to generate a recommendation, and (iii) an algorithm that combines background and input data to arrive at its suggestions. On this basis, we can distinguish five different recommendation techniques as shown in Table I. Assume that I is the set of items over which recommendations might be made, U is the set of users whose preferences are known, u is the user for whom recommendations need to be generated, and i is some item for which we would like to predict u's preference.

2 URL: http://www.google.com

Collaborative recommendation is probably the most familiar, most widely implemented and most mature of the technologies. Collaborative recommender systems aggregate ratings or recommendations of objects, recognize commonalities between users on the basis of their ratings, and generate new recommendations based on inter-user comparisons. A typical user profile in a collaborative system consists of a vector of items and their ratings, continuously augmented as the user interacts with the system over time. Some systems use time-based discounting of ratings to account for drift in user interests (Billsus & Pazzani, 2000; Schwab et al., 2001). In some cases, ratings may be binary (like/dislike) or real-valued, indicating degree
of preference. Some of the most important systems using this technique are GroupLens (Resnick et al., 1994), Ringo/Firefly (Shardanand & Maes, 1995), Tapestry (Goldberg et al., 1992) and Recommender (Hill et al., 1995). These systems can be either memory-based, comparing users against each other directly using correlation or other measures, or model-based, in which a model is derived from the historical rating data and used to make predictions (Breese et al., 1998). Model-based recommenders have used a variety of learning techniques including neural networks (Jennings & Higuchi, 1993), latent semantic indexing (Foltz, 1990), and Bayesian networks (Condliff et al., 1999).

The greatest strength of collaborative techniques is that they are completely independent of any machine-readable representation of the objects being recommended, and work well for complex objects such as music and movies where variations in taste are responsible for much of the variation in preferences. Schafer, Konstan and Riedl (1999) call this 'people-to-people correlation'.

Table I. Recommendation techniques

Collaborative
  Background: Ratings from U of items in I.
  Input: Ratings from u of items in I.
  Process: Identify users in U similar to u, and extrapolate from their ratings of i.

Content-based
  Background: Features of items in I.
  Input: u's ratings of items in I.
  Process: Generate a classifier that fits u's rating behavior and use it on i.

Demographic
  Background: Demographic information about U and their ratings of items in I.
  Input: Demographic information about u.
  Process: Identify users that are demographically similar to u, and extrapolate from their ratings of i.

Utility-based
  Background: Features of items in I.
  Input: A utility function over items in I that describes u's preferences.
  Process: Apply the function to the items and determine i's rank.

Knowledge-based
  Background: Features of items in I. Knowledge of how these items meet a user's needs.
  Input: A description of u's needs or interests.
  Process: Infer a match between i and u's need.

Demographic recommender systems aim to categorize the user based on personal attributes and make recommendations based on demographic classes. An early example of this kind of system was Grundy (Rich, 1979), which recommended books based on personal information gathered through an interactive dialogue.
The user's responses were matched against a library of manually assembled user stereotypes. Some more recent recommender systems have also taken this approach. Krulwich (1997), for example, uses demographic groups from marketing research to suggest a range of products and services. A short survey is used to gather the data for user categorization. In other systems, machine learning is used to arrive at a classifier based on demographic data (Pazzani, 1999). The representation of demographic
information in a user model can vary greatly. Rich's system used hand-crafted attributes with numeric confidence values. Pazzani's model uses Winnow to extract features from users' home pages that are predictive of liking certain restaurants. Demographic techniques form 'people-to-people' correlations like collaborative ones, but use different data. The benefit of a demographic approach is that it may not require a history of user ratings of the type needed by collaborative and content-based techniques.

Content-based recommendation is an outgrowth and continuation of information filtering research (Belkin & Croft, 1992). In a content-based system, the objects of interest are defined by their associated features. For example, text recommendation systems like the newsgroup filtering system NewsWeeder (Lang, 1995) use the words of their texts as features. A content-based recommender learns a profile of the user's interests based on the features present in objects the user has rated. Schafer, Konstan and Riedl call this 'item-to-item correlation'. The type of user profile derived by a content-based recommender depends on the learning method employed. Decision trees, neural nets, and vector-based representations have all been used. As in the collaborative case, content-based user profiles are long-term models, updated as more evidence about user preferences is observed.

Utility-based and knowledge-based recommenders do not attempt to build long-term generalizations about their users, but rather base their advice on an evaluation of the match between a user's need and the set of options available. Utility-based recommenders make suggestions based on a computation of the utility of each object for the user. Of course, the central problem is how to create a utility function for each user.
Tete-a-Tete and the e-commerce site PersonaLogic each have different techniques for arriving at a user-specific utility function and applying it to the objects under consideration (Guttman, 1998). The user profile therefore is the utility function that the system has derived for the user, and the system employs constraint satisfaction techniques to locate the best match. The benefit of utility-based recommendation is that it can factor non-product attributes, such as vendor reliability and product availability, into the utility computation, making it possible, for example, to trade off price against delivery schedule for a user who has an immediate need.

Footnote (PersonaLogic): For example, see the college guides available at http://www.peronalogic.aolcom/go/grad-

Knowledge-based recommendation attempts to suggest objects based on inferences about a user's needs and preferences. In some sense, all recommendation techniques could be described as doing some kind of inference. Knowledge-based approaches are distinguished in that they have functional knowledge: they have knowledge about how a particular item meets a particular user need, and can therefore reason about the relationship between a need and a possible recommendation. The user profile can be any knowledge structure that supports this inference. In the simplest case, as in Google, it may simply be the query that the user has formulated. In others,
it may be a more detailed representation of the user's needs (Towle & Quinn, 2000). The Entree system (described below) and several other recent systems (for example, Schmitt & Bergmann, 1999) employ techniques from case-based reasoning for knowledge-based recommendation. Schafer, Konstan and Riedl call knowledge-based recommendation the 'Editor's choice' method.

The knowledge used by a knowledge-based recommender can also take many forms. Google uses information about the links between web pages to infer popularity and authoritative value (Brin & Page, 1998). Entree uses knowledge of cuisines to infer similarity between restaurants. Utility-based approaches calculate a utility value for objects to be recommended, and in principle, such calculations could be based on functional knowledge. However, existing systems do not use such inference, requiring users to do their own mapping between their needs and the features of products, either in the form of preference functions for each feature in the case of Tete-a-Tete or answers to a detailed questionnaire in the case of PersonaLogic.

2. Comparing recommendation techniques

All recommendation techniques have strengths and weaknesses, discussed below and summarized in Table II.

Table II. Tradeoffs between recommendation techniques

Collaborative filtering (CF)
  Pros: A. Can identify cross-genre niches. B. Domain knowledge not needed. C. Adaptive: quality improves over time. D. Implicit feedback sufficient.
  Cons: I. New user ramp-up problem. J. New item ramp-up problem. K. 'Gray sheep' problem. L. Quality dependent on large historical data set. M. Stability vs. plasticity problem.

Content-based (CN)
  Pros: B, C, D
  Cons: I, L, M

Demographic (DM)
  Pros: A, B, C
  Cons: I, K, L, M. N. Must gather demographic information.

Utility-based (UT)
  Pros: E. No ramp-up required. F. Sensitive to changes of preference. G. Can include non-product features.
  Cons: O. User must input utility function. P. Suggestion ability static.

Knowledge-based (KB)
  Pros: E, F, G. H. Can map from user needs to products.
  Cons: P. Q. Knowledge engineering required.

Perhaps the best known is the 'ramp-up' problem (Konstan et al., 1998). This term actually refers to two distinct but related problems.

New User: Because recommendations follow from a comparison between the target user and other users based solely on the accumulation of ratings, a user with few ratings becomes difficult to categorize.
New Item: Similarly, a new item that has not had many ratings also cannot be easily recommended: the 'new item' problem. This problem shows up in domains such as news articles where there is a constant stream of new items and each user only rates a few. It is also known as the 'early rater' problem, since the first person to rate an item gets little benefit from doing so: such early ratings do not improve a user's ability to match against others (Avery & Zeckhauser, 1997). This makes it necessary for recommender systems to provide other incentives to encourage users to provide ratings.

Collaborative recommender systems depend on overlap in ratings across users and have difficulty when the space of ratings is sparse: few users have rated the same items. The sparsity problem is somewhat reduced in model-based approaches, such as singular value decomposition (Strang, 1988), which can reduce the dimensionality of the space in which comparison takes place (Foltz, 1990; Rosenstein & Lochbaum, 2000). Still, sparsity is a significant problem in domains such as news filtering, since there are many items available and, unless the user base is very large, the odds that another user will share a large number of rated items are small.

These three problems suggest that pure collaborative techniques are best suited to problems where the density of user interest is relatively high across a small and static universe of items. If the set of items changes too rapidly, old ratings will be of little value to new users, who will not be able to have their ratings compared to those of the existing users. If the set of items is large and user interest thinly spread, then the probability of overlap with other users will be small.

Collaborative recommenders work best for a user who fits into a niche with many neighbors of similar taste.
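The dependence on rating overlap can be made concrete with a minimal sketch of memory-based collaborative prediction. It is an illustration under stated assumptions, not the method of any system cited here: the Pearson similarity, the `min_overlap` threshold, the function names and the toy `RATINGS` data are all hypothetical choices made for this example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 4, "b": 2, "c": 3, "d": 4},
    "cat": {"e": 4, "f": 2},  # shares no rated items with the others
    "new": {"a": 5},          # a new user with a single rating
}


def pearson(ratings_a, ratings_b, min_overlap=2):
    """Pearson correlation over co-rated items, or None if overlap is too small."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < min_overlap:
        return None
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for constant ratings
    return cov / sqrt(var_x * var_y)


def predict(target, item, all_ratings):
    """Extrapolate target's rating of item from positively correlated users."""
    target_ratings = all_ratings[target]
    target_mean = sum(target_ratings.values()) / len(target_ratings)
    num = den = 0.0
    for other, other_ratings in all_ratings.items():
        if other == target or item not in other_ratings:
            continue
        sim = pearson(target_ratings, other_ratings)
        if sim is None or sim <= 0:
            continue
        other_mean = sum(other_ratings.values()) / len(other_ratings)
        # Weight the neighbor's deviation from its own mean by similarity.
        num += sim * (other_ratings[item] - other_mean)
        den += sim
    if den == 0:
        return None  # no usable neighbor: nothing to extrapolate from
    return target_mean + num / den


if __name__ == "__main__":
    print(predict("ann", "d", RATINGS))
    print(predict("new", "d", RATINGS))
    print(predict("ann", "e", RATINGS))
```

In this toy data, `ann` and `bob` agree on three items, so `bob`'s rating of `d` can be extrapolated to `ann`. The user `new`, with a single rating, and the item `e`, rated only by a user with no overlap, both yield no prediction, which mirrors the new user and sparsity problems described above.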
The technique does not work well for so-called 'gray sheep' (Claypool et al., 1999), who fall on a border between existing cliques of users. This is also a problem for demographic systems that attempt to categorize users on personal characteristics. On the other hand, demographic recommenders do not have the 'new user' problem, because they do not require a list of ratings from the user. Instead they have the problem of gathering the requisite demographic information. With sensitivity to on-line privacy increasing, especially in electronic commerce contexts (USITIC, 1997), demographic recommenders are likely to remain rare: the data most predictive of user preference is likely to be information that users are reluctant to disclose.

Content-based techniques also have a start-up problem in that they must accumulate enough ratings to build a reliable classifier. Relative to collaborative filtering, content-based techniques also have the problem that they are limited by the features that are explicitly associated with the objects that they recommend. For example, content-based movie recommendation can only be based on written materials about a movie (actors' names, plot summaries, etc.) because the movie itself is opaque to the system. This puts these techniques at the mercy of the descriptive data available. Collaborative systems rely only on user ratings and can be used to recommend items without any descriptive data. Even in the presence of descriptive data, some experiments have found that collaborative recommender systems can be more accurate than content-based ones (Alspector et al., 1997).

The great power of the collaborative approach relative to content-based ones is its cross-genre or 'outside the box' recommendation ability. It may be that listeners who enjoy free jazz also enjoy avant-garde classical music, but a content-based recommender trained on the preferences of a free jazz aficionado would not be able to suggest items in the classical realm, since none of the features (performers, instruments, repertoire) associated with items in the different categories would be shared. Only by looking outside the preferences of the individual can such suggestions be made.

Both content-based and collaborative techniques suffer from the 'portfolio effect'. An ideal recommender would not suggest a stock that the user already owns or a movie she has already seen. The problem becomes quite tricky in domains such as news filtering, since stories that look quite similar to those already read may in fact present some new facts or new perspectives that would be valuable to the user. At the same time, many different presentations of the same wire-service story from different newspapers would not be useful. The Daily Learner system (Billsus & Pazzani, 2000) uses an upper bound of similarity in its content-based recommender to filter out news items too similar to those already seen by the user.

Utility-based and knowledge-based recommenders do not have ramp-up or sparsity problems, since they do not base their recommendations on accumulated statistical evidence. Utility-based techniques require that the system build a complete utility function across all features of the objects under consideration.
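As a rough illustration of what a complete utility function involves, the sketch below scores offers with a weighted additive function over both product and non-product features. The additive form, the feature names, the normalization and the two weight profiles are assumptions made for this example; they are not taken from Tete-a-Tete or PersonaLogic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """A hypothetical offer with product and non-product features."""
    name: str
    price: float          # product feature, lower is better
    quality: float        # product feature in [0, 1], higher is better
    delivery_days: int    # non-product feature, lower is better
    reliability: float    # vendor reliability in [0, 1], higher is better


def utility(offer, weights, max_price, max_days):
    """Weighted sum of per-feature scores, each scaled so higher is better."""
    scores = {
        "price": 1.0 - offer.price / max_price,
        "quality": offer.quality,
        "delivery": 1.0 - offer.delivery_days / max_days,
        "reliability": offer.reliability,
    }
    # Every feature needs a weight: the function must be complete.
    return sum(weights[feature] * scores[feature] for feature in scores)


def rank(offers, weights):
    """Order offers from highest to lowest utility for one user's weights."""
    max_price = max(o.price for o in offers) or 1.0
    max_days = max(o.delivery_days for o in offers) or 1
    return sorted(
        offers,
        key=lambda o: utility(o, weights, max_price, max_days),
        reverse=True,
    )


OFFERS = [
    Offer("cheap-slow", price=80.0, quality=0.7, delivery_days=10, reliability=0.9),
    Offer("pricey-fast", price=100.0, quality=0.7, delivery_days=2, reliability=0.9),
]

# Two hypothetical users with different priorities.
BARGAIN_HUNTER = {"price": 0.7, "quality": 0.1, "delivery": 0.1, "reliability": 0.1}
DEADLINE_BUYER = {"price": 0.1, "quality": 0.1, "delivery": 0.7, "reliability": 0.1}

if __name__ == "__main__":
    print([o.name for o in rank(OFFERS, BARGAIN_HUNTER)])
    print([o.name for o in rank(OFFERS, DEADLINE_BUYER)])
```

The same two offers are ranked in opposite orders for the two weight profiles. A weight must be supplied for every feature; a missing weight raises a `KeyError`, which loosely reflects the requirement that the utility function be complete.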
One benefit of this approach is that it can incorporate many different factors that contribute to the value of a product, such as delivery schedule, warranty terms or conceivably the user's existing portfolio, rather than just product-specific features. In addition, these non-product features may have extremely idiosyncratic utility: how soon something can be delivered may matter very much to a user facing a deadline. A utility-based framework thereby lets the user express all of the considerations that need to go into a recommendation. For this reason, Guttman (1999) describes Tete-a-Tete as a 'product and merchant brokering' system rather than a recommender system. However, under the definition given above, Tete-a-Tete does fit, since its main output is a recommendation (a top-ranked item) that is generated on a personalized basis.

The flexibility of utility-based systems is also to some degree a failing. The user must construct a complete preference function, and must therefore weigh the significance of each possible feature. Often this creates a significant burden of interaction. Tete-a-Tete uses a small number of 'stereotype' preference functions to get the user started, but ultimately the user needs to look at, weigh, and select a preference function for each feature that describes an item of interest. This might be feasible for items with only a few characteristics, such as price, quality and delivery date, but not for more complex and subjective domains like movies or news articles. PersonaLogic does not require the user to input a utility function, but
instead derives the function through an interactive questionnaire. While the complete explicit utility function might be a boon to some users, for example, technical users with specific purchasing requirements, it is likely to overwhelm a more casual user with less-detailed knowledge. Large moves in the product space, for example, from 'sports cars' to 'family cars', require a complete re-tooling of the preference function, including everything from interior space to fuel economy. This makes a utility-based system less appropriate for the casual browser.

Knowledge-based recommender systems are prone to the drawback of all knowledge-based systems: the need for knowledge acquisition. There are three types of knowledge that are involved in such a system:

Catalog knowledge: Knowledge about the objects being recommended and their features. For example, the Entree recommender should know that 'Thai' cuisine is a kind of 'Asian' cuisine.

Functional knowledge: The system must be able to map between the user's needs and the object that might satisfy those needs. For example, Entree knows that a need for a romantic dinner spot could be met by a restaurant that is quiet with an ocean view.

User knowledge: To provide good recommendations, the system must have some knowledge about the user. This might take the form of general demographic information or specific information about the need for which a recommendation is sought.

Of these knowledge types, the last is the most challenging, as it is, in the worst case, an instance of the general user-modeling problem (Towle & Quinn, 2000).

Despite this drawback, knowledge-based recommendation has some beneficial characteristics. It is appropriate for casual exploration, because it demands less of the user than utility-based recommendation. It does not involve a start-up period during which its suggestions are low quality. A knowledge-based recommender cannot 'discover' user niches, the way collaborative systems can.
On the other hand, it can make recommendations as wide-ranging as its knowledge base allows.

Table II summarizes the five recommendation techniques that we have discussed here, pointing out the pros and cons of each. Collaborative and demographic techniques have the unique capacity to identify cross-genre niches and can entice users to jump outside of the familiar. Knowledge-based techniques can do the same, but only if such associations have been identified ahead of time by the knowledge engineer.

All of the learning-based techniques (collaborative, content-based and demographic) suffer from the ramp-up problem in one form or another. The converse of this problem is the stability vs. plasticity problem for such learners. Once a user's profile has been established in the system, it is difficult to change one's preferences. A steak-eater who becomes a vegetarian will continue to get steakhouse recommendations from a content-based or collaborative recommender for some time, until
newer ratings have the chance to tip the scales. Many adaptive systems include some sort of temporal discount to cause older ratings to have less influence, but they do so at the risk of losing information about interests that are long-term but sporadically exercised (Billsus & Pazzani, 2000; Schwab et al., 2001). For example, a user might like to read about major earthquakes when they happen, but such occurrences are sufficiently rare that the ratings associated with last year's earthquake are gone by the time the next big one hits. Knowledge- and utility-based recommenders respond to the user's immediate need and do not need any kind of retraining when preferences change.

The ramp-up problem has the side-effect of excluding casual users from receiving the full benefits of collaborative and content-based recommendation. It is possible to do simple market-basket recommendation with minimal user input, as in Amazon.com's 'people who bought X also bought Y', but this mechanism has few of the advantages commonly associated with the collaborative filtering concept.

The learning-based technologies work best for dedicated users who are willing to invest some time making their preferences known to the system. Utility- and knowledge-based systems have fewer problems in this regard because they do not rely on having historical data about a user's preferences. Utility-based systems may present difficulties for casual users who might be unwilling to tailor a utility function simply to browse a catalog.

3. Hybrid recommender systems

Hybrid recommender systems combine two or more recommendation techniques to gain better performance with fewer of the drawbacks of any individual one. Most commonly, collaborative filtering is combined with some other technique in an attempt to avoid the ramp-up problem. Table III shows some of the combination methods.

3.1. WEIGHTED

A weighted hybrid recommender is one in which the score of a recommended item is computed from the results of all of the available recommendation techniques present in the system. For example, the simplest combined hybrid would be a linear combination of recommendation scores. The P-Tango system (Claypool et al., 1999) uses such a hybrid. It initially gives collaborative and content-based recommenders equal weight, but gradually adjusts the weighting as predictions about user ratings are confirmed or disconfirmed. Pazzani's combination hybrid does not use numeric scores, but rather treats the output of each recommender (collaborative, content-based and demographic) as a set of votes, which are then combined in a consensus scheme (Pazzani, 1999).

The benefit of a weighted hybrid is that all of the system's capabilities are brought to bear on the recommendation process in a straightforward way and it is easy to
perform post-hoc credit assignment and adjust the hybrid accordingly. However, the implicit assumption in this technique is that the relative value of the different techniques is more or less uniform across the space of possible items. From the discussion above, we know that this is not always so: a collaborative recommender will be weaker for those items with a small number of raters.

Table III. Hybridization methods

Weighted: The scores (or votes) of several recommendation techniques are combined together to produce a single recommendation.
Switching: The system switches between recommendation techniques depending on the current situation.
Mixed: Recommendations from several different recommenders are presented at the same time.
Feature combination: Features from different recommendation data sources are thrown together into a single recommendation algorithm.
Cascade: One recommender refines the recommendations given by another.
Feature augmentation: Output from one technique is used as an input feature to another.
Meta-level: The model learned by one recommender is used as input to another.

3.2. SWITCHING

A switching hybrid builds in item-level sensitivity to the hybridization strategy: the system uses some criterion to switch between recommendation techniques. The Daily Learner system uses a content/collaborative hybrid in which a content-based recommendation method is employed first. If the content-based system cannot make a recommendation with sufficient confidence, then a collaborative recommendation is attempted. This switching hybrid does not completely avoid the ramp-up problem, since both the collaborative and the content-based systems have the 'new user' problem.
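The contrast between the weighted hybrid of Section 3.1 and the switching hybrid of Section 3.2 can be sketched in a few lines. This is a minimal illustration, assuming each recommender is a function that returns a (score, confidence) pair or None; the interface, the confidence threshold, the renormalization over available scores and the toy recommenders are hypothetical and do not reproduce P-Tango or Daily Learner.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumed interface: (user, item) -> (score, confidence), or None if the
# recommender cannot say anything about the item.
Prediction = Tuple[float, float]
Recommender = Callable[[str, str], Optional[Prediction]]


def weighted_hybrid(
    user: str,
    item: str,
    recommenders: Sequence[Recommender],
    weights: Sequence[float],
) -> Optional[float]:
    """Linear combination of scores, renormalized over available predictions."""
    total = 0.0
    weight_sum = 0.0
    for recommender, weight in zip(recommenders, weights):
        prediction = recommender(user, item)
        if prediction is None:
            continue
        score, _confidence = prediction
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else None


def switching_hybrid(
    user: str,
    item: str,
    recommenders: Sequence[Recommender],
    min_confidence: float,
) -> Optional[float]:
    """Try recommenders in order; return the first sufficiently confident score."""
    for recommender in recommenders:
        prediction = recommender(user, item)
        if prediction is not None and prediction[1] >= min_confidence:
            return prediction[0]
    return None


# Hypothetical stand-ins for real recommenders, backed by lookup tables.
def content_based(user: str, item: str) -> Optional[Prediction]:
    table = {
        ("u1", "antitrust-story"): (0.9, 0.8),
        ("u1", "merger-story"): (0.4, 0.2),  # low confidence: weak content match
    }
    return table.get((user, item))


def collaborative(user: str, item: str) -> Optional[Prediction]:
    table = {
        ("u1", "antitrust-story"): (0.7, 0.6),
        ("u1", "merger-story"): (0.8, 0.7),
    }
    return table.get((user, item))


if __name__ == "__main__":
    both = [content_based, collaborative]
    print(weighted_hybrid("u1", "merger-story", both, [0.5, 0.5]))
    print(switching_hybrid("u1", "merger-story", both, min_confidence=0.5))
```

The weighted function blends both opinions for every item, whereas the switching function uses the content-based score whenever it is confident enough and falls back to the collaborative score otherwise. Skipping recommenders that return None and renormalizing the weights is one possible design choice; a fixed default score would be another.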
However, Daily Learner's content-based technique is nearest-neighbor, which does not require a large number of examples for accurate classification.

What the collaborative technique provides in a switching hybrid is the ability to cross genres, to come up with recommendations that are not close in a semantic way to the items previously rated highly, but are still relevant. For example, in the case of Daily Learner, a user who is interested in the Microsoft anti-trust trial might also be interested in the AOL/Time Warner merger. Content matching would not be likely to recommend the merger stories, but other users with an interest in corporate power in the high-tech industry may be rating both sets of stories highly, enabling the system to make the recommendation collaboratively.

Daily Learner's hybrid has a 'fallback' character: the short-term model is always used first and the other technique only comes into play when that technique fails. Tran and Cohen (1999) proposed a more straightforward switching hybrid.

3 Actually Billsus' system has two content-based recommendation algorithms, one short-term and one long-term, and the fallback strategy is short-term/collaborative/long-term.

In their