E-business course reading: Recommender Systems Research: A Connection-Centric Survey



Journal of Intelligent Information Systems, 23:2, 107–143, 2004
© 2004 Kluwer Academic Publishers. Printed in The United States.

Recommender Systems Research: A Connection-Centric Survey

SAVERIO PERUGINI sperugin@cs.vt.edu
MARCOS ANDRÉ GONÇALVES mgoncalv@cs.vt.edu
EDWARD A. FOX fox@cs.vt.edu
Department of Computer Science, Virginia Tech, Blacksburg, VA 24061

Received June 5, 2002; Revised November 24, 2003; Accepted December 3, 2003

Abstract. Recommender systems attempt to reduce information overload and retain customers by selecting a subset of items from a universal set based on user preferences. While research in recommender systems grew out of information retrieval and filtering, the topic has steadily advanced into a legitimate and challenging research area of its own. Recommender systems have traditionally been studied from a content-based filtering vs. collaborative design perspective. Recommendations, however, are not delivered within a vacuum, but rather cast within an informal community of users and social context. Therefore, ultimately all recommender systems make connections among people and thus should be surveyed from such a perspective. This viewpoint is under-emphasized in the recommender systems literature. We therefore take a connection-oriented perspective toward recommender systems research. We posit that recommendation has an inherently social element and is ultimately intended to connect people either directly as a result of explicit user modeling or indirectly through the discovery of relationships implicit in extant data. Thus, recommender systems are characterized by how they model users to bring people together: explicitly or implicitly. Finally, user modeling and the connection-centric viewpoint raise broadening and social issues (such as evaluation, targeting, and privacy and trust), which we also briefly address.

Keywords: recommendation, recommender systems, small-worlds, social networks, user modeling

“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.” Herbert A. Simon

1. Introduction

The advent of the WWW and the concomitant increase in information available online have caused information overload and ignited research in recommender systems. By selecting a subset of items from a universal set based on user preferences, recommender systems attempt to reduce information overload and retain customers. Examples of systems include top-N lists, book (Mooney and Roy, 2000) and movie (Alspector et al., 1998) recommenders, advanced search engines (Chakrabarti et al., 1999), and intelligent avatars (André and Rist


2002). The benefits of recommendation are most salient in voluminous and ephemeral domains (e.g., news) and include ‘predictive utility’ (Konstan et al., 1997), the value of a recommendation as advice given prior to investing time, energy, and in most cases, money in consuming a product. Recommender systems harness techniques which develop a model of user preferences to predict future ratings of artifacts. The underlying algorithms to realize recommendation range from keyword matching (Housman and Kaskela, 1996) to sophisticated data mining of customer profiles (Adomavicius and Tuzhilin, 1999). Recommender systems are now widely believed to be critical to sustaining the Internet economy (Shapiro and Varian, 1999).

Researchers have identified four main dimensions to help in the study of recommender systems: how the system is (i) modeled and designed (i.e., are recommendations content-based or collaborative?), (ii) targeted (to an individual, group, or topic), (iii) built, and (iv) maintained (online vs. offline) (Mirza, 2001). Recommender systems are typically studied along the modeling dimension. The most popular (and over-emphasized) modeling dichotomy is content-based filtering (Mooney and Roy, 2000) vs. collaborative filtering (Goldberg et al., 1992). Content-based filtering involves recommending items similar to those the user has liked in the past; e.g., ‘Since you liked The Little Lisper, you also might be interested in The Little Schemer.’ Collaborative filtering, on the other hand, involves recommending items liked by users whose tastes are similar to those of the user seeking a recommendation; e.g., ‘Linus and Lucy like Sleepless in Seattle. Linus likes You’ve Got Mail. Lucy also might like You’ve Got Mail.’ Terveen and Hill survey content-based and collaborative filtering systems in a human-computer interaction (HCI) context (Terveen and Hill, 2002).
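
As a rough illustration of the two modeling approaches just described, the following Python sketch contrasts them on toy data. The keyword features, the Jaccard overlap measure, and the function names are all assumptions made for illustration; they are not taken from any of the systems cited.

```python
# Toy data (hypothetical): which items each user liked.
likes = {
    "Linus": {"Sleepless in Seattle", "You've Got Mail"},
    "Lucy": {"Sleepless in Seattle"},
}

# Toy content features per item (hypothetical keywords).
features = {
    "The Little Lisper": {"lisp", "programming", "recursion"},
    "The Little Schemer": {"scheme", "programming", "recursion"},
    "Sleepless in Seattle": {"romance", "comedy"},
    "You've Got Mail": {"romance", "comedy", "email"},
}


def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_based(liked_item, candidates):
    """Rank candidate items by feature overlap with one item the user liked."""
    scored = [(jaccard(features[liked_item], features[c]), c)
              for c in candidates if c != liked_item]
    return [c for score, c in sorted(scored, reverse=True) if score > 0]


def collaborative(user):
    """Suggest items liked by users whose liked sets overlap with the user's."""
    suggestions = set()
    for other, items in likes.items():
        if other != user and jaccard(likes[user], items) > 0:
            suggestions |= items - likes[user]
    return sorted(suggestions)
```

With this data, `content_based("The Little Lisper", list(features))` ranks The Little Schemer first because the two books share features, while `collaborative("Lucy")` suggests You've Got Mail because Linus, whose likes overlap with Lucy's, liked it. The first function never looks at other users; the second never looks at item content.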
Others classify recommender systems from a business-oriented perspective (Schafer et al., 1999), often based on how they are built. For instance, Schafer, Konstan, and Riedl survey recommender systems in e-commerce based on interface, technology, and recommendation discovery (Schafer et al., 1999). These researchers also cast these aspects of recommenders in a two-dimensional space of recommendation lifetime (ephemeral vs. persistent) and level of automation (manual vs. automatic), which is related to how they are maintained.

Recommender systems, however, have an inherently social element and ultimately bring people together, a viewpoint under-emphasized in the literature, and therefore should be surveyed from this perspective. Accordingly, in this survey, we take a connection-centric approach toward studying recommender systems.

To help illustrate the elusive presence of a social connectivity element, consider that the process of recommendation in a ‘brick and mortar’ setting is inherently dependent on knowledge of personal taste. For example, in a restaurant with a friend, the following dialog might arise: ‘The menu looks enticing. Since you are a returning patron, what do you recommend?’ ‘Well, since you like spicy dishes, you may enjoy the chilli chicken curry.’ A mutually reinforcing dynamic ensues. The recommender’s personal knowledge of her friend’s interests is incorporated into the recommendation process. Conversely, after a recommendation is made, the recipient’s personal knowledge of the recommender’s reputation helps him evaluate the recommendation. Recommender systems attempt to emulate and automate this naturally social process.

This seemingly simple example speaks volumes about the process of making recommendations. Not only does a recommender system have an underlying social element, but its effectiveness is predicated upon its representation of


the recipient. Therefore, recommender systems involve user modeling, which includes developing a representation of user preferences and interests. User models can be constructed by explicitly soliciting feedback (e.g., asking the user to rate products or services) (Konstan et al., 1997) or gleaning implicit declarations of interest (e.g., through monitoring usage) (Terveen et al., 1997).

User modeling is directed toward developing a basis to compute overlap, and ultimately is conducted to make connections among people to drive recommendation. Thus, once enough users are engaged and modeled to sufficiently sustain a system, connections (recommendations) can be made. Recommendations, thus, are not delivered within a vacuum, but rather cast within an ‘informal [community] of collaborators, colleagues, or friends’ (Kautz et al., 1997), known as a social network (Wasserman and Galaskiewicz, 1994).

Explicit user modeling (and correlating the resulting ratings) then can be seen as directed toward forming such connected (community) graph components. Collecting implicit declarations of preference also can be viewed as directed toward inducing social networks. This is analogous to techniques to discover existing social networks from patterns embedded in interaction (transaction) data. Therefore, an extension to traditional approaches to implicit user modeling, and an approach toward a basis to compute recommendations, entails directly exposing these self-organizing and self-maintaining social structures. Since social networks model social processes, these informal communities with shared interests are implicit in data generated automatically by electronic communications.
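
One way to picture how explicit ratings induce connected (community) graph components is to treat users as nodes and to link two users whenever they have rated a common item. The sketch below groups users into connected components with a breadth-first search under that simplifying assumption; the survey does not prescribe a specific linking rule, and the data and names are hypothetical.

```python
from collections import deque
from itertools import combinations


def user_components(ratings):
    """Group users into connected components.

    `ratings` maps each user to the set of items they rated. Two users are
    linked when they share at least one rated item (an illustrative rule).
    """
    # Build an undirected user-user adjacency map.
    neighbors = {user: set() for user in ratings}
    for u, v in combinations(ratings, 2):
        if ratings[u] & ratings[v]:
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Breadth-first search from every user not yet assigned to a component.
    seen, components = set(), []
    for start in ratings:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            user = queue.popleft()
            component.add(user)
            for other in neighbors[user] - seen:
                seen.add(other)
                queue.append(other)
        components.append(component)
    return components


# Hypothetical rating data.
ratings = {
    "ann": {"item1", "item2"},
    "bob": {"item2", "item3"},
    "cat": {"item3"},
    "dan": {"item9"},
}
components = user_components(ratings)
```

Here ann, bob, and cat fall into one component even though ann and cat share no item directly (they are connected through bob), while dan remains isolated.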
This extension is corroborated by a recent trend toward exploring and exploiting connections of social processes in graph representations of self-organizing structures, such as the web, as a viable and increasingly popular way to satisfy information-seeking and recommendation-oriented goals (Broder, 2003; Kleinberg, 1999; Kleinberg and Lawrence, 2001). This less invasive approach not only supersedes the need to explicitly model users individually, but also results in more natural, reflective, and fertile organizations for recommendation. Exploration of identified existing social networks fosters the discovery of serendipitous connections (Schwartz and Wood, 1993), social referrals (Kautz et al., 1997), and cyber-communities (Kumar et al., 1999), and hence offers many opportunities for recommendation. The use of social networks has expanded to many diverse application domains such as movie recommendation (Mirza et al., 2003), digital libraries (Nevill-Manning, 2001), and community-based service location (Singh et al., 2001).

This connection-oriented viewpoint and these two ways of realizing it provide the basis for this survey. We posit that recommendation has an inherently social element and is ultimately concerned with connecting people either directly as a result of explicit user modeling or indirectly through the discovery of relationships implicit in existing data (see Figure 1). We make connection-based distinctions. Systems are characterized by how they model users to bring people together: explicitly or implicitly. The goal then of a recommender system is to bring as many people together as possible, which also suggests a novel evaluation criterion (e.g., algorithm A connects x individuals while algorithm B connects y) (Mirza et al., 2003). Thus, while Amazon may make better book recommendations than Barnes and Noble, if they arrive at connected user components in the same manner, then in this survey they would be considered equivalent


focused on IR. In this era Salton and his students developed the vector-space model (Salton et al., 1975) and the SMART system (Rocchio, 1971). Researchers modeled IR systems with large sparse (and anti-symmetric) term-document matrices which permitted document similarity to be measured by the cosine of the angle between vectors in a multi-dimensional space. Precision and recall became the two quintessential IR metrics (Salton and McGill, 1983). The emphasis of such research and systems was on satisfying short-term information-seeking goals by retrieving information deemed relevant to queries. IR research flourished in this period and many supportive techniques such as relevance feedback (Rocchio, 1971) were developed, demonstrating qualified success.

As the end of the 1970s drew near, electronic information became more abundant. The 1980s brought a rapid proliferation of information due to desktop computers and applications such as word processors and spreadsheets. In addition, the introduction of e-mail into the mainstream further exacerbated the accumulation of text residing in computers (termed ‘electronic junk’ by Denning (1982)). The newfound ease of information generation ignited a shift in IS research initiatives. Researchers began to focus on removing irrelevant information rather than retrieving relevant information. Information categorization, routing, and filtering became of immediate importance. This first shift spawned an information filtering thread.

In 1991 Bellcore hosted a workshop on information filtering (IF) which led to the December 1992 Communications of the ACM special issue on the topic (Loeb and Terry, 1992). In this issue Belkin and Croft compared and contrasted IF and IR (Belkin and Croft, 1992).
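
The cosine measure and the precision and recall metrics mentioned above in connection with the vector-space model can be made concrete with a short Python sketch. The sparse term-weight dictionaries, the document identifiers, and the function names are illustrative assumptions and are not part of the SMART system.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_u * norm_v)


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for sets of document identifiers.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical raw term counts for a query and two documents.
query = {"recommender": 1, "systems": 1}
doc_a = {"recommender": 2, "systems": 2}
doc_b = {"spreadsheet": 3}

sim_a = cosine(query, doc_a)
sim_b = cosine(query, doc_b)
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
```

In this example `doc_a` points in the same direction as the query, so its cosine is 1 up to floating-point rounding, while `doc_b` shares no terms with the query and scores 0. Two of the four retrieved documents are relevant (precision one half), and two of the three relevant documents were retrieved (recall two thirds).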
While IR entails returning relevant information in response to short-term information-seeking goals via requests such as queries, information filtering involves removing persistent and irrelevant information over a long period of time. Information filtering systems model document features in user profiles (Mooney and Roy, 2000); features thus replaced terms in the modeling matrix as a result of this shift (see Table 1).

Table 1. Shifts in matrix models outlining the evolution of recommender systems from information retrieval.

Concept                    Modeling matrix
Information retrieval      terms × documents
Information filtering      features × documents
Content-based filtering    features × artifacts
Collaborative filtering    people × documents
Recommender systems        people × artifacts

Information filtering later became known as content-based filtering to the recommender system community and has been applied to recommend movies (Alspector et al., 1998) and books (Mooney and Roy, 2000). Content-based systems model content features of artifacts, rather than of documents, and recommend items by querying such product features against keywords or preferences supplied by the user (Krulwich and Burkley, 1996). SDI (Selective Dissemination of Information), one of the first information filtering systems, was based on keyword matching (Housman and Kaskela, 1996). Content-based filtering is most effective in


112 PERUGINI, GONC¸ ALVES AND FOX text-intensive domains, which account for only a portion of the artifact landscape. Since we take a connection-oriented perspective toward recommendation, content-based models and methods do not find place in this survey. In addition to identifying these differences, articles in this special issue also reported new research developments. Foltz and Dumais introduced latent semantic indexing as a viable technique to reduce dimensions in a term-document matrix (Foltz and Dumais, 1992). More importantly for recommender systems, Goldberg et al. coined the phrase collaborative filtering (Goldberg et al., 1992) while describing Tapestry, which later became known as the first recommender system (Resnick and Varian, 1997). Collaborative filtering, which can be defined as harnessing the activities of others in satisfying an information-seeking goal, introduced another shift in IS research. Collaborative filtering entails filtering items for a user that similar users filtered. Instead of computing artifact similarity (content-based filtering), collaborative approaches entail computing user similarity. The most salient difference between these two approaches is that in content-based filtering users do not collaborate to improve the system’s model of them, while in collaborative approaches users leverage the collective experience of other users to enrich the system’s model. Collaborative filtering is predicated upon persistent user models, such as profiles, which encapsulate preferences and features (e.g., married), rather than ephemeral queries. This shift replaced features with representations of people (e.g., rating or profiles) to filter documents in a modeling matrix. While documents still constituted the other dimension of the matrix, the word ‘document’ assumed a broader meaning after the birth of the web. 
In addition to its traditional interpretation, ‘document’ also came to mean webpages and bookmarks (Balabanović and Shoham, 1997; Rucker and Polanco, 1997; Terveen et al., 1997), as well as Usenet and e-mail messages (Goldberg et al., 1992; Konstan et al., 1997). Collaborative filtering is effective since people’s tastes are typically not orthogonal. However, initially it was not embraced. Meanwhile, the advent of the web and its widespread use, popularity, and acceptance made reducing information overload a necessity. Of particular importance was social information filtering, a concept developed by Shardanand and Maes (1995). A few years later, in 1996, interest in collaborative filtering led to a workshop on the topic at the University of California, Berkeley. The results of this Berkeley workshop led to the March 1997 Communications of the ACM special issue on recommender systems, a phrase coined by Resnick and Varian in their article introducing the issue (Resnick and Varian, 1997).

Resnick and Varian chose the phrase ‘recommender systems’ rather than ‘collaborative filtering’ because recommenders need not explicitly collaborate with recommendation recipients, if at all (helping to reconcile the differences between content-based and collaborative approaches) (Resnick and Varian, 1997). Furthermore, recommendation refers to suggesting interesting artifacts in addition to solely filtering undesired objects (helping to reconcile the differences between IR and IF). Resnick and Varian define a recommender as a system which accepts user models as input, aggregates them, and returns recommendations to users. Two early collaborative-filtering recommender systems were Firefly and LikeMinds. Firefly evolved from Ringo (Shardanand and Maes, 1995) and HOMR (Helpful
Online Music Recommendation Service) and allows a website to make intelligent book, movie, or music recommendations. Firefly’s underlying algorithm (Shardanand, 1994) is now used to power the recommendation engines of sites such as BarnesandNoble.com.

Collaborative approaches constitute the main thrust of current recommender systems research. Once users are modeled, the process of collaborative filtering can be viewed operationally as a function which accepts a representation of users and a universal set of artifacts as input and returns a recommended subset of those artifacts as output. More importantly for this survey, recommender systems also are intended to connect groups of individuals with similar interests and to leverage the collective experience rather than merely focusing on the information-seeking goal of a specific individual (as in a typical IR setting). In order to make connections, this function typically computes similarity (e.g., closeness, distance, or nearest neighbor). Making recommendations and thus connections then entails approximating this function. Approaches to this approximation that have evolved range from statistical models (e.g., correlating user ratings (Konstan et al., 1997) or reducing dimensions (Goldberg et al., 2000)) to attribute-value based learning techniques (e.g., decision trees, neural networks, and Bayesian classifiers) (Russell and Norvig, 1995) and have demonstrated qualified success (Breese et al., 1998). Ultimately these techniques can be viewed as ways to infer structure and induce connections in the modeling matrix space. This final shift replaced documents with artifacts in the modeling matrix. While the evolution of recommender systems research is characterized by the shifts in matrix models illustrated in Table 1, the sparsity and anti-symmetric properties remained constant across each. As shown below, the web makes the matrix model symmetric.
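The statistical end of this range, correlating user ratings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular system: the mean-offset form of the prediction is one common formulation, and the names `pearson`, `predict`, and `toy` and all rating values are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson's r over the items both users rated; 0.0 if undefined."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    denom = sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def predict(ratings, user, item):
    """Predict `user`'s rating of `item` as the user's own mean rating plus a
    correlation-weighted average of other raters' deviations from their means."""
    own = ratings[user]
    own_mean = sum(own.values()) / len(own)
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(own, theirs)
        other_mean = sum(theirs.values()) / len(theirs)
        num += w * (theirs[item] - other_mean)
        den += abs(w)
    return own_mean + num / den if den else own_mean


# Hypothetical ratings: bob agrees with ann, cat rates the opposite way.
toy = {
    "ann": {"a": 5, "b": 3, "c": 1},
    "bob": {"a": 5, "b": 3, "c": 1, "d": 5},
    "cat": {"a": 1, "b": 3, "c": 5, "d": 1},
}
prediction = predict(toy, "ann", "d")
```

Note that the negatively correlated user also contributes: because Cat's tastes run opposite to Ann's, Cat's low rating of item `d` pushes the prediction for Ann upward, just as Bob's high rating does.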
Sparsity is mostly attributable to the reluctance of users to rate artifacts. Reluctance results from a lack of time, patience, or willingness to participate. Sometimes the benefits gained from providing constructive feedback are not apparent initially. Reluctance may be partially attributable to a heightened awareness of privacy when divulging personal information. Therefore, collaborative-based recommender systems must mediate an accuracy (of connection) vs. sparsity (of model) tradeoff. The following two sections are devoted to strategies for filling in cells of the initially sparse modeling matrix.

Since 1997 recommender systems research has advanced in many directions, such as reputation systems (Resnick et al., 2000) (e.g., eBay.com), and was placed in a larger context called ‘personalization’ (Riecken, 2000). The functional emphasis of current recommender systems makes them ‘templates for personalization’ (Perugini and Ramakrishnan, 2003).

3. Creating connections: Explicit user modeling

User modeling entails developing representations of user needs, interests, and tastes and is a critical precursor to connecting people via recommendation. In addition to personal characteristics, users can be modeled by their assessments of products in the form of ratings, which then become matrix entries. Sparse user feedback is the single greatest bottleneck of any collaborative-filtering algorithm: ‘Collaborative filtering algorithms are not deemed universally acceptable precisely because users are not willing to invest much time or effort in rating the items’ (Aggarwal et al., 1999). These problems are compounded in voluminous domains, where a large cumulative number of ratings is required to sufficiently cover
an entire set of items. Moreover, as the number of dimensions (e.g., people or products) grows larger, the number of multidimensional comparisons grows. In such situations techniques from data warehousing and OLAP (On-Line Analytical Processing) are applicable (Adomavicius and Tuzhilin, 2001). In large domains, users typically examine and evaluate only a small percentage of all items. Shallow analysis of content makes fostering connections difficult since opportunity for user overlap is limited. While this challenge is known in the initial stages of a system as the ‘cold-start’ problem (Maltz and Ehrlich, 1995) (also referred to as the ‘day-one’ or ‘early-rater’ problem), it also persists throughout the lifetime of a system. For example, a collaborative recommender has no platform to compute connections for a new user who has yet to rate products or a new item which has yet to be evaluated. Such problems in developing a basis for collaboration provide ample motivation for hybrid approaches which employ content-based filtering in these specific situations. Hybrid systems have shown improved performance over either single-focus (pure) approach (Baudisch, 1999; Claypool et al., 1999; Soboroff and Nicholas, 1999). Systems must collect user data which affords the identification of differences, commonalities, and relationships among people. In short, the goal is to add more and more information to transform a sparse matrix to a dense matrix with added structure.

Approaches to user modeling can be studied by how they harvest data (Resnick and Varian, 1997), either explicitly by asking users to submit feedback through surveys (Konstan et al., 1997) or inferring user interest implicit in (usage) data (Claypool et al., 2001; Terveen et al., 1997). Strategies for the former approach are showcased in this section, while those for the latter are discussed in Section 4.
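The cold-start situation and the hybrid response to it can be illustrated with a short sketch. It assumes a simple switching hybrid, which is only one of several possible designs and is not attributed to the cited systems; the names `sparsity`, `choose_strategy`, and `min_ratings`, and the toy data, are hypothetical.

```python
def sparsity(ratings, items):
    """Fraction of empty cells in the user x item modeling matrix."""
    cells = len(ratings) * len(items)
    filled = sum(len(row) for row in ratings.values())
    return 1.0 - filled / cells if cells else 1.0


def choose_strategy(ratings, user, item, min_ratings=1):
    """Pick the recommender a switching hybrid might consult for (user, item).

    A new user or a new item offers no basis for collaboration, so the
    sketch falls back to content-based filtering in those cases.
    """
    user_count = len(ratings.get(user, {}))
    item_count = sum(1 for row in ratings.values() if item in row)
    if user_count < min_ratings or item_count < min_ratings:
        return "content-based"
    return "collaborative"


# Hypothetical data: "new_user" has rated nothing and "d3" has no ratings.
toy_items = ["d1", "d2", "d3"]
toy_ratings = {
    "ann": {"d1": 5, "d2": 3},
    "bob": {"d1": 4},
    "new_user": {},
}
matrix_sparsity = sparsity(toy_ratings, toy_items)
```

Raising `min_ratings` makes the sketch more conservative about when enough overlap exists to trust the collaborative side.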
The most important tradeoff to consider in user modeling is minimizing user effort while maximizing the expressiveness of the representation (as well as privacy). In other words, there should be a small learning curve. Explicit approaches allow the user to retain control over the amount of personal information supplied to the system, but require an investment in time and effort to yield connections. Implicit approaches, on the other hand, minimize effort, collect copious amounts of (sometimes noisy) data, and make the social element of recommender systems salient, but raise ethical issues. The secretive nature of these approaches often makes users feel as if they are under a microscope. The user-modeling methodology for a collaborative-based system is illustrated in Table 2.

Table 2. User modeling methodology of a collaborative-filtering recommender system.

user reluctance to rate items (compounded by volume & concern of privacy)
↓
sparse modeling matrix (cold-start)
↓
explicit + implicit user modeling (exploration)
↓
representation of user (ratings, profiles) as basis for connection
↓
deliver recommendations & create connections (exploitation)

sustain (exploration vs. exploitation)

In explicit user modeling, evaluations (Konstan et al., 1997) and profiles (Balabanović and Shoham, 1997) are provided directly by users to declare preferences in response to solicitations for data such as surveys. Evaluations of recommended artifacts can be both
quantitative (e.g., ratings), akin to relevance feedback in IR and IF (Mostafa et al., 1997), and qualitative (e.g., lengthy reviews at Epinions.com). They also can be positive or negative. In a hand-crafted profile, a user states interests through items such as lists of keywords, pre-defined categories, or descriptions. The system then matches other users against this profile to recommend incoming artifacts. Systems which take such an approach to user modeling are SIFT (Yan and García-Molina, 1999) and Tapestry (Goldberg et al., 1992).

Without crossing over to an implicit approach, researchers have identified strategies to deal with reluctance by making an explicit feedback requirement less noticeable and taxing (Konstan et al., 1997; Resnick and Varian, 1997). Possible approaches to motivate users to evaluate items are subscription services, incentives, such as transaction-based compensations, and exclusions (Avery and Zeckhauser, 1997). Employing a pay-per-use model for recommender systems, where human experts rate items, is a viable, though less dynamic, option. While this approach connects users through experts and is thus collaborative, it deemphasizes the naturally social (and personal) element of recommenders. Default votes are another way to deal with sparse ratings (Breese et al., 1998). Developing and tightly integrating natural user interface (UI) mechanisms to solicit and capture feedback with existing interfaces for recommendation delivery may lead to less intrusive interaction and thus more cooperation and data (Grasso et al., 1999). A similar approach is to build recommendation into everyday systems, such as e-mail, news, and web clients, and services like collaborative spam detectors (e.g., Cloudmark’s SpamNet, http://www.cloudmark.com). In addition to helping to collect more explicit ratings, building recommendation into common UIs may help disseminate recommender systems to the masses.
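The idea behind default votes, mentioned above, can be sketched as follows. This is a simplified reading offered only for illustration: it fills each user's missing ratings with an assumed neutral value over the union of items either user rated, and it omits further details of the formulation in Breese et al. (1998). The function name, the `default_vote` parameter, and the data are hypothetical.

```python
from math import sqrt


def pearson_with_defaults(a, b, default_vote):
    """Pearson's r over the union of items either user rated, substituting
    `default_vote` wherever one of the two users has no rating."""
    items = sorted(a.keys() | b.keys())
    if len(items) < 2:
        return 0.0
    xs = [a.get(i, default_vote) for i in items]
    ys = [b.get(i, default_vote) for i in items]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else 0.0


# Hypothetical users who share only one rated item ("d1"), so a correlation
# restricted to co-rated items would be undefined.
ann = {"d1": 5, "d2": 1}
bob = {"d1": 5, "d3": 1}
similarity = pearson_with_defaults(ann, bob, default_vote=3)
```

The design tradeoff is that a default value manufactures overlap where none was observed, so the resulting similarity reflects the chosen default as much as the users' actual behavior.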
Requiring users to evaluate clusters of items, rather than individual items, is another approach to minimizing effort. Rather than tackling sparsity from a user perspective in an explicit approach, it also can be approached from a system viewpoint. Filter-bots which automatically examine and rate all products may occupy empty cells of a modeling matrix (Sarwar et al., 1997). Lastly, a problem endemic to the subjective nature of explicit modeling techniques is that some users are more effusive in their ratings than others. Effusivity in ratings refers to cases of users who share similar preferences, but rate products on completely different scales. Identifying variations in rating patterns is an approach to combat effusivity (Aggarwal et al., 1999; Freund et al., 1998).

Other considerations. A variety of representations have been used to store user data (Bloedorn et al., 1996). The lack of standards to represent such information and its sources (e.g., logs) in a uniform manner makes interoperability among recommender systems a challenge (Basu and Hirsh, 2001; Cingil et al., 2000). Cookies are mechanisms for capturing and storing user preferences, often employed in e-commerce (Berghel, 2001). While cookies compensate for the stateless nature of HTTP, like many of these techniques they raise security and privacy concerns because they are typically enabled without the user’s knowledge and, as a result, personal information is divulged.

A challenge for any user modeling approach (explicit or implicit, for content-based or collaborative recommendation) is the tradeoff between exploration (modeling the user) and exploitation (using the model to predict future ratings or make recommendations and
connections), akin to that in reinforcement learning (Sutton and Barto, 1998). Studying the connections which can be made via recommendation and the resulting social network induced in a random graph setting provides technical insight into this problem. Mirza et al. (2003) identify a ‘minimum rating constraint’ required to sustain a system and predict values for it based on various experimental rating datasets. Ultimately the approaches to user modeling illustrated in this and the following section are used to connect people.

While a purely collaborative approach to recommendation is widely accepted and employed, it is riddled with endemic problems. User modeling must address more than just sparsity. For example, it is difficult to make connections to users with unusual or highly specific tastes. Furthermore, connecting users with similar interests who have rated different items (e.g., ‘we both read world politics online, but he ranked BBC.com webpages, while I ranked CNN.com pages’) is challenging. Over-specialization of evaluated artifacts, sometimes referred to as the ‘banana’ problem (Burke, 1999), arises since frequently purchased items, such as bananas in a grocery market basket, will always be recommended. Conversely, some products are seldom bought more than a few times in a lifetime (e.g., automobiles) and thus suffer from a low number of evaluations. Over-specialization, which is grounded in the exploration vs. exploitation dilemma, can be addressed by occasionally forcing exploration. For instance, one can inject randomness (e.g., crossover and mutation in a genetic algorithm or epsilon in a reinforcement learning algorithm) into a model. Recommended artifacts also can be partitioned into hot and cold sets, where the latter is intended to foster exploration and increase the (rating) coverage of items in the system (Aggarwal et al., 1999).

3.1. Review of some representative projects

The following collaborative-based systems employ many of the explicit user modeling techniques showcased above and illustrate what can be achieved with representations of users. People are connected in the following systems through statistical (Goldberg et al., 2000; Konstan et al., 1997), agent-oriented (Balabanović and Shoham, 1997), and graph-theoretic (Aggarwal et al., 1999) approaches.

GroupLens. GroupLens recommends Usenet news messages (Konstan et al., 1997). The system models users directly by explicitly eliciting and collecting ratings of messages through an independent newsreader. GroupLens is a project of the recommender systems research group at the University of Minnesota. Usenet news is a personal, voluminous, and ephemeral medium (in comparison to movies) and thus an excellent candidate for collaborative filtering. A total of 250 people evaluated over 20,000 news articles (Resnick et al., 1994). GroupLens takes a statistical approach to making connections. The system predicts how a user seeking recommendation would rate an unrated article by computing a weighted average of the ratings of that message by users whose ratings were correlated with the user seeking recommendation. Correlation is computed with Pearson’s r coefficient. A research issue is deciding whether to provide personalized predictions (as GroupLens currently does) vs. personalized averages. Empirical research using Pearson’s r correlation coefficient revealed that ‘correlations between ratings and predictions is dramatically
