5 A Taxonomy of Collaborative-Based Recommender Systems

Fabián P. Lousame and Eduardo Sánchez

1 Introduction

The explosive growth in the amount of information available on the WWW and the emergence of e-commerce in recent years have demanded new ways to deliver personalized content. Recommender systems [51] have emerged in this context as a solution based on collective intelligence to either predict whether a particular user will like a particular item or identify the collection of items that will be of interest to a certain user. Recommender systems have an excellent ability to characterize and recommend items within huge collections of data, which makes them a computerized alternative to human recommendations. Since useful personalized recommendations can add value to the user experience, some of the largest e-commerce web sites include recommender engines. Three well-known examples are Amazon.com [1], LastFM [4] and Netflix [6]. Although the first studies can be traced back to cognitive science, approximation theory and information retrieval, among other fields, recommender systems became an independent research area in the mid-1990s when Resnick et al. [50], Hill et al. [29] and Shardanand et al. [56] proposed recommendation techniques explicitly based on user rating information. Since then, numerous approaches have been developed that use content or historical information: user-item interactions, explicit ratings, or web logs, among others. Nowadays, recommender systems are typically classified into the following categories:

• content-based, if the user is recommended items that are content-similar to the items the user already liked;
• collaborative, if the user is recommended items that people with similar tastes and preferences liked in the past;
• hybrid, if the user is recommended items based on a combination of both collaborative and content-based methods.
This chapter presents a study focused on recommender systems based on collaborative filtering, the most successful recommendation technique to date. The chapter provides the reader with an overview of recommender systems based on collaborative filtering, contributes a general taxonomy to classify the algorithms and approaches according to a set of relevant features, and finally provides

G. Castellano, L.C. Jain, A.M. Fanelli (Eds.): Web Person. in Intel. Environ., SCI 229, pp. 81–117. © Springer-Verlag Berlin Heidelberg 2009
some guidelines to decide which algorithm best fits a given recommendation problem or domain.

2 Recommending Based on Collaborative Filtering

The term Collaborative Filtering (CF) was first introduced by Goldberg et al. [23]. They presented Tapestry, an experimental mail system that combined content-based filtering and collaborative annotations. Although the system was enriched with collaborative information, users were required to write complex queries. The first system that automated recommendations was the GroupLens system [50, 37], which helped users find relevant netnews from a huge stream of articles using ratings given by other similar users. Since then, many relevant research projects have been developed (Ringo [56], Video Recommender [29], MovieLens [19, 5], Jester [24]) and the results have positioned CF techniques as the most successful ones for building recommender engines. Popular e-commerce systems, such as Amazon [1], CDNow [3] or LastFM [4], are taking advantage of these engines. CF relies on the assumption that finding users similar to a new one and examining their usage patterns leads to useful recommendations for the new user. Users usually prefer items that like-minded users prefer, or even items that dissimilar users don't prefer. This technology does not rely on the content descriptions of the items, but depends on preferences expressed by a set of users. These preferences can either be expressed explicitly by numeric ratings or indicated implicitly by user behaviors, such as clicking on a hyperlink, purchasing a book or reading a particular news article. CF requires no domain knowledge and offers the potential to uncover patterns that would be difficult or impossible to detect using content-based techniques.
Besides that, collaborative filtering has proved its ability to identify the most appropriate item for each user, and the quality of recommendations improves over time as the user database gets larger. Two different approaches have been explored for building pure CF recommenders. The first approach, referred to as memory-based [56, 37, 15, 50, 27, 54], essentially makes rating predictions based on the entire collection of rated items. Items frequently selected by users of the same group can be used to form the basis of a list of recommended items. These methods produce high-quality recommendations but suffer serious scalability problems as the number of users and items grows. The other approach, known as model-based [56, 14, 15, 9], analyzes historical interaction information to build a model of the relations between different items/users, which is then used to find the recommended items. Model-based schemes produce recommendations faster than memory-based ones do, but they require a significant amount of time to build the models and tend to produce lower quality recommendations.

Definitions and Notation

In the context of recommender systems, a dataset is defined as the collection of all transactions about the items that have been selected by a collection of users.
Symbols n and m will be used in this text to denote the number of distinct users and items in a particular dataset, respectively. Each dataset will be represented formally by an n × m matrix that will be referred to as the user-item matrix, A = U × I. U denotes the set of all users and I the set of all items available in the database. The value of element ak,i ∈ {1, 0} denotes whether an interaction between user k and item i has been observed or not. In a recommendation problem, there usually exists additional information about the utility of the user-item interactions, commonly captured as a rating that indicates how a particular user liked a particular item. This rating information is represented in a different n × m matrix that will be denoted R. The rating that user k expressed for item i is in general a real number and will be referred to as rk,i. rk denotes the vector of all ratings of user k. In recommender systems terminology, the active user is the user that queries the recommender system for recommendations on some items. The symbol a will be used to refer to the active user's rating vector. By convention, if di denotes the vector that results from taking row i from a certain matrix D, d^T_j will be used to denote the vector that results from taking column j from that matrix. The symbol Ak refers to the set of items user k has already experienced and Rk is the set of items for which user k has actually given ratings. Note that Rk ⊆ I and Rk ⊆ Ak.

Problem Formulation

In its most common formulation, the CF recommendation problem is reduced to the problem of estimating, using collaborative features, the utility of the items that have not been selected by the active user. Once these utilities for unseen items are estimated, a top-N recommendation can be built for every user by recommending the items with the highest estimated values.
This estimation is usually computed from the ratings explicitly given by the active user to a specific set of items (rating-based filtering), but ratings could also be derived from historical data (purchases, ...) or from other sources of information. In the rest of the chapter we will assume, without loss of generality, that interactions are based on rating activity. In movie recommendation, for instance, the input to the recommender engine would be a set of movies the user has seen, with some numerical rating associated with each of these movies. The output of the recommender system would be another set of movies, not yet rated by the user, that the recommender predicts to be highly rated by the user. More formally, given the user-item rating matrix R and the set of ratings a specified by the active user, the recommender engine tries to identify an ordered set of items X such that X ∩ Rk = ∅. To achieve this, the recommendation engine defines a function

\nu : U \times I \to \mathbb{R}, \quad (k, j) \mapsto \nu(k, j) = E(r_{k,j})   (1)
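To make the formulation concrete, the sketch below builds a top-N recommendation from a toy rating matrix. The matrix values, the `mean_rating_utility` placeholder and the function names are illustrative assumptions, not part of the original formulation; any utility estimator ν(k, j) could be plugged in.

```python
import numpy as np

# Hypothetical 4-user x 5-item rating matrix R; np.nan marks items
# a user has not rated, so the interaction matrix is A = ~np.isnan(R).
R = np.array([
    [5.0, 3.0, np.nan, 1.0, np.nan],
    [4.0, np.nan, np.nan, 1.0, 2.0],
    [1.0, 1.0, np.nan, 5.0, 4.0],
    [np.nan, 1.0, 5.0, 4.0, np.nan],
])

def mean_rating_utility(R, k, j):
    # Placeholder utility nu(k, j): mean rating item j received overall.
    col = R[:, j]
    col = col[~np.isnan(col)]
    return col.mean() if col.size else 0.0

def top_n(R, k, nu, n=2):
    """Rank the items user k has not yet rated by the utility nu(R, k, j)."""
    unseen = [j for j in range(R.shape[1]) if np.isnan(R[k, j])]
    return sorted(unseen, key=lambda j: nu(R, k, j), reverse=True)[:n]
```

For the active user k = 0 the unseen items are 2 and 4; with this placeholder utility, `top_n(R, 0, mean_rating_utility)` returns `[2, 4]`.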
Fig. 1. Illustration of the recommendation process. Given the vector of ratings of the active user, the collaborative filtering algorithm produces a recommendation by selecting the N items with the highest estimated predictions.

that predicts the utility of the interactions between each user k and every item j. Note that for a given user k, the utilities need to be computed only for items j ∈ I − Rk. Once all utilities are predicted, recommendations to the active user are made by selecting the items with the highest estimated utility (see figure 1). The prediction computation is usually performed on a sparse user-item matrix. Typical values of sparsity are in the order of 98%, which means an almost empty interaction matrix. In addition to recommender systems that predict the absolute values of ratings, there are other proposals focused on preference-based filtering, i.e., predicting the relative preferences of users [18, 35, 36]. These techniques predict the correct relative order of the items, rather than their individual ratings.

2.1 Memory-Based Collaborative Filtering

Memory-based collaborative filtering is motivated by the observation that users usually trust recommendations from like-minded neighbors. These methods are aimed at computing unknown relations between users and items by means of nearest neighbor schemes that either identify pairs of items that tend to be rated similarly or users with a similar rating history. Memory-based methods became very popular because they are easy to implement and very intuitive, they avoid the need to train and tune many parameters, and the user can easily understand the rationale behind each recommendation.
Three components characterize this approach: (1) data preprocessing, in which input data to the recommender engine is preprocessed to remove global effects, normalize ratings, etc.; (2) neighborhood selection, which consists of selecting the set of K users [items] that are most similar to the active user [to the set of items already rated by the active user]; and (3) prediction computation, which generates predictions and aggregates items into a top-N recommendation. Table 1 summarizes different memory-based algorithms that are briefly explained in the next subsections.
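The three components can be sketched as a minimal pipeline. This is an illustrative toy implementation, not a reference one: it assumes mean-centering as the preprocessing step, cosine similarity on centered ratings for neighborhood selection, and a similarity-weighted average for prediction; the matrix values and function names are assumptions of the example.

```python
import numpy as np

# Toy rating matrix; np.nan marks unrated items (illustrative values).
R = np.array([
    [4.0, np.nan, 3.0, 5.0],
    [4.0, 2.0, np.nan, 5.0],
    [1.0, 5.0, 4.0, np.nan],
])

def preprocess(R):
    """(1) Data preprocessing: subtract each user's mean rating."""
    means = np.array([np.nanmean(row) for row in R])
    return R - means[:, None], means

def select_neighbors(Rc, k, K=2):
    """(2) Neighborhood selection: the K users most similar to user k
    (cosine similarity on mean-centered ratings, NaNs treated as 0)."""
    X = np.nan_to_num(Rc)
    sims = X @ X[k] / (np.linalg.norm(X, axis=1) * np.linalg.norm(X[k]) + 1e-12)
    sims[k] = -np.inf                       # exclude the active user
    return np.argsort(sims)[::-1][:K], sims

def predict(R, k, K=2):
    """(3) Prediction: similarity-weighted average of neighbor deviations."""
    Rc, means = preprocess(R)
    nbrs, sims = select_neighbors(Rc, k, K)
    preds = {}
    for j in range(R.shape[1]):
        if not np.isnan(R[k, j]):
            continue                        # only predict unseen items
        num = sum(sims[l] * Rc[l, j] for l in nbrs if not np.isnan(Rc[l, j]))
        den = sum(abs(sims[l]) for l in nbrs if not np.isnan(Rc[l, j]))
        preds[j] = means[k] + (num / den if den else 0.0)
    return preds
```

A top-N list is then obtained by sorting `predict(R, k)` by predicted value, exactly as in the pipeline described above.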
Table 1. Summary of memory-based algorithms based on the different components of the recommendation process

| Algorithm | Data (preprocessing) | Neighborhood selection | Prediction computation |
|---|---|---|---|
| User-based | Ratings (default voting) | Pearson correlation; vector similarity (→ inverse user frequency); mean squared difference | Rating aggregation; most frequent item |
| Predictability paths | Ratings | Predictability condition heuristics | Linear rating transformation |
| Item-based | Ratings (adjusted ratings) | Vector similarity; Pearson correlation; conditional probability-based similarity | Rating aggregation; regression-based; item-to-item co-occurrence |
| Cluster-based smoothing | Ratings (cluster-based smoothing) | Pearson correlation | Rating aggregation |
| Trust inferences | Ratings | Trust of users (→ Pearson correlation); weighted average composition | Rating aggregation |
| Improved neighborhood | Ratings (remove global effects) | Weight optimization | Rating aggregation |

User-Based

This CF approach estimates unknown ratings based on recorded ratings of like-minded users. The predicted rating of the active user for item j is a weighted sum of the ratings of other users,

\nu_{k,j} = \bar{r}_k + \frac{\sum_{l \in U_k} w_{k,l} \, (r_{l,j} - \bar{r}_l)}{\sum_{l \in U_k} |w_{k,l}|}   (2)

where Uk denotes the set of users in the database that satisfy wk,l ≠ 0. These weights can reflect distance, correlation or similarity between each user and the active user. r̄k and r̄l represent the mean rating of the active user k and user l, respectively. Different weighting functions can be considered: Pearson correlation, cosine vector similarity, Spearman correlation, entropy-based uncertainty and mean squared difference are some examples. The Pearson correlation (eq. 3)¹ was the first measure used to compute these weights [50]. Breese et al. [15] and Herlocker et al. [27] proved that Pearson correlation performs better than other metrics.
w_{k,l} = \frac{\sum_{i \in R_k \cap R_l} (r_{k,i} - \bar{r}_k)(r_{l,i} - \bar{r}_l)}{\sqrt{\sum_{i \in R_k \cap R_l} (r_{k,i} - \bar{r}_k)^2} \, \sqrt{\sum_{i \in R_k \cap R_l} (r_{l,i} - \bar{r}_l)^2}}   (3)

¹ Note that Pearson correlation is defined in [−1, +1] and then, in order to make sense when using negative weights, ratings should be re-scaled to fit [−r, +r].
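A direct reading of equations 2 and 3 might look as follows. NaN marks unrated items; whether the user means r̄ are taken over all of a user's ratings or only over the co-rated items varies across implementations — here they are taken over each user's full rating vector, which is an assumption of this sketch.

```python
import numpy as np

def pearson_weight(R, k, l):
    """Pearson correlation weight w_{k,l} (eq. 3) over co-rated items."""
    common = [i for i in range(R.shape[1])
              if not np.isnan(R[k, i]) and not np.isnan(R[l, i])]
    if len(common) < 2:
        return 0.0                      # not enough overlap to correlate
    rk = R[k, common] - np.nanmean(R[k])
    rl = R[l, common] - np.nanmean(R[l])
    den = np.sqrt((rk ** 2).sum() * (rl ** 2).sum())
    return float((rk * rl).sum() / den) if den else 0.0

def user_based_prediction(R, k, j):
    """nu_{k,j} (eq. 2): mean rating plus weighted neighbor deviations."""
    num = den = 0.0
    for l in range(R.shape[0]):
        if l == k or np.isnan(R[l, j]):
            continue                    # only users who rated item j
        w = pearson_weight(R, k, l)
        if w == 0.0:
            continue                    # U_k contains users with w != 0
        num += w * (R[l, j] - np.nanmean(R[l]))
        den += abs(w)
    return float(np.nanmean(R[k])) + (num / den if den else 0.0)
```

Negative weights are kept, which is why the denominator of equation 2 uses absolute values.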
Vector similarity is another weighting function that can be used to measure the similarity between users:

w_{k,l} = \frac{\sum_{i \in R_k \cap R_l} r_{k,i} \, r_{l,i}}{\sqrt{\sum_{i \in R_k \cap R_l} r_{k,i}^2} \, \sqrt{\sum_{i \in R_k \cap R_l} r_{l,i}^2}}   (4)

Though Pearson correlation and vector similarity are the most popular, other metrics are also used. For instance, Shardanand and Maes [56] used a mean squared difference to compute the degree of dissimilarity between users k and l; predictions were made by considering all users whose dissimilarity to the active user was less than a certain threshold and computing the weighted average of the ratings provided by the most similar users, where weights were inversely proportional to this dissimilarity. They also presented a constrained Pearson correlation to take into account the positivity and negativity of ratings in absolute scales.

Most frequent item recommendation. Instead of using equation 2 to compute predictions and then constructing a top-N recommendation by selecting the highest predicted items, each similar item could be ranked according to how many similar users selected it,

s_{k,j} = \sum_{l \in U_k : \, a_{l,j} = 1} 1   (5)

and the recommendation list would then be computed by sorting out the N most frequently selected items.

Weighting Schemes

Breese et al. [15] investigated different modifications to the weighting function that have been shown to improve the performance of this memory-based approach:

Default voting was proposed as an extension of the Pearson correlation (equation 3) that improves the similarity measure in cases in which either the active user or the matching user has relatively few ratings (Rk ∩ Rl has very few items). Refer to [15] for a mathematical formulation.

Inverse user frequency tries to reduce weights for commonly selected items, based on the idea that commonly selected items are not as useful in characterizing the user as those items that are selected less frequently.
Following the original concepts in the domain of information retrieval [10], the inverse user frequency can be defined as:

f_i = \log \frac{|\{u_k\}|}{|\{u_k : i \in R_k\}|} = \log \frac{n}{n_i}   (6)

where ni is the number of users who rated item i and n is the total number of users in the database. To use the inverse user frequency in equation 4, the transformed rating is simply the original rating multiplied by the inverse user frequency. It can also be used with correlation, but the transformation is not direct (see Breese et al. [15] for a detailed description).
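A possible reading of equation 6, combined with the vector similarity of equation 4, is sketched below; the binary interaction matrix A and the convention of zeroing the weight of items nobody rated are assumptions of this example.

```python
import numpy as np

def inverse_user_frequency(A):
    """f_i = log(n / n_i) for each item i (eq. 6); A is the binary
    user-item interaction matrix (1 = user rated the item)."""
    n = A.shape[0]
    n_i = A.sum(axis=0)
    with np.errstate(divide="ignore"):
        f = np.log(n / n_i)
    return np.where(np.isfinite(f), f, 0.0)   # items nobody rated get 0

def iuf_vector_similarity(R, A, k, l):
    """Vector (cosine) similarity of eq. 4 on IUF-scaled ratings."""
    f = inverse_user_frequency(A)
    common = (A[k] == 1) & (A[l] == 1)
    rk, rl = R[k, common] * f[common], R[l, common] * f[common]
    den = np.linalg.norm(rk) * np.linalg.norm(rl)
    return float(rk @ rl / den) if den else 0.0
```

Note that an item rated by every user gets f_i = log(1) = 0 and therefore contributes nothing to the similarity, which is exactly the intended effect.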
Predictability Paths

Aggarwal et al. [9] proposed a graph-based recommendation algorithm in which users are represented as nodes of a graph and the edges between nodes indicate the degree of similarity between the users. The recommendations for a user are computed by traversing nearby nodes in this graph. The graph representation can capture transitive relations that nearest-neighbor algorithms cannot. The authors reported better performance than the user-based schemes.

The approach is based on the concepts of horting and predictability. The horting condition states whether there is enough overlap between a pair of users (k, l) to decide whether the behavior of one user could predict the behavior of the other. By definition, user k horts user l if the following condition is satisfied:

\mathrm{card}(R_k \cap R_l) \geq \min(F \cdot \mathrm{card}(R_k),\, G) \quad (7)

where F \leq 1 and G is some predefined threshold. The predictability condition establishes that user l predicts the behavior of user k if there exists a linear rating transformation

T_{s_{k,l},t_{k,l}} : x_{k,j} = s \cdot r_{l,j} + t \quad (8)

that carries ratings r_{l,j} of user l into ratings x_{k,j} of user k with an acceptable error. The pair of real numbers (s, t) is chosen so that transformation 8 keeps at least one value in the rating domain (see [9] for further details on the restrictions on the (s, t) pair). More formally, user l predicts user k if user k horts user l (eq. 7) and if there exists a linear rating transformation T_{s,t} such that expression 9 is satisfied, with \beta a positive real number:

\frac{\sum_{j \in R_k \cap R_l} |r_{k,j} - x_{k,j}|}{\mathrm{card}(R_k \cap R_l)} < \beta \quad (9)

Each arc between users k and l indicates that user l predicts user k, and therefore has an associated linear transformation T_{s_{k,l},t_{k,l}}. Using an appropriate graph search algorithm, a set of optimal directed paths between user k and any user l that selected item j can be constructed.
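As an illustration, the horting and predictability conditions (eqs. 7-9) can be sketched in Python. This is a minimal sketch, not the authors' implementation: ratings are assumed to be stored as per-user dictionaries mapping item ids to ratings, and the threshold values F, G and beta are hypothetical defaults, not values from [9].

```python
def horts(ratings_k, ratings_l, F=0.5, G=10):
    """Horting condition (eq. 7): user k horts user l if the set of
    items rated by both is large enough relative to k's profile."""
    overlap = set(ratings_k) & set(ratings_l)
    return len(overlap) >= min(F * len(ratings_k), G)


def predicts(ratings_k, ratings_l, s, t, beta=1.0, F=0.5, G=10):
    """User l predicts user k (eqs. 7-9) if k horts l and the linear
    transformation x_kj = s * r_lj + t has mean absolute error below
    beta over the co-rated items."""
    if not horts(ratings_k, ratings_l, F, G):
        return False
    overlap = set(ratings_k) & set(ratings_l)
    if not overlap:
        return False
    error = sum(abs(ratings_k[j] - (s * ratings_l[j] + t)) for j in overlap)
    return error / len(overlap) < beta


# Two users with identical tastes: the identity transformation (s=1, t=0)
# carries l's ratings into k's with zero error, so l predicts k.
rk = {"a": 4, "b": 2, "c": 5}
rl = {"a": 4, "b": 2, "c": 5, "d": 1}
print(predicts(rk, rl, s=1.0, t=0.0))  # True
```

In the full algorithm these pairwise checks define the arcs of the directed graph, over which optimal paths are then searched.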
Each directed path allows a rating prediction to be computed by composing transformations (eq. 8). For instance, given the directed graph k → l_1 → ... → l_n with predictor values (s_{k,1}, t_{k,1}), (s_{1,2}, t_{1,2}), ..., (s_{n-1,n}, t_{n-1,n}), the predicted rating of item j will be T_{s_{k,1},t_{k,1}} \circ (T_{s_{1,2},t_{1,2}} \circ (\ldots \circ T_{s_{n-1,n},t_{n-1,n}}(r_{n,j}) \ldots)). Since different paths may exist, the average of these predicted ratings is computed as the final prediction. A top-N recommendation is constructed by aggregating the N items with the highest predicted ratings.

Item-Based

The item-based algorithm is an analogous alternative to the user-based approach that was proposed by Sarwar et al. [53] to address the scalability problems of
the user-based approach. The algorithm, in its original formulation, generates a list of recommendations for the active user by selecting new items that are similar to the collection of items the user has already rated. As in the user-based approach, the item-based approach consists of two components: the similarity computation and the prediction computation.

There are different ways to compute the similarity between items. Here we present four of these methods: vector similarity, Pearson correlation, adjusted vector similarity and conditional probability-based similarity.

Vector similarity. One way to compute the similarity between items is to consider each item i as a vector in the m-dimensional user space. The similarity between any two items i and j is measured by computing the cosine of the angle between these two vectors:

w_{i,j} = \frac{\sum_{k \in U_i \cap U_j} r_{k,i}\, r_{k,j}}{\sqrt{\sum_{k \in U_i \cap U_j} r_{k,i}^2}\, \sqrt{\sum_{k \in U_i \cap U_j} r_{k,j}^2}} \quad (10)

where the summation extends over the users who rated both items, k \in U_i \cap U_j.

Pearson correlation. Similarly to equation 3, the Pearson correlation between items i and j is given by:

w_{i,j} = \frac{\sum_{k \in U_i \cap U_j} (r_{k,i} - \bar{r}_i)(r_{k,j} - \bar{r}_j)}{\sqrt{\sum_{k \in U_i \cap U_j} (r_{k,i} - \bar{r}_i)^2}\, \sqrt{\sum_{k \in U_i \cap U_j} (r_{k,j} - \bar{r}_j)^2}} \quad (11)

where \bar{r}_i and \bar{r}_j denote the average rating of items i and j, respectively.

Adjusted vector similarity. Computing similarity between items using the vector similarity has one important drawback: the difference in rating scale between different users is not taken into account. This similarity measure addresses the problem by subtracting the corresponding user average from each rating:

w_{i,j} = \frac{\sum_{k \in U_i \cap U_j} (r_{k,i} - \bar{r}_k)(r_{k,j} - \bar{r}_k)}{\sqrt{\sum_{k \in U_i \cap U_j} (r_{k,i} - \bar{r}_k)^2}\, \sqrt{\sum_{k \in U_i \cap U_j} (r_{k,j} - \bar{r}_k)^2}} \quad (12)

Conditional probability-based similarity. An alternative way to compute the similarity between each pair of items is to use a measure based on the conditional probability of selecting one of the items given that the other item was selected.
This probability can be expressed as the number of users that selected both items i and j divided by the total number of users that selected item i:

w_{i,j} = P(j|i) = \frac{|\{u_k : i, j \in R_k\}|}{|\{u_k : i \in R_k\}|} \quad (13)

Note that this similarity measure is not symmetric: P(j|i) \neq P(i|j).
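Two of the measures above, the vector similarity of equation 10 and the asymmetric conditional probability of equation 13, can be sketched as follows. The sketch assumes ratings are stored item-wise, as a dict mapping each item to the dict of users who rated it; the names and sample data are illustrative.

```python
import math


def cosine_item_sim(ratings, i, j):
    """Vector similarity (eq. 10): cosine of the angle between the
    rating vectors of items i and j, restricted to users who rated both."""
    common = set(ratings[i]) & set(ratings[j])
    if not common:
        return 0.0
    num = sum(ratings[i][k] * ratings[j][k] for k in common)
    den = (math.sqrt(sum(ratings[i][k] ** 2 for k in common))
           * math.sqrt(sum(ratings[j][k] ** 2 for k in common)))
    return num / den


def conditional_prob_sim(ratings, i, j):
    """Conditional probability similarity (eq. 13): the fraction of the
    users who selected i that also selected j. Note P(j|i) != P(i|j)."""
    users_i = set(ratings[i])
    if not users_i:
        return 0.0
    return len(users_i & set(ratings[j])) / len(users_i)


# ratings[item][user] = rating
ratings = {"i": {"u1": 5, "u2": 3, "u3": 4},
           "j": {"u1": 4, "u2": 2}}
print(conditional_prob_sim(ratings, "i", "j"))  # 2 of 3 users of i chose j
print(conditional_prob_sim(ratings, "j", "i"))  # both users of j chose i
```

The asymmetry of the last measure is visible in the example: P(j|i) = 2/3 while P(i|j) = 1.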
To compute predictions using the item-based approach, a recommendation list is generated by ranking items with a prediction measure computed by taking a weighted average over all of the active user's ratings for the items in the collection R_k:

\nu_{k,j} = \frac{\sum_{i \in R_k} r_{k,i} \cdot w_{i,j}}{\sum_{i \in R_k} |w_{i,j}|} \quad (14)

Model-Based Interpretation

Since similarities among items do not change frequently, the relations between items can be stored in a model M. This is why some researchers consider the item-based algorithm a model-based approach to collaborative filtering. Model M could contain all relations between pairs of items, but one common approach is to store, for each item i, only its top-K similar items. This parameterization of M on K is motivated by performance considerations. With a small value of K, M is very sparse, so the similarity information can be kept in memory even when the number of items in the dataset is very large. However, if K is very small, the resulting model will contain limited information and could lead to low-quality recommendations (see [53] for further reading).

Item-to-Item Extension

Linden et al. [41] proposed this extension to the item-based approach, which is capable of producing recommendations in real time, scaling to massive datasets and generating high-quality recommendations. The algorithm is essentially an item-based approach, but it includes two modifications that make item-to-item faster than item-based: (1) the similarity computation is extended only to item pairs with common users (co-occurrent items) and (2) the recommendation list is computed by looking into a small set that aggregates the items found similar to a certain basket of user selections. To determine the most similar match for a given item, the algorithm builds a co-occurrence matrix by finding items that users tend to select together.
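To make the prediction step concrete, here is a sketch of equation 14 applied over a sparse similarity model M that, as described above, stores only a few similar items per item. The dictionary-based model representation and the sample values are hypothetical, chosen only for illustration.

```python
def item_based_prediction(user_ratings, model, j):
    """Predict the active user's rating for item j (eq. 14): a weighted
    average of the user's own ratings, weighted by item-item similarity.
    `model` maps an item pair (i, j) to its similarity w_ij; pairs absent
    from the sparse model contribute zero."""
    num = sum(r * model.get((i, j), 0.0) for i, r in user_ratings.items())
    den = sum(abs(model.get((i, j), 0.0)) for i in user_ratings)
    return num / den if den else 0.0


# Sparse model: only the most similar item pairs are stored.
model = {("a", "c"): 0.9, ("b", "c"): 0.3}
user_ratings = {"a": 5, "b": 2}  # items already rated by the active user
print(item_based_prediction(user_ratings, model, "c"))  # (5*0.9 + 2*0.3) / 1.2
```

A top-N list is then obtained by computing this prediction for every candidate item and keeping the N highest-scoring ones.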
The similarity between two items i and j is nonzero only if at least q + 1 users have selected the pair (i, j), with q ≥ 0 some predefined threshold. The similarity between two items satisfying this property can be computed in various ways, but a common method is to use the cosine similarity described in equation 10. Predictions for new items are computed with equation 14 (see [41] for further details).

Cluster-Based Smoothing

Xue et al. [59] proposed a collaborative filtering algorithm that provides both higher accuracy and increased efficiency. It is a user-based algorithm enhanced with clustering and a rating smoothing mechanism based on the clustering results. Clustering is performed using the K-means algorithm with the Pearson correlation coefficient (eq. 3) as the distance metric between users.

Data smoothing is a mechanism to fill in the missing values of the rating matrix. To smooth the data, Xue et al. [59] made explicit use of the
clusters as smoothing mechanisms. Based on the clustering results they applied the following smoothing strategy:

r'_{k,j} = \begin{cases} r_{k,j} & \text{if } r_{k,j} \neq \emptyset \\ \bar{r}_{k,j} & \text{otherwise} \end{cases} \quad (15)

where \bar{r}_{k,j} denotes the smoothed value for user k's rating of item j. To account for the diversity of the users, Xue et al. [59] proposed the following equation to compute the smoothed rating:

\bar{r}_{k,j} = \bar{r}_k + \Delta r_{C(k),j} = \bar{r}_k + \frac{1}{|C(k,j)|} \sum_{l \in C(k,j)} (r_{l,j} - \bar{r}_l) \quad (16)

where C(k) denotes the cluster of user k and C(k, j) the subset of users in cluster C(k) that rated item j. Smoothed ratings are used to compute a pre-selection of neighbors. Basically, given the active user k, a set of most similar clusters is selected to build a neighborhood of similar users. After this pre-selection, the similarity between each user l in the neighborhood and the active user is computed using the smoothed representation of the user ratings:

w_{k,l} = \frac{\sum_{j \in R_k} \delta_{l,j}\, (r'_{k,j} - \bar{r}_k)(r'_{l,j} - \bar{r}_l)}{\sqrt{\sum_{j \in R_k} (r'_{k,j} - \bar{r}_k)^2}\, \sqrt{\sum_{j \in R_k} \delta_{l,j}^2 (r'_{l,j} - \bar{r}_l)^2}} \quad (17)

where

\delta_{l,j} = \begin{cases} 1 - \lambda & \text{if } r_{l,j} \neq \emptyset \\ \lambda & \text{otherwise} \end{cases} \quad (18)

represents the confidence weight of user l on item j, and \lambda \in [0, 1] is a parameter for tuning the weight between the user rating and the cluster rating. Predictions for the active user are computed by aggregating ratings from the top-K most similar users in the same manner as in the user-based algorithm (see equation 2):

\nu_{k,j} = \bar{r}_k + \frac{\sum_{l \in U_k} \delta_{l,j} \cdot w_{k,l} \cdot (r_{l,j} - \bar{r}_l)}{\sum_{l \in U_k} \delta_{l,j} \cdot |w_{k,l}|} \quad (19)

By assigning different values to \lambda, Xue et al. [59] adjusted the weighting scheme. For instance, if \lambda = 0 the algorithm uses only the original rating information for the similarity and prediction computations, whereas if \lambda = 1 it becomes a cluster-based CF that uses the cluster average ratings for both computations.

Trust Inferences

This approach focuses on developing a computational model that makes it possible to explore transitive user similarities based on trust inferences. Papagelis et al.
[46] presented the concept of associations between users as an expression of established