M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399
demographic preferred. Lifestyle Finder (Krulwich, 1997) uses demographic groups from marketing research to suggest a range of products and services.
• Content-based filtering (CBF): The user is recommended items similar to the ones the user preferred in the past. An example of such a system is NewsWeeder (Lang, 1995).
• Collaborative filtering (CF): The user is recommended items that people with similar tastes and preferences liked in the past. GroupLens (Resnick et al., 1994), MovieLens (Miller et al., 2003), and Ringo (Shardanand & Maes, 1995) are examples of such systems.
• Hybrid filtering: These techniques combine more than one filtering technique to enhance performance, as in Fab (Balabanovic & Shoham, 1997) and Amazon.com (Linden et al., 2003).

The main shortcomings of CBF are content limitations (only a very shallow analysis of certain kinds of content can be supplied) and over-specialization (the user is restricted to seeing only items similar to those already rated) (Balabanovic & Shoham, 1997). These systems are not suitable for dynamic and very large environments, where items number in the millions and are inserted into the system frequently (Adomavicius & Tuzhilin, 2005).

The great power of CF relative to CBF is its cross-genre or ‘outside the box’ recommendation ability. Moreover, CF is completely independent of any machine-readable representation of the items being recommended (Adomavicius & Tuzhilin, 2005; Burke, 2002). However, CF suffers from some weaknesses: the new user problem (cold start problem), sparsity, scalability, and loss of neighbor transitivity (Breese et al., 1998; Vozalis & Margaritis, 2003). Usually each user rates only a very limited percentage of the available items. This leads to sparse user-item matrices (the set of users crossed with the set of items), so weak recommendations may be produced because successful neighbors cannot be found.

In addition, the computational cost of RS grows fast with the number of both users and items, giving rise to a scalability problem. Further, relying directly on individual item ratings results in a loss of neighbor transitivity. Assume that we have three users $u_i$, $u_j$, and $u_k$. Users $u_i$ and $u_j$ correlate highly, and $u_j$ and $u_k$ also correlate highly.
Because of transitivity, there is a possibility that users $u_i$ and $u_k$ correlate highly too. Such a transitive relationship is not captured in pure CF unless users $u_i$ and $u_k$ have rated many common items (Vozalis & Margaritis, 2003).

To remedy the aforementioned weaknesses of CBF and CF, trust-aware RS (Massa & Avesani, 2004), in which recommendations are based only on ratings given by users trusted directly or indirectly by the active user, and hybrid filtering recommenders (Balabanovic & Shoham, 1997; Linden et al., 2003; Pazzani, 1999; Shahabi et al., 2001) have been proposed. Adomavicius and Tuzhilin (2005) and Burke (2002) give recent comprehensive surveys of the state of the art in RS, the limitations of the current generation, ways to extend its capabilities, and various hybridization methods.

To build a RS using any filtering technique, a model is required for each user that reflects the user's preferences and taste. The success of any filtering technique depends chiefly on the way it learns user models. CBF learns the model from the features describing the items the user has rated in the past, whereas CF learns the model from the ratings of the items themselves (Breese et al., 1998; Resnick et al., 1994; Vozalis & Margaritis, 2003). In real life, users assign a priority to each feature, and these priorities also need to be learned. Ujjin and Bentley (2004) used evolutionary search to learn these priorities, tailoring the recommendation process to the preferences of each individual user.

Formally, in CF recommenders, we have a set of users $U = \{u_1, u_2, \ldots, u_m\}$ rating a set of items $S = \{s_1, s_2, \ldots, s_n\}$, such as books, movies, or CDs. The spaces $S$ and $U$ are large and can be very large in some applications. Each user $u_i$, $i = 1, 2, \ldots, m$, has rated a subset of items $S_i$. Specifically, the rating of user $u_c$ for item $s_j$, $j = 1, 2, \ldots, n$, is denoted by $r_{c,j}$. All the available ratings are collected in an $m \times n$ user-item matrix denoted by $R$.
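The sparsity and neighbor-transitivity issues described above can be illustrated with a minimal sketch. The toy ratings, the user and item labels, and the helper names `sparsity` and `pearson` are all invented for illustration. Pearson correlation over co-rated items is assumed here as the similarity measure, which this section does not specify.

```python
from math import sqrt

# Toy ratings on a 1-5 scale; all users, items and values are made up.
# Each inner dict holds one user's rated subset S_i, i.e. one sparse row of R.
ratings = {
    "u_i": {"s1": 5, "s2": 1, "s3": 4},
    "u_j": {"s1": 4, "s2": 2, "s3": 5, "s4": 1, "s5": 5, "s6": 2},
    "u_k": {"s4": 2, "s5": 4, "s6": 1},
}
items = sorted({s for row in ratings.values() for s in row})


def sparsity(ratings, n_items):
    """Fraction of empty cells in the m x n user-item matrix R."""
    filled = sum(len(row) for row in ratings.values())
    return 1.0 - filled / (len(ratings) * n_items)


def pearson(a, b):
    """Pearson correlation over co-rated items (an assumed similarity measure).

    Returns None when the two users share fewer than two items or when
    either user's co-rated ratings have zero variance.
    """
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return None
    mean_a = sum(a[s] for s in common) / len(common)
    mean_b = sum(b[s] for s in common) / len(common)
    num = sum((a[s] - mean_a) * (b[s] - mean_b) for s in common)
    den = sqrt(sum((a[s] - mean_a) ** 2 for s in common)
               * sum((b[s] - mean_b) ** 2 for s in common))
    return num / den if den else None


print(sparsity(ratings, len(items)))
print(pearson(ratings["u_i"], ratings["u_j"]))  # high: three co-rated items
print(pearson(ratings["u_j"], ratings["u_k"]))  # high: three co-rated items
print(pearson(ratings["u_i"], ratings["u_k"]))  # None: no co-rated items
```

In this toy data, $u_i$ and $u_k$ each agree closely with $u_j$ but have no item in common, so a similarity computed directly from individual item ratings cannot relate them at all.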

Four phases are required to perform the recommendation task in CF recommenders:
(a) Data collection
(b) User model formation
(c) Neighborhood set selection
(d) Making recommendations

2.1.1. Data collection

Generally, three types of data can be collected from users: demographic data through the registration process, explicit ratings for a subset of the available items, and implicit data from the user's online behavior. To conduct our experiments we used the original MovieLens dataset (http://www.movielens.umn.edu). The dataset consists of 100,000 ratings assigned by 943 users to 1682 movies. All ratings follow a numerical scale: 1 (bad), 2 (average), 3 (good), 4 (very good), and 5 (excellent). Each user has rated at least 20 movies. Simple demographic data such as age, gender, occupation, and zip code, collected when a new user registers on the system, are included for all users. The movie title, release date, video release date, and genre data are given for each movie. The genre feature specifies whether the movie is an action, adventure, animation, children's, comedy, crime, documentary, drama, fantasy, film-noir, horror, musical, mystery, romance, sci-fi, thriller, war, or western. A single movie can belong to more than one genre.

2.1.2. User model formation

The difference between a user profile and a user model lies in their level of sophistication. The user profile is simply a collection of personal information about the user, which can be described as a simple user model.
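
The records described in Section 2.1.1 and the simple user profile of Section 2.1.2 might be represented as in the sketch below. This is only an illustration: the class names, field names, and sample values are hypothetical and do not reflect the dataset's actual file layout. The genre list follows the order given in the text, and the multi-hot encoding is one assumed way to handle a movie that belongs to several genres.

```python
from dataclasses import dataclass, field

# The 18 genre labels listed in the text, in the order given there.
GENRES = ["action", "adventure", "animation", "children's", "comedy", "crime",
          "documentary", "drama", "fantasy", "film-noir", "horror", "musical",
          "mystery", "romance", "sci-fi", "thriller", "war", "western"]


@dataclass
class UserProfile:
    """Demographic data from registration plus explicit ratings (hypothetical fields)."""
    user_id: int
    age: int
    gender: str
    occupation: str
    zip_code: str
    ratings: dict = field(default_factory=dict)  # movie_id -> rating in 1..5


@dataclass
class Movie:
    """One movie record (hypothetical fields); a movie may carry several genres."""
    movie_id: int
    title: str
    genres: frozenset


def genre_vector(movie):
    """Multi-hot encoding: 1 for every genre the movie belongs to, else 0."""
    return [1 if g in movie.genres else 0 for g in GENRES]


def matrix_sparsity(n_ratings, n_users, n_items):
    """Fraction of the user-item matrix that holds no rating."""
    return 1.0 - n_ratings / (n_users * n_items)


# Made-up sample records, for illustration only.
movie = Movie(1, "Example Movie", frozenset({"comedy", "romance"}))
user = UserProfile(1, 30, "F", "engineer", "00000", {movie.movie_id: 4})

print(genre_vector(movie))
# Sparsity implied by the figures in the text: 100,000 ratings, 943 users, 1682 movies.
print(round(matrix_sparsity(100_000, 943, 1682), 3))
```

With the figures quoted in the text, roughly 94% of the user-item matrix is empty, which gives a concrete sense of the sparsity problem noted earlier.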