Availableonlineatwww.sciencedirectcom ° Science Direct Expert Sy stems with Applications ELSEVIER Expert Systems with Applications 35(2008)1386-1399 www.elsevier.com/locate/eswa Fuzzy-genetic approach to recommender systems based on a novel hybrid user model Mohammad Yahya H. Al-Shamri, Kamal K. Bharadwaj School of Computer and Systems Sciences, Jawaharlal Nehru Unicersity, New Delhi 110 067, India Abstract The main strengths of collaborative filtering(CF), the most successful and widely used filtering technique for recommender systems e its cross-genre or outside the box'recommendation ability and that it is completely independent of any machine-readable represen tation of the items being recommended. However, CF suffers from sparsity, scalability, and loss of neighbor transitivity. CF techniques are either memory-based or model-based. While the former is more accurate, its scalability compared to model-based is poor. An impor tant contribution of this paper is a hybrid fuzzy-genetic approach to recommender systems that retains the accuracy of memory-based CF and the scalability of model-based CF. Using hybrid features, a novel user model is built that helped in achieving significant reduc- tion in system complexity, sparsity, and made the neighbor transitivity relationship hold. The user model is employed to find a set of like linded users within which a memory-based search is carried out. This set is much smaller than the entire set, thus improving systems scalability. Besides our proposed approaches are scalable and compact in size, computational results reveal that they outperform the classical approach. e 2007 Elsevier Ltd. All rights reserved Keywords: Recommender systems; Collaborative filtering: Web personalization; User model; Fuzzy sets 1. Introduction down the information overload, and efficiently guide the user in a personalized manner to interesting items within The popular use of web as a global information system a very large space of possible options(Burke, 2002). Typi- has flooded us with a tremendous amount of data and cally rs recommend information (URLS, netnews articles ), information. The end users are overloaded with a huge entertainment(books, movies, restaurants), or individuals amount of information from myriad resources. This explo- (experts). Amazon. com (Linden, Smith, York, 2003) sive growth of data has generated an urgent need for pow- and Movie Lens. org(Miller, Albert, Lam, Konstan, erful automated web personalization tools that can Riedl, 2003)are two well-known examples of Rs on the intelligently assist us in transforming the vast amount of web data into useful information and knowledge. In other Recommender systems employ four information filter words, these tools ensure that the right information is ing techniques, demographic filtering(DMF)(Krulwich, delivered to the right people at the right time( Adomavicius 1997), content-based filtering(CBF)(Lang, 1995), collabo- Tuzhilin, 2005; Eirinaki vazirgiannis, 2003). Web rec- rative filtering(Resnick, Iakovou, Sushak, Bergstrom, ommender systems(RS), the most successful example of Riedl, 1994; Shardanand Maes, 1995), and hybrid filter- Web personalization tools, tailor information access, trim ing techniques(Balabanovic Shoham, 1997; Pazzani, 1999: Shahabi, Banaei-Kashani, Chen, McLeod, 2001) DMF categorizes the user based on the user personal attr Corresponding author. Mobile: \9\W9810196636: fax: +91(11) butes and makes recommendations based on demographic 26717528 mohamad. alshamriagmaiLcom(M Y.H. Al-Shamri), classes while the CBF suggests items similar to the ones the (KK. Bharadwan user preferred in the past 0957-4174/S- see front matter 2007 Elsevier Ltd. All rights reserved doi:10.1016/eswa2007.08.016
Fuzzy-genetic approach to recommender systems based on a novel hybrid user model Mohammad Yahya H. Al-Shamri , Kamal K. Bharadwaj School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi 110 067, India Abstract The main strengths of collaborative filtering (CF), the most successful and widely used filtering technique for recommender systems, are its cross-genre or ‘outside the box’ recommendation ability and that it is completely independent of any machine-readable representation of the items being recommended. However, CF suffers from sparsity, scalability, and loss of neighbor transitivity. CF techniques are either memory-based or model-based. While the former is more accurate, its scalability compared to model-based is poor. An important contribution of this paper is a hybrid fuzzy-genetic approach to recommender systems that retains the accuracy of memory-based CF and the scalability of model-based CF. Using hybrid features, a novel user model is built that helped in achieving significant reduction in system complexity, sparsity, and made the neighbor transitivity relationship hold. The user model is employed to find a set of likeminded users within which a memory-based search is carried out. This set is much smaller than the entire set, thus improving system’s scalability. Besides our proposed approaches are scalable and compact in size, computational results reveal that they outperform the classical approach. 2007 Elsevier Ltd. All rights reserved. Keywords: Recommender systems; Collaborative filtering; Web personalization; User model; Fuzzy sets 1. Introduction The popular use of web as a global information system has flooded us with a tremendous amount of data and information. The end users are overloaded with a huge amount of information from myriad resources. This explosive growth of data has generated an urgent need for powerful automated web personalization tools that can intelligently assist us in transforming the vast amount of data into useful information and knowledge. In other words, these tools ensure that the right information is delivered to the right people at the right time (Adomavicius & Tuzhilin, 2005; Eirinaki & Vazirgiannis, 2003). Web recommender systems (RS), the most successful example of Web personalization tools, tailor information access, trim down the information overload, and efficiently guide the user in a personalized manner to interesting items within a very large space of possible options (Burke, 2002). Typically RS recommend information (URLs, netnews articles), entertainment (books, movies, restaurants), or individuals (experts). Amazon.com (Linden, Smith, & York, 2003) and MovieLens.org (Miller, Albert, Lam, Konstan, & Riedl, 2003) are two well-known examples of RS on the web. Recommender systems employ four information filtering techniques, demographic filtering (DMF) (Krulwich, 1997), content-based filtering (CBF) (Lang, 1995), collaborative filtering (Resnick, Iakovou, Sushak, Bergstrom, & Riedl, 1994; Shardanand & Maes, 1995), and hybrid filtering techniques (Balabanovic & Shoham, 1997; Pazzani, 1999; Shahabi, Banaei-Kashani, Chen, & McLeod, 2001). DMF categorizes the user based on the user personal attributes and makes recommendations based on demographic classes while the CBF suggests items similar to the ones the user preferred in the past. 0957-4174/$ - see front matter 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2007.08.016 * Corresponding author. Mobile: +91 (11) 9810196636; fax: +91 (11) 26717528. E-mail addresses: mohamad.alshamri@gmail.com (M.Y.H. Al-Shamri), kbharadwaj@gmail.com (K.K. Bharadwaj). www.elsevier.com/locate/eswa Available online at www.sciencedirect.com Expert Systems with Applications 35 (2008) 1386–1399 Expert Systems with Applications
M.Y.H. Al-Shamri, K.K. Bharadwaj/ Expert Systems with Applications 35(2008)1386-1399 The most familiar and most widely implemented filter- as fuzzy. But at the item level it is difficult to fuzzify the ing is the collaborative filtering. The initial CF was intro- profile because it would require prohibitively large space duced by Tapestry(Goldberg, Nichols, Oki, Terry, and long processing time 1992), and was automated by GroupLens(Resnick et al Our work in this paper is an attempt towards introduc- 1994)and Ringo( Shardanand Maes, 1995). Typically, ing hybridization at four different levels, namely feature- CF explores similar users(neighbors), recognizes common- level, model-level, CF algorithm-level, and approach-level alities between the user and his neighbors on the basis of Firstly, we proposed hybrid features that exploit both user their ratings, and then accordingly generates new recom- ratings for high rated items and some content description mendations based on inter-user comparisons(Adomavicius of the items. At the model-level we built a user model from Tuzhilin, 2005; Eirinaki Vazirgiannis, 2003). Further, the set of hybrid features and DMf profile. Hybridization CF and dMf have the unique capacity to identify cross- between model-based and memory-based algorithms of CF genre niches and can entice users to jump outside of the is done at the CF algorithm-level. The user model is used to familiar outside the box'( Burke, 2002) find a set of like-minded users within which a memory- Breese, Heckerman, and Kadie (1998)classified CF based search is carried out. This set is much smaller in size algorithms as either memory-based(Resnick et al., 1994; than the original set, thus making the technique scalable Shardanand Maes, 1995), the entire data is used for rec- Finally, we developed a hybrid fuzzy-genetic Rs by ommendations, or model-based (Shahabi et al., 2001)in employing Ga to evolve appropriate weights for each fea- which a model is derived offline from the data to be used ture of the user model and proposing a novel fuzzy distance for online recommendations. While the former is more metric to match users accurate, its scalability compared to model-based is poor. The contributions of this paper are three-fold: Practically, CF faces two fundamental challenges, namely accuracy and scalability. Although memory-based algo- A novel user model is built that enables hybrid filtering, rithms are simple, provide high accuracy recommend reduces the system complexity and the computational tions, and admit easy addition of new data, but they are time by roughly a factor of six. computationally expensive as the size of the input dataset A novel fuzzy distance function is proposed for users increases. Eventually, the user will leave the Web site before the processing completes. On the other hand, apply- .A hybrid fuzzy-genetic approach to recommender sys ing the model-based algorithm alone on such sparse data tems is developed though reduces the online processing cost, often comes at the cost of recommendation accuracy. One common threat The rest of this paper is organized as follows: some in current RS research is the need to combine recommenda- background on the rs is given in the next section. In Sec tion techniques to achieve peak performance because each tion 3, the proposed user model is presented, while the technique has its own pros and cons fuzzy and hybrid fuzzy-genetic approaches are given in Sec Essentially, Rs keep a profile for each user Without any tion 4. The experimental results of the proposed information about the user, the rs are not able to assist approaches and classical approach are discussed in Section the user. The user profile contains raw information about 5. Finally, in the last section we conclude our work with a the user. The terms user profile and user model are often review of our contributions along with some future used as synonyms. However, recent researchers(Froschl, research directions 2005: Koch, 2000) differentiate between them according to the level of sophistication. The user profile is a collection 2. Background of raw personal information represents preferences, back ground, personal details, and interactions with the system. 2. 1. Recommender systems Depending on the user profile, a user Thus, the user profile is used to retrieve the needed infor Recommender systems have gained an increasing mation to build up a model for the user. Koch(2000) importance since the early work on CF in the mid-1990s describes the user model as the representation of the sys- when researchers started focusing on RS that explicitly tems beliefs about the user and the user profile as a simple rely on the ratings structure(Adomavicius Tuzhilin, 2005). Normally, explicit ratings from users are binary rat Some efforts have been made towards introducing fuzz- ings (like/dislike)or follow a specified numerical scale less in RS. Nasraoui and Petenes(2003)used fuzzy indicating the degree of preference (e.g, 1-bad to pproximate reasoning to develop a general framework MAX-excellent, where MAX is the maximum possible for the recommendation process while Suryavanshi, Shiri, rating for a given system). Usually, one of the following and Mudur(2005)used relational fuzzy subtractive cluster- information filtering techniques is employed to generate ing Shahabi et al.(2001)proposed a Yoda Rs that softly recommendations: classifying active user based on typical patterns of users and then generating soft recommendations for him. The Demographic filtering (DMF): The user will be recom- user profile includes many features that can be described mended items similar to the ones other people with same
The most familiar and most widely implemented filtering is the collaborative filtering. The initial CF was introduced by Tapestry (Goldberg, Nichols, Oki, & Terry, 1992), and was automated by GroupLens (Resnick et al., 1994) and Ringo (Shardanand & Maes, 1995). Typically, CF explores similar users (neighbors), recognizes commonalities between the user and his neighbors on the basis of their ratings, and then accordingly generates new recommendations based on inter-user comparisons (Adomavicius & Tuzhilin, 2005; Eirinaki & Vazirgiannis, 2003). Further, CF and DMF have the unique capacity to identify crossgenre niches and can entice users to jump outside of the familiar ‘outside the box’ (Burke, 2002). Breese, Heckerman, and Kadie (1998) classified CF algorithms as either memory-based (Resnick et al., 1994; Shardanand & Maes, 1995), the entire data is used for recommendations, or model-based (Shahabi et al., 2001) in which a model is derived offline from the data to be used for online recommendations. While the former is more accurate, its scalability compared to model-based is poor. Practically, CF faces two fundamental challenges, namely accuracy and scalability. Although memory-based algorithms are simple, provide high accuracy recommendations, and admit easy addition of new data, but they are computationally expensive as the size of the input dataset increases. Eventually, the user will leave the Web site before the processing completes. On the other hand, applying the model-based algorithm alone on such sparse data, though reduces the online processing cost, often comes at the cost of recommendation accuracy. One common threat in current RS research is the need to combine recommendation techniques to achieve peak performance because each technique has its own pros and cons. Essentially, RS keep a profile for each user. Without any information about the user, the RS are not able to assist the user. The user profile contains raw information about the user. The terms user profile and user model are often used as synonyms. However, recent researchers (Froschl, 2005; Koch, 2000) differentiate between them according to the level of sophistication. The user profile is a collection of raw personal information represents preferences, background, personal details, and interactions with the system. Depending on the user profile, a user can be modeled. Thus, the user profile is used to retrieve the needed information to build up a model for the user. Koch (2000) describes the user model as the representation of the system’s beliefs about the user and the user profile as a simple user model. Some efforts have been made towards introducing fuzziness in RS. Nasraoui and Petenes (2003) used fuzzy approximate reasoning to develop a general framework for the recommendation process while Suryavanshi, Shiri, and Mudur (2005) used relational fuzzy subtractive clustering. Shahabi et al. (2001) proposed a Yoda RS that softly classifying active user based on typical patterns of users and then generating soft recommendations for him. The user profile includes many features that can be described as fuzzy. But at the item level it is difficult to fuzzify the profile because it would require prohibitively large space and long processing time. Our work in this paper is an attempt towards introducing hybridization at four different levels, namely featurelevel, model-level, CF algorithm-level, and approach-level. Firstly, we proposed hybrid features that exploit both user ratings for high rated items and some content descriptions of the items. At the model-level we built a user model from the set of hybrid features and DMF profile. Hybridization between model-based and memory-based algorithms of CF is done at the CF algorithm-level. The user model is used to find a set of like-minded users within which a memorybased search is carried out. This set is much smaller in size than the original set, thus making the technique scalable. Finally, we developed a hybrid fuzzy-genetic RS by employing GA to evolve appropriate weights for each feature of the user model and proposing a novel fuzzy distance metric to match users. The contributions of this paper are three-fold: • A novel user model is built that enables hybrid filtering, reduces the system complexity and the computational time by roughly a factor of six. • A novel fuzzy distance function is proposed for users matching. • A hybrid fuzzy-genetic approach to recommender systems is developed. The rest of this paper is organized as follows: some background on the RS is given in the next section. In Section 3, the proposed user model is presented, while the fuzzy and hybrid fuzzy-genetic approaches are given in Section 4. The experimental results of the proposed approaches and classical approach are discussed in Section 5. Finally, in the last section we conclude our work with a review of our contributions along with some future research directions. 2. Background 2.1. Recommender systems Recommender systems have gained an increasing importance since the early work on CF in the mid-1990s when researchers started focusing on RS that explicitly rely on the ratings structure (Adomavicius & Tuzhilin, 2005). Normally, explicit ratings from users are binary ratings (like/dislike) or follow a specified numerical scale indicating the degree of preference (e.g., 1 – bad to MAX – excellent, where MAX is the maximum possible rating for a given system). Usually, one of the following information filtering techniques is employed to generate recommendations: • Demographic filtering (DMF): The user will be recommended items similar to the ones other people with same M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399 1387
M. Y.H. Al-Shamri, KK Bharadwaj Expert Systems with Applications 35(2008)1386-135 demographic preferred. Lifestyle Finder(Krulwich, hybridization methods are described by Adomavicius and 1997)uses demographic groups from marketing research Tuzhilin(2005) and Burke(2002) to suggest a range of products and services Basically to build a rs using any filtering technique, a Content-based filtering(CBF: The user will be recom- model for each user is required which must reflect the user mended items similar to the ones he preferred in the preferences and taste. The key success of any filtering tech- past. Example of such systems is News Weeder(Lang, nique comes from the way it learns user models. The CBF 1995) learns the model from the features describing the items the Collaborative filtering (CF): The user will be recom- user has rated in the past, whereas CF learns the model mended items people with similar tastes and preferences from ratings of the items themselves( Breese et al., 1998; liked in the past. GroupLens(Resnick et al., 1994), Resnick et al., 1994; Vozalis Margaritis, 2003 ) In real MovieLens(Miller et al., 2003), and Ringo( Shardanand life users put some priorities for each feature and these pri- Maes, 1995) are some examples of such systems orities need learning also Ujin and Bentley (2004)used the Hybrid filtering: These techniques combine more than evolutionary search to learn these priorities tailoring the one filtering technique to enhance the performance like recommendation process to the preferences of each individ- Fab(balabanovic Shoham, 1997)and Amazon. com ual user. (Linden et al., 2003). Formally, in CF recommenders, we have a set of users U=u1, u2,.. um rating a set of items S=s1, 52, .. Sn) The main shortcomings of CBF are content limitations, such as books, movies, or CDs. The spaces S and U are only a very shallow analysis of certain kinds of content large and can be very large in some applications. Each use can be supplied, and over-specialization; the user is W; i= 1, 2,., m has rated a subset of items S; Specifically, restricted to see items similar to those already rated by the rating of user ue for item s; j=1, 2,..., n is denoted by him only(Balabanovic Shoham, 1997). These systems rci. All the available ratings are collected in a mxn us are not suitable for dynamic and very large environments, item matrix denoted by R. Four phases are required to where items are millions and inserted in the system fre- form the recommendation task in CF recommenders quently(Adomavicius Tuzhilin, 2005) The great power of CF relative to CBF is its cross-genre (a) Data collection or outside the box'recommendation ability Moreover CF (b) User model formation is completely independent of any machine-readable repre (c) Neighborhood set selection sentation of the items being recommended(Adomavicius (d) Making recommendations tuzhilin, 2005; Burke, 2002). However, CF suffers some weaknesses: new user problem(cold start problem), spar sity, scalability, and loss of neighbor transitivity(Breese 2. 1.1. Data collection et al., 1998; Vozalis Margaritis, 2003 ). Usually each user Generally, three types of data can be collected from rates only a very limited percentage of items, when com- users, demographical data through the registration process, pared to the available total. This leads to sparse user-item explicit ratings for a subset of the available items, and matrices, the set of users cross the set of items, therefore implicit data from the user's online behavior. In order to weak recommendations could be produced because the suc- conduct our experiments we used the original MovieLens cessful neighbors cannot be found dataset(http://www.movielens.umn.edu).Thedatasetcon a scalability problem. Further, relying directly on individ- very good, and 5-excellent numerical scale. Each user has ual items ratings result in a loss of neighbor transitivity. rated at least 20 movies. Simple demographical data such Assume that we have three users ui, uj, and uk. Users ui as age, gender, occupation and zip code are included for and u; correlate highly, also u and uk correlate highly. all users, which are collected when a new user registers Because of transitivity there is a possibility that users u; on the system. The movie title, release date, video release and uk correlate highly too. Such a transitive relationship date, and genre data are given for each movie. The genre is not captured in pure CF, unless users u; and uk have feature specifies if the movie is an action, adventure, ani- rated many common items(Vozalis margaritis, 2003). mation, childrens, comedy, crime, documentary, drama, To remedy the aforementioned weaknesses of CBF and fantasy, film-noir, horror, musical, mystery, romance CF, trust-aware RS(Massa Avesani, 2004 )-recommen- sci-fi, thriller, war, or western. a single movie can belong dations are based only on ratings given by users trusted to more than one genre directly or indirectly by the active user-and hybrid filter ing recommenders( Balabanovic Shoham, 1997: Linden 2.1.2. User model formation et al., 2003; Pazzani, 1999: Shahabi et al., 2001) are pro- The difference between a user profile and a user model posed. A recent comprehensive survey of the state of the lies in the different levels of sophistication. The user profile art in Rs, various limitations of the current generation, is simply a collection of personal information about the the various ways to extend their capabilities, and various user, which can be described as a simple user model
demographic preferred. Lifestyle Finder (Krulwich, 1997) uses demographic groups from marketing research to suggest a range of products and services. • Content-based filtering (CBF): The user will be recommended items similar to the ones he preferred in the past. Example of such systems is NewsWeeder (Lang, 1995). • Collaborative filtering (CF): The user will be recommended items people with similar tastes and preferences liked in the past. GroupLens (Resnick et al., 1994), MovieLens (Miller et al., 2003), and Ringo (Shardanand & Maes, 1995) are some examples of such systems. • Hybrid filtering: These techniques combine more than one filtering technique to enhance the performance like Fab (Balabanovic & Shoham, 1997) and Amazon.com (Linden et al., 2003). The main shortcomings of CBF are content limitations; only a very shallow analysis of certain kinds of content can be supplied, and over-specialization; the user is restricted to see items similar to those already rated by him only (Balabanovic & Shoham, 1997). These systems are not suitable for dynamic and very large environments, where items are millions and inserted in the system frequently (Adomavicius & Tuzhilin, 2005). The great power of CF relative to CBF is its cross-genre or ‘outside the box’ recommendation ability. Moreover CF is completely independent of any machine-readable representation of the items being recommended (Adomavicius & Tuzhilin, 2005; Burke, 2002). However, CF suffers some weaknesses: new user problem (cold start problem), sparsity, scalability, and loss of neighbor transitivity (Breese et al., 1998; Vozalis & Margaritis, 2003). Usually each user rates only a very limited percentage of items, when compared to the available total. This leads to sparse user-item matrices, the set of users cross the set of items, therefore weak recommendations could be produced because the successful neighbors cannot be found. On the other hand, the computational cost of RS grows fast with both the number of users and items giving rise to a scalability problem. Further, relying directly on individual items ratings result in a loss of neighbor transitivity. Assume that we have three users ui, uj, and uk. Users ui and uj correlate highly, also uj and uk correlate highly. Because of transitivity there is a possibility that users ui and uk correlate highly too. Such a transitive relationship is not captured in pure CF, unless users ui and uk have rated many common items (Vozalis & Margaritis, 2003). To remedy the aforementioned weaknesses of CBF and CF, trust-aware RS (Massa & Avesani, 2004) – recommendations are based only on ratings given by users trusted directly or indirectly by the active user – and hybrid filtering recommenders (Balabanovic & Shoham, 1997; Linden et al., 2003; Pazzani, 1999; Shahabi et al., 2001) are proposed. A recent comprehensive survey of the state of the art in RS, various limitations of the current generation, the various ways to extend their capabilities, and various hybridization methods are described by Adomavicius and Tuzhilin (2005) and Burke (2002). Basically to build a RS using any filtering technique, a model for each user is required which must reflect the user preferences and taste. The key success of any filtering technique comes from the way it learns user models. The CBF learns the model from the features describing the items the user has rated in the past, whereas CF learns the model from ratings of the items themselves (Breese et al., 1998; Resnick et al., 1994; Vozalis & Margaritis, 2003). In real life, users put some priorities for each feature and these priorities need learning also. Ujjin and Bentley (2004) used the evolutionary search to learn these priorities tailoring the recommendation process to the preferences of each individual user. Formally, in CF recommenders, we have a set of users U = {u1,u2,...,um} rating a set of items S = {s1,s2,...,sn}, such as books, movies, or CDs. The spaces S and U are large and can be very large in some applications. Each user ui, i = 1, 2,...,m has rated a subset of items Si. Specifically, the rating of user uc for item sj, j = 1, 2,...,n is denoted by rc,j. All the available ratings are collected in a m · n useritem matrix denoted by R. Four phases are required to perform the recommendation task in CF recommenders: (a) Data collection (b) User model formation (c) Neighborhood set selection (d) Making recommendations 2.1.1. Data collection Generally, three types of data can be collected from users, demographical data through the registration process, explicit ratings for a subset of the available items, and implicit data from the user’s online behavior. In order to conduct our experiments we used the original MovieLens dataset (http://www.movielens.umn.edu). The dataset consists of 100,000 ratings, assigned by 943 users on 1682 movies. All ratings follow the 1 – bad, 2 – average, 3 – good, 4 – very good, and 5 – excellent numerical scale. Each user has rated at least 20 movies. Simple demographical data such as age, gender, occupation and zip code are included for all users, which are collected when a new user registers on the system. The movie title, release date, video release date, and genre data are given for each movie. The genre feature specifies if the movie is an action, adventure, animation, children’s, comedy, crime, documentary, drama, fantasy, film-noir, horror, musical, mystery, romance, sci-fi, thriller, war, or western. A single movie can belong to more than one genre. 2.1.2. User model formation The difference between a user profile and a user model lies in the different levels of sophistication. The user profile is simply a collection of personal information about the user, which can be described as a simple user model. 1388 M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399
M. Y.H. Al-Shamri, KK Bharadwaj/ Expert Systems with Applications 35 (2008)1386-1399 Depending on the content and the amount of information tance function gives another way to compute similarity, about the user, which is stored in the user profile, a user can which takes into account multiple features(Uiin Bent- be modeled(Froschl, 2005: Koch, 2000). To be precise, a ley, 2004) user model is a representation of the knowledge and per- sonal characteristics, which the system believes that a user possesses. The profiling information can be elicited from demo- graphical data(e. g, users age, gender, occupation, etc. ) Here xi is the jth feature for the common item Si, N is the user preferences about features of the items(e.g, movie number of features, and ==Sxyl, the cardinality of sxy. title, genre, director, year of release, leading actors, etc. ) Note that a vector of features represents each user there and user ratings on experienced items(e.g, previously seen fore it is written bold in formula(2) ovies)(Massa Avesani, 2004). To build a user model two questions have to be taken into account, what infor- 2.1.4. Making recommendations mation has to be represented in the model and how this In this phase, RS assign a predicted rating to all the information is effectively represented? The first question items seen by the neighborhood set and not by the active is general for all applications while the second is applica- user. The predicted rating, pra.j, indicates the expected tion-dependent(Koch, 2000). An effective, tailored recom- interestingness of the item s; to the user wa, is usually com- mendation depends on the underlying user model, which is puted as an aggregate of the ratings of user's(ua) neighbor in turn used for similarity computations. A representative hood set for the same item user model would appropriately reflect user's tastes, prefer ences and need Most previous work on user model construction relies where C denotes the set of neighbors who have rated item nly on explicit ratings(Breese et al., 1998: Goldberg S;. The most widely used aggregation function is the et al., 1992; Resnick et al., 1994; Shardanand Maes, weighted sum(Adomavicius Tuzhilin, 2005: Breese 1995). However, in real life, the way in which two people et al., 1998; Vozalis Margaritis, 2003 ), which is called are said to be similar is not based solely on whether they also Resnick's prediction formula(Resnick et al., 1994) have close opinions on a specific subject, e.g., movie rat ings, but also on other factors, such as their background pra, =ma+k>d(a,c)x(c-me) (4) and personal details. Therefore, issues such as age, gender, and preferences of movie genres should also be taken into The multiplier k serves as a normalizing factor and is usu- account (Ujin Bentley, 2004) ally selected as k=1/>eld(a, c) and me is the average 2. 1.3. Neighborhood set selection Once user models have been established, the system can match the active user to the available database and a set of neighbors for him needs to be formed and ranked accord ing to a suitable distance function. The size of the neigh- 3. A novel hybrid user model borhood set could be fixed by selecting the top K users or could be variable by selecting the users whose similarity The construction of user models is a key task- the sys value is above a certain threshold(Breese et al., 1998: Voz- tems success will depend to a large extent on the models to alis Margaritis, 2003). Various functions have been used represent the users' actual interests. a pure CF user profile to compute the distance, d(a, c), between users ua and ue in( Goldberg et al., 1992; Resnick et al., 1994)consists of a CF recommenders. The most popular function for mem- vector of items with their ratings continuously augmented ory-based CF is the Pearson correlation coefficient as the user interacts with the system over time. This huge (Resnick et al, 1994), where the distance between two users amount of data needs a very large space and a long pro- is based only on the ratings both users have declared. The cessing time. At query time, searching the entire database Pearson correlation coefficient is given by to find the best set of neighbors is computationally very (rxs -,)(rvs-m, expensive. Eventually, the user will leave the Web site corr(x, y) (1) before the processing completes On the other hand, the actual user preferences cannot always be captured by explicit ratings, some content where Sxy is the set of items rated by both users ux and uy. descriptions of items are required. This problem gets solved Let us call the rs, which uses formula (1) for similarity by hybrid filtering. However, most current hybridization computations as Pearson RS(PRS) methods(Burke, 2002)construct two separate profiles Clearly, formula(1)is not appropriate if other features and implement the online process for each filtering tech are included in the model because it considers only the nique separately. Finally, some merger to give the final common items for both users. The modified Euclidean dis- result is used. What if we construct a model according to
Depending on the content and the amount of information about the user, which is stored in the user profile, a user can be modeled (Froschl, 2005; Koch, 2000). To be precise, a user model is a representation of the knowledge and personal characteristics, which the system believes that a user possesses. The profiling information can be elicited from demographical data (e.g., user’s age, gender, occupation, etc.), user preferences about features of the items (e.g., movie title, genre, director, year of release, leading actors, etc.), and user ratings on experienced items (e.g., previously seen movies) (Massa & Avesani, 2004). To build a user model two questions have to be taken into account, what information has to be represented in the model and how this information is effectively represented? The first question is general for all applications while the second is application-dependent (Koch, 2000). An effective, tailored recommendation depends on the underlying user model, which is in turn used for similarity computations. A representative user model would appropriately reflect user’s tastes, preferences, and needs. Most previous work on user model construction relies only on explicit ratings (Breese et al., 1998; Goldberg et al., 1992; Resnick et al., 1994; Shardanand & Maes, 1995). However, in real life, the way in which two people are said to be similar is not based solely on whether they have close opinions on a specific subject, e.g., movie ratings, but also on other factors, such as their background and personal details. Therefore, issues such as age, gender, and preferences of movie genres should also be taken into account (Ujjin & Bentley, 2004). 2.1.3. Neighborhood set selection Once user models have been established, the system can match the active user to the available database and a set of neighbors for him needs to be formed and ranked according to a suitable distance function. The size of the neighborhood set could be fixed by selecting the top K users or could be variable by selecting the users whose similarity value is above a certain threshold (Breese et al., 1998; Vozalis & Margaritis, 2003). Various functions have been used to compute the distance, d(a, c), between users ua and uc in CF recommenders. The most popular function for memory-based CF is the Pearson correlation coefficient (Resnick et al., 1994), where the distance between two users is based only on the ratings both users have declared. The Pearson correlation coefficient is given by corrðx; yÞ ¼ P s2Sxy ðrx;s mxÞðry;s my Þ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi P s2Sxy ðrx;s mxÞ 2P s2Sxy ðry;s my Þ 2 q ; ð1Þ where Sxy is the set of items rated by both users ux and uy. Let us call the RS, which uses formula (1) for similarity computations as Pearson RS (PRS). Clearly, formula (1) is not appropriate if other features are included in the model because it considers only the common items for both users. The modified Euclidean distance function gives another way to compute similarity, which takes into account multiple features (Ujjin & Bentley, 2004) dðx; yÞ ¼ 1 z Xz i¼1 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi XN j¼1 ðxi;j yi;jÞ 2 vuut : ð2Þ Here xi,j is the jth feature for the common item si, N is the number of features, and z = jSxyj, the cardinality of Sxy. Note that a vector of features represents each user therefore it is written bold in formula (2). 2.1.4. Making recommendations In this phase, RS assign a predicted rating to all the items seen by the neighborhood set and not by the active user. The predicted rating, pra,j, indicates the expected interestingness of the item sj to the user ua, is usually computed as an aggregate of the ratings of user’s (ua) neighborhood set for the same item sj pra;j ¼ aggruc2C rc;j; ð3Þ where C denotes the set of neighbors who have rated item sj. The most widely used aggregation function is the weighted sum (Adomavicius & Tuzhilin, 2005; Breese et al., 1998; Vozalis & Margaritis, 2003), which is called also Resnick’s prediction formula (Resnick et al., 1994) pra;j ¼ ma þ k X uc2C dða; cÞðrc;j mcÞ: ð4Þ The multiplier k serves as a normalizing factor and is usually selected as k ¼ 1= P uc2Cjdða; cÞj and mc is the average rating of user uc mc ¼ 1 jScj X s2Sc rc;s: ð5Þ 3. A novel hybrid user model The construction of user models is a key task – the system’s success will depend to a large extent on the models to represent the users’ actual interests. A pure CF user profile (Goldberg et al., 1992; Resnick et al., 1994) consists of a vector of items with their ratings continuously augmented as the user interacts with the system over time. This huge amount of data needs a very large space and a long processing time. At query time, searching the entire database to find the best set of neighbors is computationally very expensive. Eventually, the user will leave the Web site before the processing completes. On the other hand, the actual user preferences cannot always be captured by explicit ratings, some content descriptions of items are required. This problem gets solved by hybrid filtering. However, most current hybridization methods (Burke, 2002) construct two separate profiles and implement the online process for each filtering technique separately. Finally, some merger to give the final result is used. What if we construct a model according to M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399 1389
M. Y.H. Al-Shamri, KK Bharadwaj/ Expert Systems with Applications 35(2008)1386-1399 some filtering technique and then apply on the constructed 3. 1. Necessary formulae model another filtering technique? In our approaches only one online filtering process(CF)is needed, while the other The total ratings(TR)of user u; is filtering techniques (CBF and DMF) are used to dense the data by building a compact user model, which is an offline TR(=>ris (6) ocess Further, a sparse user-item matrix causes a scalability Here S; is the set of movies rated by user u The genre rat problem for CF. However, we can overcome this problem ing(GR)(resp. genre frequency(GF) for high rated items to some extent by combining or integrating many informa- of genre G; corresponding to user u; is computed using for- tion sources. In our work, we develop a set of hy brid fea- mula (7)(resp. formula( 8) tures that combines some of the users' and items properties. An example of a hybrid feature(Basu, Hirsh. GR(i,j) Cohen, 1998)would be the set of Movies of the Comedy Genre User x Highly Rated. We derive the hybrid feature GF(i,j)=25(is), kE(3, 4, 5) based on the user's ratings for a set of high rated movies s∈GCS and the content descriptions of the genres corresponding where to this set of movies. The hybrid features are utilized as the basis for formulating a genre interestingness measure S(ris) (GIM). Once we have an appropriate formula for GIM, a user model can be constructed from DMF user profile It is to be noted that only the items having rating r>3 are and GIMs considered, i.e. the items rated as good (3), very good (4), Block diagram of the proposed work is given in Fig. I. or excellent(5). Such items would be called items with high First of all, a hybrid user model is built using the hybrid ratings. Finally, relative genre rating (RGR)(resp. relative features and DMF user profile, thereafter CF recom- genre frequency(RGF)), the ratio of u;'s ratings(resp. fre- mender generates a neighborhood set according to quency) for high rated items of G; to his total ratings(resp model-based CF. Finally the entire database of this set is frequency), is computed as retrieved to recommend items according to memory-based CF. Before going to details, let us present necessary formu- RGR(i, j)=TR( GR(i,j) (10) lae for the Movie Lens dataset and illustrate GIM compu- tations through an example RGF(i,j) GF(i,j TF() where TF(=Si, the cardinality of Si 3.2. Example I CBF USer profile Content For the sake of simplicity, we consider a table(Table 1) Descriptions of only three users who have rated movies belonging to four genres. In columns Gi, i= 1, 2, 3, 4, one(1)indicates the belongingness of a given movie to G, and 0 otherwise Also a non-zero value in the user ratings columns indicates Hybrid Features Table I Data of Example 1 Hybrid User Model Movie Corresponding genres User ratings User.l User.2 User-3 CF Recommender 23456789 0 Set of Neighbors 0 G001111 Database G010010110010 3154000000 003040050 0 Recommendatio o12 1011l0 02002430 Fig. I. Block diagram of the proposed work
some filtering technique and then apply on the constructed model another filtering technique? In our approaches only one online filtering process (CF) is needed, while the other filtering techniques (CBF and DMF) are used to dense the data by building a compact user model, which is an offline process. Further, a sparse user-item matrix causes a scalability problem for CF. However, we can overcome this problem to some extent by combining or integrating many information sources. In our work, we develop a set of hybrid features that combines some of the users’ and items’ properties. An example of a hybrid feature (Basu, Hirsh, & Cohen, 1998) would be the set of Movies of the Comedy Genre User X Highly Rated. We derive the hybrid feature based on the user’s ratings for a set of high rated movies and the content descriptions of the genres corresponding to this set of movies. The hybrid features are utilized as the basis for formulating a genre interestingness measure (GIM). Once we have an appropriate formula for GIM, a user model can be constructed from DMF user profile and GIMs. Block diagram of the proposed work is given in Fig. 1. First of all, a hybrid user model is built using the hybrid features and DMF user profile, thereafter CF recommender generates a neighborhood set according to model-based CF. Finally the entire database of this set is retrieved to recommend items according to memory-based CF. Before going to details, let us present necessary formulae for the MovieLens dataset and illustrate GIM computations through an example. 3.1. Necessary formulae The total ratings (TR) of user ui is TRðiÞ ¼ X s2Si ri;s: ð6Þ Here Si is the set of movies rated by user ui. The genre rating (GR) (resp. genre frequency (GF)) for high rated items of genre Gj corresponding to user ui is computed using formula (7) (resp. formula (8)) GRði;jÞ ¼ X s2GjSi;rP3 ri;s; ð7Þ GFði;jÞ ¼ X s2GjSi dk ðrisÞ; k 2 f3; 4; 5g; ð8Þ where, dk ðri;sÞ ¼ 1; k ¼ ri;s; 0; k 6¼ ri;s: ð9Þ It is to be noted that only the items having rating r P 3 are considered, i.e. the items rated as good (3), very good (4), or excellent (5). Such items would be called items with high ratings. Finally, relative genre rating (RGR) (resp. relative genre frequency (RGF)), the ratio of ui’s ratings (resp. frequency) for high rated items of Gj to his total ratings (resp. frequency), is computed as RGRði;jÞ ¼ GRði;jÞ TRðiÞ ; ð10Þ RGFði;jÞ ¼ GFði;jÞ TFðiÞ ; ð11Þ where TF(i) = jSij, the cardinality of Si. 3.2. Example 1 For the sake of simplicity, we consider a table (Table 1) of only three users who have rated movies belonging to four genres. In columns Gi, i = 1, 2, 3, 4, one (1) indicates the belongingness of a given movie to Gi, and 0 otherwise. Also a non-zero value in the user ratings columns indicates CBF User Profile DMF User Profile Content Descriptions Explicit Ratings Hybrid Features Hybrid User Model CF Recommender Recommendations Set of Neighbors Database Fig. 1. Block diagram of the proposed work. Table 1 Data of Example 1 Movie Corresponding genres User ratings G1 G2 G3 G4 User-1 User-2 User-3 1 1010150 2 1011300 3 1100153 4 0110510 5 011 1 4 0 4 6 0100020 7 1101000 8 001 1 0 0 5 9 0110020 10 1 1 1 0 0 4 1 11 1 1 0 1 0 3 3 12 1 0 0 0 0 0 3 1390 M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399
M.Y.H. Al-Shamri, K.K. Bharadwaj/ Expert Systems with Applications 35(2008)1386-1399 1391 that the movie has been rated. and zero indicates that the table 4 movie has not been rated yet. Table 2 lists the ratings asso- Comments on genre interestingness measure according to vario ciated with various genres. It is to be noted that the user formulae rating for a specific movie gets associated with all the gen- GIM User Comments res to which the movie belongs. For example, the entry of RGr I Values reflect the actual order of interests user-3 for G4 is 4, 5, and 3(highlighted) where they are cor- 2 Values reflect the actual order of interests responding to movie-5, movie-8, and movie-ll(Table 1) 3 G4 has the highest value as expected. But G3 and Gn have the same value while there is a difference between them respectively, and so on. As discussed earlier, the hybrid fea tures are used as the basis for the genre interestingnes RGF I G] has the highest value as expected. But G4 and G2 hay easure, therefore a user is interested more in G, if it has the same value while there is a difference between them 2 Values reflect the order of interests more high ratings, i.e. good, very good, or excellent. The 3 G4 has the highest value as expected. But GI and G2 have number of hybrid features depends on the number of gen- the same value also. G3 has smaller value than GI and G? res. On this basis, one can infer that user-1, user-2, and while user's interest is more for user-3 are interested more in G3, G1, and G4, respectively Table 2) is because only the number of high rated items is consid- Although some low ratings (r< 3) occur for genres but these can be filtered out at the collaborative stage. Our ered and all high ratings contributed equally in this aim here is to find a compact user model to generate close eighbors for that user. This set of neighbors is used in a are given in Table 4. Obviously, RGR and RGF are appro- collaborative manner to recommend some items. which priate GIM formulae with some weaknesses. To remedy might be liked by the user. Therefore any item with a pre- these weaknesses we introduce in the next subsection a for- dicted rating less than good (3)cannot be recommended to mula that combines both RGR and a modified version of the active user RGF Based on the hybrid features, one can evolve GIM ccording to a suitable formula. To explore the possibility 3.3. The proposed hybrid user model of choosing a formula for GIM, we examined formulae (10) and (11). The first formula, RGR, works well for User preference for a certain item is declared by the rat- user-Iand user-2 but for user-3 it gives Gi and G3 the same ing for that item like good, very good,or excellent.The value(Table 3), whereas they are quite different. G has three good(3)ratings while G3 has very good (4)and excel exact degree of preference is not captured by RGF, because lent(5)ratings. This is because the number of movies for it gives all high ratings the same weight. The following def- inition introduces a modified version of rgf formula each genre is ignored in this formula. The second formula, which tries to reflect the exact preference for items with RGF, works well for user-2 but for user-3 it gives G1, G2 high ratings and G4 the same value, whereas they are quite different Moreover, G3 value is lower than that of Gl, and G,, Definition 1. For a rating-based movie recommender whereas the ratings for G3 are very good and excellent. This system, the modified relative genre frequency (MRGF) of genre G for user ui is defined as Table 2 List of ratings associated with various genres MRGF()=216s)+2×b()+3×5() 3×7F() 3,11,5,4 1.3.5.43.4 5,5,4,35,1,2,2,4,35,1,2,43 As expected, MRGF ( Table 3)overcomes the draw- 1,3,33,4,1, 3 4,5,I4, 5,3 backs of RGF. To develop more accurate Gim, the rela tive genre rating needs to be considered also. Indeed RGR identifies the preferences for genres with some draw- Table 3 backs as discussed before. However. a combined formula Genre interestingness measure according to various formulae of RGR and mRGf will bring out the best in both formu- lae. The following definition gives one possible form of 0.643 0.857 0.500 such a formula 123123 0.773 0.545 0.136 0.474 0.526 0.632 Definition 2. For a rating-based movie recommender 0.200 0.400 0.600 0.400 system, the genre interestingness measure(GIM)of genre 0.286 0.143 Gi for user li is defined as 0.500 0.067 0.333 0.400 2×nf×RGR(i,)×MRGF(,j 0.200 GIM(i,j 0.429 0.286 0.238 0.048 RGR(i,+ MRGF(i, j) 0.167 0.222 0.278 0.333 Here nf is the normalization factor for a given system
that the movie has been rated, and zero indicates that the movie has not been rated yet. Table 2 lists the ratings associated with various genres. It is to be noted that the user rating for a specific movie gets associated with all the genres to which the movie belongs. For example, the entry of user-3 for G4 is 4, 5, and 3 (highlighted) where they are corresponding to movie-5, movie-8, and movie-11 (Table 1), respectively, and so on. As discussed earlier, the hybrid features are used as the basis for the genre interestingness measure, therefore a user is interested more in Gi if it has more high ratings, i.e. good, very good, or excellent. The number of hybrid features depends on the number of genres. On this basis, one can infer that user-1, user-2, and user-3 are interested more in G3, G1, and G4, respectively (Table 2). Although some low ratings (r < 3) occur for genres but these can be filtered out at the collaborative stage. Our aim here is to find a compact user model to generate close neighbors for that user. This set of neighbors is used in a collaborative manner to recommend some items, which might be liked by the user. Therefore any item with a predicted rating less than good (3) cannot be recommended to the active user. Based on the hybrid features, one can evolve GIMs according to a suitable formula. To explore the possibility of choosing a formula for GIM, we examined formulae (10) and (11). The first formula, RGR, works well for user-1 and user-2 but for user-3 it gives G1 and G3 the same value (Table 3), whereas they are quite different. G1 has three good (3) ratings while G3 has very good (4) and excellent (5) ratings. This is because the number of movies for each genre is ignored in this formula. The second formula, RGF, works well for user-2 but for user-3 it gives G1, G2 and G4 the same value, whereas they are quite different. Moreover, G3 value is lower than that of G1, and G2, whereas the ratings for G3 are very good and excellent. This is because only the number of high rated items is considered and all high ratings contributed equally in this formula. Comments on all the GIM formulae for the three users are given in Table 4. Obviously, RGR and RGF are appropriate GIM formulae with some weaknesses. To remedy these weaknesses we introduce in the next subsection a formula that combines both RGR and a modified version of RGF. 3.3. The proposed hybrid user model User preference for a certain item is declared by the rating for that item like good, very good, or excellent. The exact degree of preference is not captured by RGF, because it gives all high ratings the same weight. The following definition introduces a modified version of RGF formula, which tries to reflect the exact preference for items with high ratings. Definition 1. For a rating-based movie recommender system, the modified relative genre frequency (MRGF) of genre Gj for user ui is defined as MRGFði;jÞ ¼ P s2GjSi d3ðri;sÞ þ 2 d4ðri;sÞ þ 3 d5ðri;sÞ 3 TF ðiÞ : ð12Þ As expected, MRGF (Table 3) overcomes the drawbacks of RGF. To develop more accurate GIM, the relative genre rating needs to be considered also. Indeed, RGR identifies the preferences for genres with some drawbacks as discussed before. However, a combined formula of RGR and MRGF will bring out the best in both formulae. The following definition gives one possible form of such a formula. Definition 2. For a rating-based movie recommender system, the genre interestingness measure (GIM) of genre Gj for user ui is defined as GIMði;jÞ ¼ 2 nf RGRði;jÞ MRGFði;jÞ RGRði;jÞ þ MRGFði;jÞ : ð13Þ Here nf is the normalization factor for a given system. Table 2 List of ratings associated with various genres User TF TR G1 G2 G3 G4 1 5 14 1, 3, 1 1, 5, 4 1, 3, 5, 4 3, 4 2 7 22 5, 5, 4, 3 5, 1, 2, 2, 4, 3 5, 1, 2, 4 3 3 6 19 3, 1, 3, 3 3, 4, 1, 3 4, 5, 1 4, 5, 3 Table 3 Genre interestingness measure according to various formulae GIM User G1 G2 G3 G4 RGR 1 0.214 0.643 0.857 0.500 2 0.773 0.545 0.409 0.136 3 0.474 0.526 0.474 0.632 RGF 1 0.200 0.400 0.600 0.400 2 0.571 0.429 0.286 0.143 3 0.500 0.500 0.333 0.500 MRGF 1 0.067 0.333 0.400 0.200 2 0.429 0.286 0.238 0.048 3 0.167 0.222 0.278 0.333 Table 4 Comments on genre interestingness measure according to various formulae GIM User Comments RGR 1 Values reflect the actual order of interests 2 Values reflect the actual order of interests 3 G4 has the highest value as expected. But G3 and G1 have the same value while there is a difference between them RGF 1 G3 has the highest value as expected. But G4 and G2 have the same value while there is a difference between them 2 Values reflect the order of interests 3 G4 has the highest value as expected. But G1 and G2 have the same value also. G3 has smaller value than G1 and G2 while user’s interest is more for G3 M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399 1391
M. Y.H. Al-Shamri, KK Bharadwaj/ Expert Systems with Applications 35(2008)1386-1399 Table 5 taken into account when doing the inter-user comparisons. Genre interestingness measure ing to formula(13) In this subsection, our aim is to fuzzify the proposed model GIM(i, to get as close as possible a set of neighbors for the active G G G a user. First of all, age is fuzzified into three fuzzy sets(Klir 0.510 2.194 2.727 L.429 Yuan, 1995), young, middle-aged and old as shown in 2.759 1876 0.355 Fig. 2, with the following membership functions 1235 1561 2.181 x≤20, Ayoung(x)={(35-x)/152035 MRGF(i,) multiplied by nf. The range of GIM(i)) is [O, MAX] that x≤20,x>60, e. 1-bad..MAX-excellent. Here MAX is the maxi- AMiddle(x) (x-20)/15.2060 G3, user-2 in Gl, and user-3 in G4. Three main advantages The gender and occupation values are considered as fuzzy preferences in genres are captured easily. Secondly, a com- points with membership value of one. Finally, the genre neighbor transitivity therefore users having a close GIMs (VG), and excellent(E)with the following membership GIM represents user's X interestingness for genre y functions(Fig 3) However, as noted earlier that two people are said to be ≤1 similar is not based solely on whether they have close opin- (15a) ions on a specific subject, but also on other factors, such as their background and personal details. Therefore, we con x≤i-2,x>i, struct the user model from DMF user profile and GIMS. Bao(x)=r-i+2, i-2<x<i-l, i=2, 3, 4, 5 The proposed user model consists of age, gender, and occu i-1<x≤i, pation as demographical data and GIMs for 18 genres (15b) 4. Fuzzy and hybrid fuzzy-genetic approaches Here a(i=B, AV, G, and VG for i= 2, 3, 4, and 5 respectively 4. 1. Fuzzy approach to recommender systems x≤4 BE(x) x-4,4<x≤5 (15c) Fuzzy sets were introduced by Zadeh(1965) as a gener- alization of the classical crisp sets in order to deal with ague concepts,like" young”,“rich”,“talr;, and so on.4.l.2. Fuzzy distance function Instead of the hard belongingness of the items to the crisp A great advantage can be gained by fuzzifying the user set(I if the item belongs to the set, and 0 otherwise), fuzzy model, but how can we compare two models having many set allows items to have partial degree of belongingness, i.e. features? In turn, each feature has many fuzzy sets. Actu any value in the interval [0, 1]. To develop a fuzzy rs, the ally, the choice of a distance function is an important issue user model needs to be fuzzified and an appropriate dis- for the system and depends heavily on the problem itsel tance function needs to be proposed to match different users with fuzzy models. The following subsections discuss how to fuzzify the model and introduce a fuzzy distance function 4. 1.1. User model fuzzification 0.5 The crisp description of the age and genre interesting less measure does not reflect the actual case for human decisions. The distance, for example, between two users of age 15 and 19 is 4 while both users belong to the same age group, i.e. teenager. These fuzzy features need to be Fig. 2. Membership functions for age
Formula (13) gives the harmonic mean of RGR(i,j), and MRGF(i,j) multiplied by nf. The range of GIM(i,j) is [0,MAX] that agrees with the system’s rating structure, i.e. 1 – bad...MAX – excellent. Here MAX is the maximum possible rating for a given system. The normalization factor nf could take the value of MAX or the global average rating of a given user (TR(i)/TF(i)). The results based on formula (13) with nf = MAX are shown in Table 5. The results reflect exactly the degree of interestingness for each genre; user-1 is interested more in G3, user-2 in G1, and user-3 in G4. Three main advantages are gained by using formula (13) for GIM. Firstly, the user preferences in genres are captured easily. Secondly, a compact user model is obtained. Finally, GIM ensures the user neighbor transitivity therefore users having a close GIMs are correlated. GIM represents user’s X interestingness for genre Y. However, as noted earlier that two people are said to be similar is not based solely on whether they have close opinions on a specific subject, but also on other factors, such as their background and personal details. Therefore, we construct the user model from DMF user profile and GIMs. The proposed user model consists of age, gender, and occupation as demographical data and GIMs for 18 genres. 4. Fuzzy and hybrid fuzzy-genetic approaches 4.1. Fuzzy approach to recommender systems Fuzzy sets were introduced by Zadeh (1965) as a generalization of the classical crisp sets in order to deal with vague concepts, like ‘‘young’’, ‘‘rich’’, ‘‘tall’’, and so on. Instead of the hard belongingness of the items to the crisp set (1 if the item belongs to the set, and 0 otherwise), fuzzy set allows items to have partial degree of belongingness, i.e. any value in the interval [0, 1]. To develop a fuzzy RS, the user model needs to be fuzzified and an appropriate distance function needs to be proposed to match different users with fuzzy models. The following subsections discuss how to fuzzify the model and introduce a fuzzy distance function. 4.1.1. User model fuzzification The crisp description of the age and genre interestingness measure does not reflect the actual case for human decisions. The distance, for example, between two users of age 15 and 19 is 4 while both users belong to the same age group, i.e. teenager. These fuzzy features need to be taken into account when doing the inter-user comparisons. In this subsection, our aim is to fuzzify the proposed model to get as close as possible a set of neighbors for the active user. First of all, age is fuzzified into three fuzzy sets (Klir & Yuan, 1995), young, middle-aged and old as shown in Fig. 2, with the following membership functions: AYoungðxÞ ¼ 1; x 6 20; ð35 xÞ=15; 20 35; 8 >: ð14aÞ AMiddleðxÞ ¼ 0; x 6 20; x > 60; ðx 20Þ=15; 20 >>>>: ð14bÞ AOldðxÞ ¼ 0; x 6 45; ðx 45Þ=15; 45 60: 8 >: ð14cÞ The gender and occupation values are considered as fuzzy points with membership value of one. Finally, the genre interestingness measure is fuzzified into six fuzzy sets, very bad (VB), bad (B), average (AV), good (G), very good (VG), and excellent (E) with the following membership functions (Fig. 3): BVBðxÞ ¼ 1 x; x 6 1; 0; x > 1; ð15aÞ BaðiÞðxÞ ¼ 0; x 6 i 2; x > i; x i þ 2; i 2 : i ¼ 2; 3; 4; 5: ð15bÞ Here a(i) = B,AV,G, and VG for i = 2, 3, 4, and 5, respectively. BEðxÞ ¼ 0; x 6 4; x 4; 4 < x 6 5: ð15cÞ 4.1.2. Fuzzy distance function A great advantage can be gained by fuzzifying the user model, but how can we compare two models having many features? In turn, each feature has many fuzzy sets. Actually, the choice of a distance function is an important issue for the system and depends heavily on the problem itself. Table 5 Genre interestingness measure according to formula (13) User GIM(i,j) G1 G2 G3 G4 1 0.510 2.194 2.727 1.429 2 2.759 1.876 1.505 0.355 3 1.235 1.561 1.752 2.181 0 0.5 1 1.5 0 20 40 60 80 Age Membership value Fig. 2. Membership functions for age. 1392 M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399
M.Y.H. Al-Shamri, K.K. Bharadwaj/ Expert Systems with Applications 35(2008)1386-1399 1393 same fuzzy set. In the following, we define an appropriate fuzzy distance function that satisfies all the above four conditions g0. Definition 3. Let a and b be the membership vectors correspond to two crisp values a, and b for a given feature 0 with I fuzzy sets. The fuzzy distance between a and b is defined as Fig. 3. Membership functions for genre interestingness measure. fd(a, b=d(a, b)xd(a, b) where d(a, b) is simply the difference operator, a and b are The distance between two fuzzy sets or points is discussed vectors of size L, and d(a, b )is any vector distance metric extensively in the literature(Dumitrescu, Lazzerini, Jair 2000: Klir& Yuan, 1995). For our model, a vector of 21 In our experiments, the Euclidean distance function is features represents the user. A local fuzzy distance should used for d(a, b) be found for each feature. Hence, for each pair of users, we have 21 local fuzzy distances. The global fuzzy distance b)=∑(a-b) is obtained by two methods(Gadi, Daoudi, Matusiak 999). The first method uses a fuzzy IF-THEN rule of where a is the membership value of the feature a in its jth Cae F(x, is close to y1)and (x2 is close to y2) .and(x21 is fuzzy set. To illustrate the effectiveness of the proposed fuzzy se to y21)THEN (x is similar to y). distance function, let us consider the following example In this case, the global fuzzy distance is given by Example 2. Suppose we have to compute fuzzy distance fd(x,y)=min(fd(x1, yi), fd(x2,y2),., fd(x2l, y21)) .(16) between two users having the age The second method considers each fuzzy distance between two features as an opinion. The global fuzzy distance is a Case i: 35 and 40, global opinion from all. In fuzzy logic, this requires an Case ii: 45 and 60, aggregation operator. The aggregation operator may Case iii: 18 and 23 the average of the 21 local fuzzy distances According to Fig. 2, 21 (17) Case i: a=(0, 1, 0), b=(0, 1, 0). Therefore For our model, formula(16)does not work well because it considers only the feature with the minimum distance and d(a, b)=V2x(0-0)+(1-1)=0, ignores the remaining features. According to fuzzy sets and distance concepts, we need a local fuzzy distance metric, fd(xi, yi), that fulfils the following conditions (A)a zero value for the same feature values Case ii: a=(0, 1, 0), b=(0, 0, 1). Therefore (B)A zero value for different feature values in the same d(a, b) fuzzy set having the same membership values (1-0)2 C) Minimize the distance between any two feature values d(60, 45)=60-45=15 belonging to the same fuzzy set having near member ship values. fd(60,45)=√2×15( Far users) (D) Maximize the distance between any two feature val es belonging to two different fuzzy sets Case ii: a=(1,0, 0), b=(0.8, 0.2, 0). Therefore The condition (A)is the basic requirement for any dis- d(a, b) (08-1)2+(02-0)2=0283, ance function. To clarify the condition(B), assume that d (23, 18)=23-18=5, we have two users with age 40 and 35. Both users are mid dle-aged with membership values of one(Fig. 2). The crisp fd(23,18)=0.283×5( Near users) distance between them is 5, whereas they are similar users from the fuzzy sets point of view. To make the distance Results in Example 2 show that formula(18)satisfies all between the two users zero, we need another term gives the four conditions of the required local fuzzy distance zero value for this and similar cases. What makes these metric. Accordingly, for the fuzzy approach, the fuzzy dis two users similar is their equal membership values in the tance function between two users can be aggregated using
The distance between two fuzzy sets or points is discussed extensively in the literature (Dumitrescu, Lazzerini, & Jain, 2000; Klir & Yuan, 1995). For our model, a vector of 21 features represents the user. A local fuzzy distance should be found for each feature. Hence, for each pair of users, we have 21 local fuzzy distances. The global fuzzy distance is obtained by two methods (Gadi, Daoudi, & Matusiak, 1999). The first method uses a fuzzy IF-THEN rule of the form: IF (x1 is close to y1) and (x2 is close to y2)... and (x21 is close to y21) THEN (x is similar to y). In this case, the global fuzzy distance is given by fdðx; yÞ ¼ minffdðx1; y1Þ; fdðx2; y2Þ; ... ; fdðx21; y21Þg: ð16Þ The second method considers each fuzzy distance between two features as an opinion. The global fuzzy distance is a global opinion from all. In fuzzy logic, this requires an aggregation operator. The aggregation operator may be the average of the 21 local fuzzy distances: fdðx; yÞ ¼ 1 21 X 21 i¼1 fdðxi; yi Þ: ð17Þ For our model, formula (16) does not work well because it considers only the feature with the minimum distance and ignores the remaining features. According to fuzzy sets and distance concepts, we need a local fuzzy distance metric, fd(xi,yi), that fulfils the following conditions: (A) A zero value for the same feature values. (B) A zero value for different feature values in the same fuzzy set having the same membership values. (C) Minimize the distance between any two feature values belonging to the same fuzzy set having near membership values. (D) Maximize the distance between any two feature values belonging to two different fuzzy sets. The condition (A) is the basic requirement for any distance function. To clarify the condition (B), assume that we have two users with age 40 and 35. Both users are middle-aged with membership values of one (Fig. 2). The crisp distance between them is 5, whereas they are similar users from the fuzzy sets point of view. To make the distance between the two users zero, we need another term gives zero value for this and similar cases. What makes these two users similar is their equal membership values in the same fuzzy set. In the following, we define an appropriate fuzzy distance function that satisfies all the above four conditions. Definition 3. Let a and b be the membership vectors correspond to two crisp values a, and b for a given feature with l fuzzy sets. The fuzzy distance between a and b is defined as fdða; bÞ ¼ dða; bÞ dða; bÞ; ð18Þ where d(a,b) is simply the difference operator, a and b are vectors of size l, and d(a,b) is any vector distance metric. In our experiments, the Euclidean distance function is used for d(a,b): dða; bÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi X l j¼1 ðaj bjÞ 2 vuut ; ð19Þ where aj is the membership value of the feature a in its jth fuzzy set. To illustrate the effectiveness of the proposed fuzzy distance function, let us consider the following example. Example 2. Suppose we have to compute fuzzy distance between two users having the age: Case i: 35 and 40, Case ii: 45 and 60, Case iii: 18 and 23. According to Fig. 2, Case i: a = h0, 1, 0i, b = h0, 1, 0i. Therefore dða; bÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 ð0 0Þ 2 þ ð1 1Þ 2 q ¼ 0; dð35; 40Þ ¼ 40 35 ¼ 5; fdð35; 40Þ ¼ 0 5 ¼ 0 ðSimilar usersÞ: Case ii: a = h0, 1, 0i, b = h0, 0, 1i. Therefore dða; bÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ð1 0Þ 2 þ ð0 1Þ 2 q ¼ ffiffiffi 2 p ; dð60; 45Þ ¼ 60 45 ¼ 15; fdð60; 45Þ ¼ ffiffiffi 2 p 15 ðFar usersÞ: Case iii: a = h1, 0, 0i, b = h0.8, 0.2, 0i. Therefore dða; bÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ð0:8 1Þ 2 þ ð0:2 0Þ 2 q ¼ 0:283; dð23; 18Þ ¼ 23 18 ¼ 5; fdð23; 18Þ ¼ 0:283 5 ðNear usersÞ: Results in Example 2 show that formula (18) satisfies all the four conditions of the required local fuzzy distance metric. Accordingly, for the fuzzy approach, the fuzzy distance function between two users can be aggregated using 0 0.5 1 1.5 012345 Genre interestingness measure Membership value Fig. 3. Membership functions for genre interestingness measure. M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399 1393
M. Y.H. Al-Shamri, KK Bharadwaj/ Expert Systems with Applications 35(2008)1386-1399 formula(17)or the Euclidean distance function as giver The feature weights of user ua are represented as a set of weights, weight(ua)=[wili=l.. where n is the number of features. The genotype of wi is a string of binary values. (d(x,y)2 When the weight for any feature is zero, that feature is (20) ignored. This enables feature selection to be adaptive to each user's preference where fd(x ,yi)=d(x yi)xd(xi,yi), d(x, yi)= VE(x-y,), I is the total number of fuzzy sets for 4. 2.2. The fimess fimction Finding an appropriate fitness function is a challeng the ith feature, and xi, is the membership value of the ith problem for GA applications( Goldberg, 1989). For this feature in the jth fuzzy set. Let us call the rs uses formula application it is not a trivial task, every set of weights in (20) for similarity computations as Fuzzy RS(Frs) the Ga population must be employed by the users match ing process within the rs so the rs needs to be re-run on 4.2.Hybrid fuzzy-genetic approach to recommender systems the entire database for each new set of weights in order to find the fitness(Ujin Bentley, 2004). A poor(good)set The proposed user model contains 21 features contrib- of weights might( should) result in a poor(good) neigh ferent weights to different features. These weights are sub- tion is by reformulating the problem as a supervised lea? w uting equally during similarity computations in FRS. This borhood set of users for the active user, and hence poor does not capture the real life situation where users put dif- (good)recommendations. One way to find the fitness fund ject to change with time and evolving preferences of each ing task. For that purpose the actual ratings of the active user.An efficient learning mechanism to capture these user are randomly divided into two disjoint sets, test rat- weights is required. By imposing features hts to for- ings set(66%)and training rating set(34%). To find the mula(20), genetic algorithm(GA)can be used to find fitness score for the evolved set of weights, the RS must these weights leac ading to a hybrid fuzzy-genetic Rs be operated and the predicted ratings for each movie in FGRS). For this approach formula(20)takes the follow- the training ratings set must be computed. The average g form: of the differences (Uiin Bentley, 2004) between the actual and predicted ratings of all movies in the training ratings set is used as the fitness score for that set of dxy)=1∑吗x(dx,y) (21)weights subsections give a brief int and the fitness function where nR is the training ratings set cardinality of a given ac- live user 4.2.1 Genetic algorithms A genetic algorithm processes a population of compet- 5. E ing candidate solutions. The chromosome decodes a set of tuned parameters mapped into a potential solution to Based on Movie Lens dataset we considered only users an optimization problem( Goldberg, 1989). An objective who have rated at least 60 movies, 20 to build a user function evaluates the quality of a solution, which is called model and 40 for testing. Out of 943 users only 497 users a fitness function. Genetic operators such as crossover and satisfied this condition and contributed 84, 596 ratings out mutation are applied to the parents in order to generate of 100,000. This dataset is used as the basis to generate new offspring. Individuals that achieve a high fitness are five random splits into training and active users. For each more likely to be selected as parents and generate offspring random split, 50 users were chosen randomly as active by means of crossover and mutation. The good newly gen- users, and the remaining 447 users as training users-trea erated individuals(if exist) replace the current bad individ- ted as historical data for the Rs. Such a random uals to form the new population for the next generation. separation was intended for the execution of five-fold Ga terminates either if a maximum number of generations cross-validation, where all the experiments are repeated elapses or a desired level of fitness is reached(Goldberg, five times, once with each split. These splits will be lit-5. The set of trainin which can be referred to as feature weight. In order to users (447 users) is used to find a set of neighbors for the implement truly personalized RS, these weights need to active user while the set of active users(50 users)is used to be captured and fine-tuned to reflect each user's preference. test the performance of the system. During the testing In our experiment, GA adapts the feature weights to cap- phase, each active user's ratings are divided randomly into ture the user priorities for different features(Ujin Bent- two disjoint sets, training ratings(34%)and test ratings ley,2004) (66%). The training ratings are used to model the user
formula (17) or the Euclidean distance function as given below: fdðx; yÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi X 21 i¼1 ðfdðxi; yi ÞÞ2 vuut ; ð20Þ where fd(xi,yi) = d(xi,yi) · d(xi,yi), dðxi; yi Þ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Pl j¼1ðxi;j yi;jÞ 2 q , l is the total number of fuzzy sets for the ith feature, and xi,j is the membership value of the ith feature in the jth fuzzy set. Let us call the RS uses formula (20) for similarity computations as Fuzzy RS (FRS). 4.2. Hybrid fuzzy-genetic approach to recommender systems The proposed user model contains 21 features contributing equally during similarity computations in FRS. This does not capture the real life situation where users put different weights to different features. These weights are subject to change with time and evolving preferences of each user. An efficient learning mechanism to capture these weights is required. By imposing features’ weights to formula (20), genetic algorithm (GA) can be used to find these weights leading to a hybrid fuzzy-genetic RS (FGRS). For this approach formula (20) takes the following form: fdðx; yÞ ¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi X 21 j¼1 wj ðfdðxj; yjÞ vuut Þ 2 ; ð21Þ where wj is the weight for the jth feature. The following subsections give a brief introduction to genetic algorithms and the fitness function. 4.2.1. Genetic algorithms A genetic algorithm processes a population of competing candidate solutions. The chromosome decodes a set of tuned parameters mapped into a potential solution to an optimization problem (Goldberg, 1989). An objective function evaluates the quality of a solution, which is called a fitness function. Genetic operators such as crossover and mutation are applied to the parents in order to generate new offspring. Individuals that achieve a high fitness are more likely to be selected as parents and generate offspring by means of crossover and mutation. The good newly generated individuals (if exist) replace the current bad individuals to form the new population for the next generation. GA terminates either if a maximum number of generations elapses or a desired level of fitness is reached (Goldberg, 1989). Every user puts a different priority on each feature, which can be referred to as feature weight. In order to implement truly personalized RS, these weights need to be captured and fine-tuned to reflect each user’s preference. In our experiment, GA adapts the feature weights to capture the user priorities for different features (Ujjin & Bentley, 2004). The feature weights of user ua are represented as a set of weights, weight(ua)=[wi]i=1,...,n, where n is the number of features. The genotype of wi is a string of binary values. When the weight for any feature is zero, that feature is ignored. This enables feature selection to be adaptive to each user’s preference. 4.2.2. The fitness function Finding an appropriate fitness function is a challenging problem for GA applications (Goldberg, 1989). For this application it is not a trivial task, every set of weights in the GA population must be employed by the users matching process within the RS so the RS needs to be re-run on the entire database for each new set of weights in order to find the fitness (Ujjin & Bentley, 2004). A poor (good) set of weights might (should) result in a poor (good) neighborhood set of users for the active user, and hence poor (good) recommendations. One way to find the fitness function is by reformulating the problem as a supervised learning task. For that purpose the actual ratings of the active user are randomly divided into two disjoint sets, test ratings set (66%) and training rating set (34%). To find the fitness score for the evolved set of weights, the RS must be operated and the predicted ratings for each movie in the training ratings set must be computed. The average of the differences (Ujjin & Bentley, 2004) between the actual and predicted ratings of all movies in the training ratings set is used as the fitness score for that set of weights fitness ¼ 1 nR XnR j¼0 jrj prjj; ð22Þ where nR is the training ratings set cardinality of a given active user. 5. Experiments Based on MovieLens dataset we considered only users who have rated at least 60 movies, 20 to build a user model and 40 for testing. Out of 943 users only 497 users satisfied this condition and contributed 84,596 ratings out of 100,000. This dataset is used as the basis to generate five random splits into training and active users. For each random split, 50 users were chosen randomly as active users, and the remaining 447 users as training users – treated as historical data for the RS. Such a random separation was intended for the execution of five-fold cross-validation, where all the experiments are repeated five times, once with each split. These splits will be referred to as split-1, split-2,..., split-5. The set of training users (447 users) is used to find a set of neighbors for the active user while the set of active users (50 users) is used to test the performance of the system. During the testing phase, each active user’s ratings are divided randomly into two disjoint sets, training ratings (34%) and test ratings (66%). The training ratings are used to model the user 1394 M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399
M.Y.H. Al-Shamri, K.K. Bharadwaj/ Expert Systems with Applications 35(2008)1386-1399 and to supervise the learning process of FGRS, whereas 5.1. Experiment I the test ratings are treated as unseen ratings that the sys- tem would attempt to predict In this experiment we run the proposed FRS and com- Two evaluation metrics are used to evaluate the effec- pare its results with classical PRS. For PRS experiment, tiveness of different RS, the mean absolute error (MAE), if the correlation coefficient is negative, then users ux and and the total coverage of the system. The MAE measures ly are negatively correlated. This means that each user dis- the deviation of predictions generated by the rs from the courages the other. In our experiments, we follow the most true ratings specified by the user. The MAE() for active used strategy of keeping only positive correlation values user u; ( Breese et al., 1998; Vozalis Margaritis, 2003 )is because users with a negative correlation are dissimilar to given by the following formula the active user and hence it is better not to consider their ratings(Massa Avesani, 2004). The neighborhood set size is kept 30 for all the experiments MAE(O) lpr. (23) Usually, authors run PRS for a selected set of users only but in this experiment we run PRs over entire training users' database, even if it is time consuming. The difference where ni is the cardinality of the test ratings set of The total mae over all the active users, Nr(in our between gender(occupation) values is either 0, if the two users have the same gender(occupation) or I otherwise iments Nr= 50) can be calculated as This agrees with our reasoning for making opposite values MAE I as far as possible. Moreover, some sort of normalizatic MaE(O (24)(Han& Kamber, 2001)is used for age values to ensure that the values fall within the same range of GIM range, i.e. Lower MAE corresponds to more accurate predictions of a [0, 5]. Each age value is multiplied by(5/80), where age given RS. On the other hand, coverage is the measure of 80 is assumed to be the maximum possible age the percentage of items for which a RS can provide predic- The system picks the movies, from the test ratings set of tions. The RS may not be able to make predictions for the active user, one by one. Thereafter predicts ratings for every item. Low coverage value indicates that the RS will them using formula(4) over the set of all neighbors who not be able to assist the user with many of the items he have rated the same movie. After getting the predicted rat has not rated( Breese et al., 1998: Massa Avesani, ings, the system compares them with the actual ratings 2004). We compute coverage as the percentage of items given by the active user. Figs. 4 and 5 show the correct pre- over all users for which a prediction was requested and diction percentage obtained from PRS and FRS for the the system was able to produce a prediction(Vozalis& fifty active users of split-I(best split)and split-3(worst Margaritis, 2003) split), respectively. Each graph shows the percentage of the number of ratings that the system predicted correctly Coverage =siIP out of the total number of available test ratings by the lere, Pi is the total number of predicted items for active user Il; 5.1.1. Analysis of the results In the following two experiments, all the five splits of The implementation of PRS for 50 active users took data are used to show the effectiveness of the proposed user around 13 16 min, while it was around 2. 13 min only for model, fuzzy and hybrid fuzzy-genetic approaches to Rs. FRS for the same set of active users. The computational All experiments are conducted on a PC with 2.66 GHz Intel time complexity is reduced by approximately a factor of Pentium 4) CPU, 256MB RAM six with the proposed user model. Results summarized 830 20 s9s88NR588于导兽导 Active User ■FRs 口FGRs Fig 4. Correct prediction centage for active users of split-1
and to supervise the learning process of FGRS, whereas the test ratings are treated as unseen ratings that the system would attempt to predict. Two evaluation metrics are used to evaluate the effectiveness of different RS, the mean absolute error (MAE), and the total coverage of the system. The MAE measures the deviation of predictions generated by the RS from the true ratings specified by the user. The MAE(i) for active user ui (Breese et al., 1998; Vozalis & Margaritis, 2003) is given by the following formula: MAEðiÞ ¼ 1 ni Xni j¼1 jpri;j ri;jj; ð23Þ where ni is the cardinality of the test ratings set of user ui. The total MAE over all the active users, NT (in our experiments NT = 50) can be calculated as MAE ¼ 1 NT XNT i¼1 MAEðiÞ: ð24Þ Lower MAE corresponds to more accurate predictions of a given RS. On the other hand, coverage is the measure of the percentage of items for which a RS can provide predictions. The RS may not be able to make predictions for every item. Low coverage value indicates that the RS will not be able to assist the user with many of the items he has not rated (Breese et al., 1998; Massa & Avesani, 2004). We compute coverage as the percentage of items over all users for which a prediction was requested and the system was able to produce a prediction (Vozalis & Margaritis, 2003) Coverage ¼ PNT i¼1 P pi NT i¼1ni : ð25Þ Here, pi is the total number of predicted items for active user ui. In the following two experiments, all the five splits of data are used to show the effectiveness of the proposed user model, fuzzy and hybrid fuzzy-genetic approaches to RS. All experiments are conducted on a PC with 2.66 GHz Intel (Pentium 4) CPU, 256MB RAM. 5.1. Experiment 1 In this experiment we run the proposed FRS and compare its results with classical PRS. For PRS experiment, if the correlation coefficient is negative, then users ux and uy are negatively correlated. This means that each user discourages the other. In our experiments, we follow the most used strategy of keeping only positive correlation values because users with a negative correlation are dissimilar to the active user and hence it is better not to consider their ratings (Massa & Avesani, 2004). The neighborhood set size is kept 30 for all the experiments. Usually, authors run PRS for a selected set of users only but in this experiment we run PRS over entire training users’ database, even if it is time consuming. The difference between gender (occupation) values is either 0, if the two users have the same gender (occupation) or 1 otherwise. This agrees with our reasoning for making opposite values as far as possible. Moreover, some sort of normalization (Han & Kamber, 2001) is used for age values to ensure that the values fall within the same range of GIM range, i.e. [0, 5]. Each age value is multiplied by (5/80), where age 80 is assumed to be the maximum possible age. The system picks the movies, from the test ratings set of the active user, one by one. Thereafter predicts ratings for them using formula (4) over the set of all neighbors who have rated the same movie. After getting the predicted ratings, the system compares them with the actual ratings given by the active user. Figs. 4 and 5 show the correct prediction percentage obtained from PRS and FRS for the fifty active users of split-1 (best split) and split-3 (worst split), respectively. Each graph shows the percentage of the number of ratings that the system predicted correctly out of the total number of available test ratings by the active user. 5.1.1. Analysis of the results The implementation of PRS for 50 active users took around 13.16 min, while it was around 2.13 min only for FRS for the same set of active users. The computational time complexity is reduced by approximately a factor of six with the proposed user model. Results summarized in 0 10 20 30 40 50 60 70 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 Active User % of correct predictions PRS FRS FGRS Fig. 4. Correct predictions percentage for active users of split-1. M.Y.H. Al-Shamri, K.K. Bharadwaj / Expert Systems with Applications 35 (2008) 1386–1399 1395