Constructing User Profiles for Collaborative Rec Qing li and Byeong Man Kim Department of Computer Science diqing, bmkim/@sekumohackr Abstract. With the development of e-commerce and information ac- recommender systems have become a popular technique to prune large information spaces so that users are directed toward those items that best meet their needs and preferences. In this paper, clustering technique is applied in the collaborative recommender framework to con- sider semantic contents available from the user profiles. We also suggest methods to construct user profiles from rating information and attributes fitems to accommodate user preferences. Further, we show that the cor- ect application of the semantic content information obtained from user profiles does enhance the effectiveness of collaborative recommendation 1 Introduction Recommender systems produce individualized recommendations as output or has the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options. Such systems have an obvious appeal in an environment where the amount of on-line information vastly outstrips any individual's capability to survey it. Recommender systems are now an integral partofsomee-commercesitessuchasAmazoncomcdNowandLevis.com. Recommender systems are characterized with"individualized"that separate them from search engines, which focus on the"matching": the system is supposed to return all those items that match the query ranked by degree of match At the initial state, many recommender systems were fairly simple query-based nformation retrieval systems, which can be called as content-based recommender systems. Later, GroupLens 15 and Ringo [18 developed independently, were the first to automatic prediction. Collaborative recommender systems collect ratings of items from many individuals and make recommendations based on those ratings to a given user. Such as MovieLens system recommends movies nd GAB recommends web pages based on the bookmarks 19 Although collaborative filtering has been very successful in both research and practice areas, it can not recommend new items to users without any history in the system and completely denies any information that can be extracted from This work was supported by grant No. 2000-1-51200-008-2 from the Korea Science and Engineering Foundation
Constructing User Profiles for Collaborative Recommender System Qing Li and Byeong Man Kim Department of Computer Science Kumoh National Institute of Technology, Korea {liqing,bmkim}@se.kumoh.ac.kr Abstract. With the development of e-commerce and information access, recommender systems have become a popular technique to prune large information spaces so that users are directed toward those items that best meet their needs and preferences. In this paper1 , clustering technique is applied in the collaborative recommender framework to consider semantic contents available from the user profiles. We also suggest methods to construct user profiles from rating information and attributes of items to accommodate user preferences. Further, we show that the correct application of the semantic content information obtained from user profiles does enhance the effectiveness of collaborative recommendation. 1 Introduction Recommender systems produce individualized recommendations as output or has the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options. Such systems have an obvious appeal in an environment where the amount of on-line information vastly outstrips any individual’s capability to survey it. Recommender systems are now an integral part of some e-commerce sites such as Amazon.com, CDNow and Levis.com. Recommender systems are characterized with “individualized” that separate them from search engines, which focus on the “matching”: the system is supposed to return all those items that match the query ranked by degree of match. At the initial state, many recommender systems were fairly simple query-based information retrieval systems, which can be called as content-based recommender systems. Later, GroupLens [15] and Ringo [18] developed independently, were the first to automatic prediction. Collaborative recommender systems collect ratings of items from many individuals and make recommendations based on those ratings to a given user. Such as MovieLens system recommends movies and GAB recommends web pages based on the bookmarks [19]. Although collaborative filtering has been very successful in both research and practice areas, it can not recommend new items to users without any history in the system and completely denies any information that can be extracted from 1 This work was supported by grant No. 2000-1-51200-008-2 from the Korea Science and Engineering Foundation
Qing Li and Byeong Man Kim semantic contents of items. For this reason, hybrid recommender systems have been provided, which can exploit both user preferences and semantic contents Proposed approaches to the hybrid system which combine content-based and collaborative filters together can be categorized into three groups. The first one is the linear combination of results of collaborative and content-based filtering such as systems that are described by Claypool 5. The second group is the sequential combination of content-based filtering and collaborative filtering. In these systems, firstly, content-based filtering algorithm is applied to find users ho share similar interests. Secondly, collaborative algorithm is applied to make predictions, such as RAAP 7 and Fab filtering systems 1. The last one is the mixed combination, both the semantic contents and ratings are applied to mal commendations, such as the probabilistic model [13 and Ripper system fo recommendation 2 In this paper we describe a web-based movie recommender, which can rec- ommend the movies based on not only the user ratings but also the significant amount of ot her information that is often available about the nature of each user The application of clustering technique in our approach provides a way for intro- ducing the semantic contents of user profiles into collaborative recommendation In the following Sections, we describe our approach in detail 2 Our approach n our approach, we integrate the information of user profiles and user ratings to calculate the user-user similarity From the user profiles, we get the feature-based information of users(liked movies with Meg Ryan with average score 2.45), while, from the user ratings, we get the atomic-item-based rating information(liked this particular movie with score 3). In Fab system, feature-based user profiles eplace the atomic-item-based rating information. It help alleviate sparsity, but the abstraction does lose information. However, in our approach we use both kinds of information- those based on ratings of individuals and those based on features. The Figure 1 shows the overview of our approach Groue Fig. 1. Overview of our approach 2.1 User profile creation User profiles indicate the information needs or preferences on items that users are interested in. A user profile consists of five profile vectors and each profile
2 Qing Li and Byeong Man Kim semantic contents of items. For this reason, hybrid recommender systems have been provided, which can exploit both user preferences and semantic contents. Proposed approaches to the hybrid system which combine content-based and collaborative filters together can be categorized into three groups. The first one is the linear combination of results of collaborative and content-based filtering, such as systems that are described by Claypool [5]. The second group is the sequential combination of content-based filtering and collaborative filtering. In these systems, firstly, content-based filtering algorithm is applied to find users, who share similar interests. Secondly, collaborative algorithm is applied to make predictions, such as RAAP [7] and Fab filtering systems [1]. The last one is the mixed combination, both the semantic contents and ratings are applied to make recommendations, such as the probabilistic model [13] and Ripper system for recommendation [2]. In this paper we describe a web-based movie recommender, which can recommend the movies based on not only the user ratings but also the significant amount of other information that is often available about the nature of each user. The application of clustering technique in our approach provides a way for introducing the semantic contents of user profiles into collaborative recommendation. In the following Sections, we describe our approach in detail. 2 Our approach In our approach, we integrate the information of user profiles and user ratings to calculate the user-user similarity. From the user profiles, we get the feature-based information of users(liked movies with Meg Ryan with average score 2.45),while, from the user ratings, we get the atomic-item-based rating information(liked this particular movie with score 3). In Fab system, feature-based user profiles replace the atomic-item-based rating information. It help alleviate sparsity, but the abstraction does lose information. However, in our approach we use both kinds of information - those based on ratings of individuals and those based on features. The Figure 1 shows the overview of our approach. Rating Data + Group Rating Matrix Group Rater Clustering Engine Collaborative Filter User Rating Matrix Profile Group Vector User Profiles Fig. 1. Overview of our approach 2.1 User profile creation User profiles indicate the information needs or preferences on items that users are interested in. A user profile consists of five profile vectors and each profile
Constructing User Profiles for Collaborative Recommender System vector represents an aspect (or dimension of his preferences). In our current ystem, five aspects-actor, actress, director, genre and synopsis- have been considered. A profile vector consists of several attribute- value pairs as Figure hows To reduce the burden of users the user can indicate interesting movies instead of specifying his preference explicitly; the profile vectors can be automatically constructed from movie description data by the following simple equation Numitem C attribute of aspect, I item >threshold here Numitem threshold is the number of items, in which ranking value is ager than the threshold; Numitem C attribute of aspect, is the number of items containing the attribute m in the aspect n of user profile and its rating is larger than the threshold. In our present experiment, we set the value of the threshold as 3. For example, if user Tom make ratings on three movie- You'u Got Mail, Sleepless in Seattle, and Swordfish. among them, only the ratings of You'v Got Mail and Sleepless in Seattle are larger than the threshold 3. From the genre information, we know that You'v Got Mail and Sleepless in Seattle belong to the love genre. So the genre aspect of Tom's profile is as Tom (love(2/2)) Hanks Fig. 2. User Profiles 2.2 Construction of user profiles through Fuzzy Inference The characteristics of the synopsis aspect are a little different from other aspects So, we take a different approach. a weighted keyword vector is constructed for each user. Firstly, keywords are extracted from synopsis field of movie descrip- tion record, which user shows interests on. Secondly, calculate the weight of each keyword. Although tf x idf formula is popularly used in information retrieval literature [16 for calculating the keyword weight, we adopt the method of cal- culating weight through fuzzy inference from our previous work 3, which sho more effectiveness In order to construct the user profiles based on the synopsis, firstly, the synopses of user preferred movies are transformed into a set of candidate term through eliminating stopwords and stemming by Porter's algorithm. The DTF, DF, and idF of each term are calculated based on this set and used as inputs of
Constructing User Profiles for Collaborative Recommender System 3 vector represents an aspect (or dimension of his preferences). In our current system, five aspects - actor, actress, director, genre and synopsis - have been considered. A profile vector consists of several attribute-value pairs as Figure 2 shows. To reduce the burden of users, the user can indicate interesting movies instead of specifying his preference explicitly; the profile vectors can be automatically constructed from movie description data by the following simple equation. Wn,m = Numitem ⊆ attributem of aspectn | item > threshold Numitem > threshold (1) where Numitem > threshold is the number of items, in which ranking value is lager than the threshold; Numitem ⊆ attributem of aspectn is the number of items containing the attribute m in the aspect n of user profile and its rating is larger than the threshold. In our present experiment, we set the value of the threshold as 3. For example, if user Tom make ratings on three movie - You’v Got Mail, Sleepless in Seattle, and Swordfish. Among them, only the ratings of You’v Got Mail and Sleepless in Seattle are larger than the threshold 3. From the genre information, we know that You’v Got Mail and Sleepless in Seattle belong to the love genre. So the genre aspect of Tom’s profile is as Tom {love (2/2)}. Assumption Total Movie No. : 3 Item Rating of Tom: Sleepless in Seattle 5 You’ve got mail 5 Swordfish 2 Movie ID:3 Title: Swordfish Actor: John Travolta Actress: Halle Berry Director: Sena Genre: Action Synopsis: Terrorist ransom Movie ID:2 Title: Sleepless in Seattle Actor: Tom Hanks Actress: Meg Ryan Director: Norman Rene Genre: Love Synopsis: lover in Seattle… Movie ID:1 Title: You’ve Got Mail Actor: Tom Hanks Actress: Meg Ryan Director: Nora Genre: Love Synopsis: Lovers meet online … User Profile: Tom Actor: (Tom Hanks, 1.0) Actress: (Meg Ryan, 1.0) Director:((Norman, 0.5),(Nora,0.5)) Genre: (lover, 1.0) Synopsis: (lover (0.18), meet (0.48), online (0.48)…) Fig. 2. User Profiles 2.2 Construction of user profiles through Fuzzy Inference The characteristics of the synopsis aspect are a little different from other aspects. So, we take a different approach. A weighted keyword vector is constructed for each user. Firstly, keywords are extracted from synopsis field of movie description record, which user shows interests on. Secondly, calculate the weight of each keyword. Although tf × idf formula is popularly used in information retrieval literature [16] for calculating the keyword weight, we adopt the method of calculating weight through fuzzy inference from our previous work [3], which shows more effectiveness. In order to construct the user profiles based on the synopsis, firstly, the synopses of user preferred movies are transformed into a set of candidate terms through eliminating stopwords and stemming by Porter’s algorithm. The DTF, DF, and IDF of each term are calculated based on this set and used as inputs of
Qing Li and Byeong Man Kim fuzzy inference. The DTF(Distributed Term Frequency) reflects the frequency nd distributed status of a term in a set of documents(user preferred movie synopses), which is calculated by dividing total occurrences of the term in a set of documents by the number of documents in the set containing the term. It needs to be normalized for being used in fuzzy inference. The following shows the normalized distributed term frequency (NDt NDTF There, T Fi is the frequency of term t; in the example documents(user preferred movie synopses ), DFi is the number of documents having term t; in the example documents The DF(Document Frequency) represents the frequency of documents having specific term the example documents Like DtF, Df also provides one measure of how well that term describes the contents of document set. The normalized document frequency, NDF, is defined in Equation 3, where DFi is the number of documents having term ti in the example documents DE NDF may The IDF(Inverse Document Frequency)represents the inverse document fre- quency of a specific term over an entire document collection(all movie synopses the database) not example documents. The motivation for usage of IDF factor that terms which appear in many documents are not very useful for distin- guishing a relevant document from a non-relevant one. The normalized inverse document frequency, NidF, is defined as follows NIDF IDE IDF=l ma x,IDF where, N is the total number of documents and ni is the number of documents in which term ti appears Figure 3 shows the membership functions of the input/output variables -3 inputs(NDTF, NDF, NIDF)and 1 output(TW)-used in our method. As you can see in Figure 3(a), NDTF variable has I S(Small), L(Large)1, and NDF ind NIDF variables have[ S(Small), M(Middle), L(Large)) as linguistic labels (or terms). The fuzzy output variable, TW (Term Weight) which represents the importance of a term, has six linguistic labels as shown in Figure 3(b) The 18 fuzzy rules are involved to infer the term weight(TW). The rules are constructed based on the intuition that the important or representative terms may occur across many positive example documents(user preferred movies)but not in general documents, i. e, their NDF and NIDF are very high. As shown in Figure 4, the Tw of a term is Z in most cases regardless of its NDF and NDTF
4 Qing Li and Byeong Man Kim fuzzy inference. The DTF (Distributed Term Frequency) reflects the frequency and distributed status of a term in a set of documents (user preferred movie synopses), which is calculated by dividing total occurrences of the term in a set of documents by the number of documents in the set containing the term. It needs to be normalized for being used in fuzzy inference. The following shows the normalized distributed term frequency (NDTF). NDT Fi = T Fi DFi maxj h T Fj DFj i (2) where, T Fi is the frequency of term ti in the example documents(user preferred movie synopses), DFi is the number of documents having term ti in the example documents. The DF (Document Frequency) represents the frequency of documents having a specific term within the example documents. Like DTF, DF also provides one measure of how well that term describes the contents of document set. The normalized document frequency, NDF, is defined in Equation 3, where DFi is the number of documents having term ti in the example documents. NDFi = DFi maxj DFj (3) The IDF (Inverse Document Frequency) represents the inverse document frequency of a specific term over an entire document collection(all movie synopses in the database) not example documents. The motivation for usage of IDF factor is that terms which appear in many documents are not very useful for distinguishing a relevant document from a non-relevant one. The normalized inverse document frequency, NIDF, is defined as follows: NIDFi = IDFi maxj IDFj , IDFi = log N ni (4) where, N is the total number of documents and ni is the number of documents in which term ti appears. Figure 3 shows the membership functions of the input/output variables - 3 inputs (NDTF, NDF, NIDF) and 1 output (TW) - used in our method. As you can see in Figure 3(a), NDTF variable has { S(Small), L(Large) }, and NDF and NIDF variables have { S(Small), M(Middle), L(Large) } as linguistic labels (or terms). The fuzzy output variable, TW (Term Weight) which represents the importance of a term, has six linguistic labels as shown in Figure 3(b). The 18 fuzzy rules are involved to infer the term weight (TW). The rules are constructed based on the intuition that the important or representative terms may occur across many positive example documents (user preferred movies) but not in general documents, i.e., their NDF and NIDF are very high. As shown in Figure 4, the TW of a term is Z in most cases regardless of its NDF and NDTF
Constructing User Profiles for Collaborative Recommender System H11I N⊥1 Fig 4. Fuzzy inference rules if its nidf is s. because such term equently thus its NdF and ndtf can be high. When ndF of a term is high and its NIDF is also high, the term is considered as a representative keyword and then the output value is between X and xX. The other rules were set similarl We can get the term weight Tw through the following procedure. But, the output is in the form of fuzzy set and thus has to be converted to the crisp value. 16.paper, the center of gravity method(COG)is used to defuzzify the output 1. Apply the NDTF, NDF, and NidF fuzzy values to the antecedent portions of 18 fuzzy rules. 2. Find the minimum value among the membership degrees of three input fuzzy 3. Classify every 18-membership degree into 6 groups according to the fuzzy output variable TW. 4. Calculate the maximum output value for each group and then generate 6 For instance, let us assume, there is one term, which NIdF is 0.35, NDF 0.2 and NDTF is 0.3. The degree of membership is determined by plugging the selected input parameter(NIDF, NDF or NDTF) into the horizontal axis and projecting vertically to the upper boundary of the membership function(s) in Figure 3. So the result is as follows NIDF=0.35:S=0.00,M=0.57; NDF=0.20:S=0.43,M=0.14 NDTF=0.30:S=0.53,L=0.07 Now referring back to the rules, only 8 rules out of 18 rules need to be lected. which are marked in Figure 4. The effective rules are listed as follows
Constructing User Profiles for Collaborative Recommender System 5 of documents, which is calculated by dividing total occurrences of the term in a set of documents by the number of documents in the set containing the term. It needs to be normalized for being used in fuzzy inference. The following shows the normalized term frequency (NTF). 0 0.2 0.4 0.6 0.8 1 Z S M L X XX 1 TW 1 0.25 0.75 1 L S L 1 0.15 0.35 0.65 0.85 1 M 1 NDTF NDF, NIDF 1 (a) Input variable (b) Output variable Z: zero S:small M: middle L : large X: x large XX:xx larger S reweighting Degree of Membership Degree of Membership Fig. 3. Fuzzy variables inverse document frequency of a specific term over an entire document collection not example documents. The normalized inverse document frequency, NIDF, is defined is the Table 1. Fuzzy inference rules NIDF NDF S M L NIDF NDF S M L S Z Z S S Z S M M Z M L M Z L X L S L X L S X XX NDTF = S NDTF = L Fig. 4. Fuzzy inference rules if its NIDF is S, because such term may occur frequently in any document and thus its NDF and NDTF can be high. When NDF of a term is high and its NIDF is also high, the term is considered as a representative keyword and then the output value is between X and XX. The other rules were set similarly. We can get the term weight TW through the following procedure. But, the output is in the form of fuzzy set and thus has to be converted to the crisp value. In this paper, the center of gravity method(COG) is used to defuzzify the output [6]. 1. Apply the NDTF, NDF, and NIDF fuzzy values to the antecedent portions of 18 fuzzy rules. 2. Find the minimum value among the membership degrees of three input fuzzy values. 3. Classify every 18-membership degree into 6 groups according to the fuzzy output variable TW. 4. Calculate the maximum output value for each group and then generate 6 output values. For instance, let us assume, there is one term, which NIDF is 0.35, NDF is 0.2 and NDTF is 0.3. The degree of membership is determined by plugging the selected input parameter(NIDF, NDF or NDTF) into the horizontal axis and projecting vertically to the upper boundary of the membership function(s) in Figure 3. So the result is as follows. NIDF=0.35 : S=0.00, M=0.57; NDF=0.20 : S=0.43, M=0.14; NDTF=0.30 : S=0.53, L=0.07. Now referring back to the rules, only 8 rules out of 18 rules need to be selected, which are marked in Figure 4. The effective rules are listed as follows
Qing Li and Byeong Man Kim 1. If(NIDF=S, NDF=S, NDTF=S)then TW=Z min S=0.00, $=0.43, $=0.53=0.00 2. If(NIDF=S, NDF=M, NDTF=S)then TW=Z min S=0.00, M=0. 14, S=0.53=0.00 3. If(NIDF=M, NDF=S, NDTF=S)then TW=Z minM=0.57, $=0.43, S=0.53=0.43 4. If(NIDF=M, NDF=M,NDTF=S)then TW=Mmin M=0.57, M=0. 14, $=0.53=0.14 5. If(NIDF=S, NDF=S, NDTF=L) then TW=Z min S=0.00, S=0.43, L=0.07=0.00 6. If(NIDF=S, NDF=M, NDTF=L) then TW=Z minS=0.00, M=0.14,L=0.071=0.00 7. If(NIDF=M, NDF=S, NDTF=L)then TW=S min M=0.57, S=0.43, L=0.07)=0.07 8. If(NIDF=M, NDF=M,NDTF=L)then TW=L min(M=0.57, M=0. 14, L=0.07)=0.0 Then we calculate the maximum output value for each group and then ger erate 6 output values as Figure 5 shows, which consists a fuzzy set of Tw as follows TW={Z/0.43.S/0.07,M/0.14,L/0.07,X/0,XX/0} At last, the COG is used to defuzzify the output into one value TW=1181+2=016 Fig. 5. Defuzzify the Output After representing the user profiles, the next task for us is to group the user pro- files to form group-rating matrix Since the result of clustering aims at applying to build the relationship among all users, we build cliques over all users instead of the nearest neighbor between all possible pairs K-means Clustering Algorithm is a simple and fast clustering method, which has been popularly used 9. So we apply it to group the users with some ad justments. The difference is that we apply the fuzzy set theory to represent the affiliation between an object and a cluster. Firstly, user profiles are grouped into a given number of clusters. After completion of grouping, the possibility of one object(here one object means one user profile) belonging to a certain cluster calculated as follows Pro(, k)=l- ICS(i, k) where Pro(, k) means the possibility of object j belonging to the cluster A The CS(, k) means the counter-similarity between the object j and the cluster k
6 Qing Li and Byeong Man Kim 1. If(NIDF=S,NDF=S,NDTF=S) then TW=Z min{S=0.00,S=0.43,S=0.53}=0.00 2. If(NIDF=S,NDF=M,NDTF=S) then TW=Z min{S=0.00,M=0.14,S=0.53}=0.00 3. If(NIDF=M,NDF=S,NDTF=S) then TW=Z min{M=0.57,S=0.43,S=0.53}=0.43 4. If(NIDF=M,NDF=M,NDTF=S) then TW=M min{M=0.57,M=0.14,S=0.53}=0.14 5. If(NIDF=S,NDF=S,NDTF=L) then TW=Z min{S=0.00,S=0.43,L=0.07}=0.00 6. If(NIDF=S,NDF=M,NDTF=L) then TW=Z min{S=0.00,M=0.14,L=0.07}=0.00 7. If(NIDF=M,NDF=S,NDTF=L) then TW=S min{M=0.57,S=0.43,L=0.07}=0.07 8. If(NIDF=M,NDF=M,NDTF=L) then TW=L min{M=0.57,M=0.14,L=0.07}=0.07 Then we calculate the maximum output value for each group and then generate 6 output values as Figure 5 shows, which consists a fuzzy set of TW as follows. TW={Z/0.43, S/0.07, M/0.14, L/0.07, X/0, XX/0} At last, the COG is used to defuzzify the output into one value. TW = 0.43×0.1+0.07×0.2+0.14×0.4+0.07×0.7 0.1+0.2+0.4+0.7 = 0.116 0.07 0.14 0.07 0 0.2 0.4 0.6 0.8 1 Z S M L X XX 1 0.43 0.1 0.7 Fig. 5. Defuzzify the Output 2.3 Grouping ratings After representing the user profiles, the next task for us is to group the user pro- files to form group-rating matrix. Since the result of clustering aims at applying to build the relationship among all users, we build cliques over all users instead of the nearest neighbor between all possible pairs. K-means Clustering Algorithm is a simple and fast clustering method, which has been popularly used [9]. So we apply it to group the users with some adjustments. The difference is that we apply the fuzzy set theory to represent the affiliation between an object and a cluster. Firstly, user profiles are grouped into a given number of clusters. After completion of grouping, the possibility of one object (here one object means one user profile) belonging to a certain cluster is calculated as follows. P ro(j, k) = 1 − CS(j, k) M axCS(i, k) (5) where P ro(j, k) means the possibility of object j belonging to the cluster k; The CS(j, k) means the counter-similarity between the object j and the cluster k
Constructing User Profiles for Collaborative Recommender System which is calculated based on the Euclidean distance; MarCS(i, k)means the maximum counter-similarity between an object and the cluster k However, as for clustering algorithms, how to choose the initial cluste er centen is a critical problem. We recommend the refinement algorithm suggested by Bradley 4 2.4 Similarity computation and prediction As we can see, after grouping the user profiles, we get a new rating matrix. V can use the user-based collaborative algorithm to calculate the similarity and make the predictions for users In our approach, we use the Pearson correlation- based algorithm [11 to calculate the user-user similarity from user-rating matrix, then calculate the user-user similarity from group-rating matrix by the adjusted cosine algorithm [17, at last, the total user similarity is the linear combination of the above two as like in [14. Prediction for an item is then computed by performing a weighted average of deviations from the neighbors mean [15 3 Experimental evaluation Currently, we perform experiments on a real movie rating data collected from the MovieLens web-based recommendation system. The data set contained 100,000 ratings from 943 users and 1, 682 movies, with each user rating at least 20 items The ratings in the MovieLens data were explicitly entered by users, and are ntegers ranging from 1 to 5. We divide data set into a training set and a test data set. 20% of users are randomly selected to be the test users, and 80% of users are randomly selected to be the training set MAE(Mean Absolute Error)has widely been used in evaluating the curacy of a recommender system. The MAE is calculated by summing these absolute errors of the corresponding rating-prediction pairs and then computing the average In the MovieLens data, there are no user preferences and only genre infor- mation of movies. Therefore. we collect the movie semantic information that is genre, actor, actress, director and Synopsis, from Internet Movie Database (http://www.imdb.com)toconstructtheuserprofilesfromthetrainingdata set using the method in Section 2.1 and 2.2 3.1 Behaviors of our method Recall that, user profiles consist of terms with term weights from the semantic information of movies( genre, actor, actress, director, synopsis ) Since there are so many terms in the movie synopsis, the effectiveness of clustering method is sensitive to the number of terms selected for clustering. It can be observed from Figure 6 that when the number of selected terms is about 50, it shows a good
Constructing User Profiles for Collaborative Recommender System 7 which is calculated based on the Euclidean distance; M axCS(i, k) means the maximum counter-similarity between an object and the cluster k. However, as for clustering algorithms, how to choose the initial cluster center is a critical problem. We recommend the refinement algorithm suggested by Bradley [4]. 2.4 Similarity computation and prediction As we can see, after grouping the user profiles, we get a new rating matrix. We can use the user-based collaborative algorithm to calculate the similarity and make the predictions for users. In our approach, we use the Pearson correlationbased algorithm [11] to calculate the user-user similarity from user-rating matrix, then calculate the user-user similarity from group-rating matrix by the adjusted cosine algorithm [17], at last, the total user similarity is the linear combination of the above two as like in [14]. Prediction for an item is then computed by performing a weighted average of deviations from the neighbor’s mean [15]. 3 Experimental evaluation Currently, we perform experiments on a real movie rating data collected from the MovieLens web-based recommendation system. The data set contained 100,000 ratings from 943 users and 1,682 movies, with each user rating at least 20 items. The ratings in the MovieLens data were explicitly entered by users, and are integers ranging from 1 to 5. We divide data set into a training set and a test data set. 20% of users are randomly selected to be the test users, and 80% of users are randomly selected to be the training set. MAE (Mean Absolute Error) has widely been used in evaluating the accuracy of a recommender system. The MAE is calculated by summing these absolute errors of the corresponding rating-prediction pairs and then computing the average. In the MovieLens data, there are no user preferences and only genre information of movies. Therefore, we collect the movie semantic information, that is genre, actor, actress, director and Synopsis, from Internet Movie Database (http: //www.imdb.com ) to construct the user profiles from the training data set using the method in Section 2.1 and 2.2. 3.1 Behaviors of Our Method Recall that, user profiles consist of terms with term weights from the semantic information of movies ( genre, actor, actress, director, synopsis ). Since there are so many terms in the movie synopsis, the effectiveness of clustering method is sensitive to the number of terms selected for clustering. It can be observed from Figure 6 that when the number of selected terms is about 50, it shows a good performance
Qing Li and Byeong Man Kim As we discussed in Section 2. 2. in order to enhance the performance, we calculate the term weights through fuzzy inference rule, instead of the simple tf x idf method [16. As Figure 6 shows, our suggest method does Fig. 6. Terms of Synopsis Comparing with the classic collaborative filtering algorithm which does not use the group-rating matrix and only applies Pearson correlation-based alge rithm to calculate the similarity based on user-rating matrix, our method shows better performance than the classic collaborative filtering algorithm as Figure 7 shows. From our similarity computation, we adopted two ways. We call the first EUCHM, which enlarges the value scale of group-rating matrix from 0 1] to 1 51, and use the Pearson correlation-based algorithm to calculate the similarit The second called CUCHM, in which the Pearson correlation-based algorithm is pplied to calculate the similarity from user-rating matrix, and then the adjusted cosine algorithm is applied to calculate the similarity from group-rating matrix, at last the total similarity is the linear combination of them. It can be observed that the performance of CUCHM is the best, and the second is EUCHM, which is followed by the classic collaborative filtering Algorithm Fig. 7. Comparison In order to find the optimal combination coefficient, we implemented a series of tests changing the value of combination coefficient from 0 to 1 with a con stant gap 0. 1. Figure 8 shows that when the coefficient arrives at 0.4, an optimal recommendation performance is achieved In order to observe the contribution of each content information e con- truct user profiles by movie synopsis and genre respectively. In our experiment
8 Qing Li and Byeong Man Kim As we discussed in Section 2.2, in order to enhance the recommendation performance, we calculate the term weights through fuzzy inference rule, instead of the simple tf × idf method [16]. As Figure 6 shows, our suggest method does improve the performance of recommendation. 0.73 0.735 0.74 0.745 0.75 10 20 30 40 50 60 80 100 All NO.of terms MAE Fuzzy Method tf*idf Fig. 6. Terms of Synopsis Comparing with the classic collaborative filtering algorithm which does not use the group-rating matrix and only applies Pearson correlation-based algorithm to calculate the similarity based on user-rating matrix, our method shows better performance than the classic collaborative filtering algorithm as Figure 7 shows. From our similarity computation, we adopted two ways. We call the first EUCHM, which enlarges the value scale of group-rating matrix from [0 1] to [1 5], and use the Pearson correlation-based algorithm to calculate the similarity. The second called CUCHM, in which the Pearson correlation-based algorithm is applied to calculate the similarity from user-rating matrix, and then the adjusted cosine algorithm is applied to calculate the similarity from group-rating matrix, at last the total similarity is the linear combination of them. It can be observed that the performance of CUCHM is the best, and the second is EUCHM, which is followed by the classic collaborative filtering Algorithm. 0.74 0.745 0.75 0.755 0.76 0.765 0.77 0.775 0.78 10 20 30 40 50 60 70 80 90 No. of neighbors MAE CUCHM Conventional Pearson EUCHM Fig. 7. Comparison In order to find the optimal combination coefficient, we implemented a series of tests changing the value of combination coefficient from 0 to 1 with a constant gap 0.1.Figure 8 shows that when the coefficient arrives at 0.4, an optimal recommendation performance is achieved. In order to observe the contribution of each content information. We construct user profiles by movie synopsis and genre respectively. In our experiment
Constructing User Profiles for Collaborative Recommender System 9 0.745 0.74 00.10.2。.30.40.50.60.7D.B0.91 Fig 8. Combination Coefficient we found that the semantic information extracted from synopsis contributed greatly to the recommendation performance, as Figure 9 shows. o.7之 SYnopE主 Fig. 9. Comparison Although some hybrid recommender systems have already exited, it is hard to make an evaluation among them. Some systems 7 use Boolean value(rele- vant or irrelevant) to represent user preferences, while others use numeric value. The same evaluation metrics cannot make a fair comparison. Further more, the quality of some systems depends on the time, in which system parameters are changed with user feedback 5, and Claypool does not clearly describe how to change the weight with time passed. Ripper [2 works best when asked to make binary decisions, so it is unfair to compare it with our approach. Popescul [13 extends aspect model to take contents of items into account. However, it is also based on the binary decisions of users. However, we can make a simple concept comparison. In Fab system [1, the similarity for prediction is only based on the user profiles, which is a special case of our approach that the combination coefficient is 1. As the Figure shows, we can conclude our approach shows a better performance than the Fab system 4 Conclusion In this paper, we describe our movie recommender system, provide our new approach for movie recommendation and evaluate it via experiment on a large realistic set of ratings. Since this mechanism is not limited to the movie domain it can be extended to other domain such as CD. music and books
Constructing User Profiles for Collaborative Recommender System 9 0.74 0.745 0.75 0.755 0.76 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Combination Coefficient MAE Fig. 8. Combination Coefficient we found that the semantic information extracted from synopsis contributed greatly to the recommendation performance, as Figure 9 shows. 0.72 0.73 0.74 0.75 0.76 MAE Synopsis Genre Fig. 9. Comparison Although some hybrid recommender systems have already exited, it is hard to make an evaluation among them. Some systems [7] use Boolean value (relevant or irrelevant) to represent user preferences, while others use numeric value. The same evaluation metrics cannot make a fair comparison. Further more, the quality of some systems depends on the time, in which system parameters are changed with user feedback [5], and Claypool does not clearly describe how to change the weight with time passed. Ripper [2] works best when asked to make binary decisions, so it is unfair to compare it with our approach. Popescul [13] extends aspect model to take contents of items into account. However, it is also based on the binary decisions of users. However, we can make a simple concept comparison . In Fab system [1], the similarity for prediction is only based on the user profiles, which is a special case of our approach that the combination coefficient is 1. As the Figure 8 shows, we can conclude our approach shows a better performance than the Fab system. 4 Conclusion In this paper, we describe our movie recommender system, provide our new approach for movie recommendation and evaluate it via experiment on a large, realistic set of ratings. Since this mechanism is not limited to the movie domain, it can be extended to other domain, such as CD, music and books
Qing Li and Byeong Man Kim Re ererences 1. Balabanovic, M. and Shoham, Y. Fab: Content-Based, Collaborative Recommen- dation, Communications of the ACM, 40(3), pp. 66-72.(1997) 2. Basu C. and Cohen. Using Socail and Content-based information in Recommenda tion. In Proc. of the AAAl-98(1998) 3. Byeong Man Kim, Qing Li, Jong-wan Kim. Extraction of user preferences from a few itive documents. IRAL-2003, workshop of ACL-2003, Sapporo, Japan(2003) 4. Bradley, P.S. and Fayyad U M. Refining Initial Points for K-Means Clustering. In Proc. of ICML 98. San Francisco, USA, pages 91-99(1998) 5. Claypool, M., Gokhale, A, Miranda, T, Murnikov, P, Netes, D. and Sartin, M. Combining content-based and collaborative filters in an online newspaper, In Proc. ACM-SIGIR Workshop on Recommender Systems: Algorithms and Evalu- ation(1999) 6. C.C. Lee. Fuzzy logic in control systems: fuzzy logic controller-Part I, IEEE Trans On Systems, Man, and Cybernetics, 20(2),pp408-418(1990) 7. Delgado, J, Ishii, N. and Ura, T. Content-based Collaborative Information Filter Actively Learning to Classify and Recommend Documents, In Proc. Second Int Workshop, CIA, 98, pp. 206-215(1998) 8. Duda, Richard O, Hart, Peter E. and Stork, David G. Pattern Classfication, wile Interscience Publication, New York, pp 528-530(2000 9. Han, J, and Kamber, M. Data mining: Concepts and Techniques. New York Morgan-Kaufman(2000) 10. Herlocker, J, Konstan, J, Borchers A, and Riedl, J. An algorithmic framework for performing collaborative Filtering, In Proc. ACM-SIGIR Conf, 1999, Pp. 230-237 (1999) 11. Mcclave, J. T. and Dietrich, F. H. Statistics. San Francisco: Ellen Publishing 12. Oard, D.W.and Marchionini, G. A conceptual framework for text filtering, Tech- nical Report EE.-TR-96-25, CAR-TR-830, CS-TR3643(1996) 13. Popescul, A, Ungar, L. H, Pennock, D. M. and Lawrence, S. Probabilistic Mod- els for Unified Collaborative and Content-Based Recommendation in Sparse-Data nvironments. UAI-2001. Seattle, WA, USA (2001) ering Approach for Hybrid Recommender System In Proc. of the 2003 IEEE/WIC International Conference on Web Intelligence (WI 2003),Pp.33-38(2003) 15. Resnick, P, Iacovou, N, Suchak, M, Bergstorm, P. and Riedl, J. GroupLens: An open architecture for collaborative filtering of Netnews, In Proc. ACM Conf. on Computer-Supported Cooperative Work. pp 175-186(1994) 16. Ricardo Baeza-Yates. Berthier Riberio- Neto. Modern Information Retrieval. New York: Addison-Wesley Publishers(1999 17. Sarwar, B M., Karypis, G, Konstan, J. A and Riedl, J. Item-based Collaborative Filtering Recommendation Algorithms, In Proc. Tenth Int. Www Conf. 2001, pp 285-295(2001) 18. Shardanand U. and Maes. Social information filtering: Algorithms for automating word of mouth", Proceedings of CHl95-Human Factors in Computing Systems, pp210-217(1995) 19. Wittenburg, K, Das, D, Hill, w. and Stead, L. Group Asynchronous Browsing on the World wide Web. Proc. of Fourth International World Wide Web Conference, Boston, Massachusetts, USA, pp 51-62(1995)
10 Qing Li and Byeong Man Kim References 1. Balabanovic, M. and Shoham, Y.. Fab: Content-Based, Collaborative Recommendation, Communications of the ACM, 40(3), pp.66-72. (1997) 2. Basu C., and Cohen.Using Socail and Content-based information in Recommendation. In Proc. of the AAAI-98 (1998). 3. Byeong Man Kim, Qing Li, Jong-wan Kim. Extraction of user preferences from a few positive documents. IRAL-2003, workshop of ACL-2003, Sapporo, Japan (2003). 4. Bradley, P.S. and Fayyad U.M..Refining Initial Points for K-Means Clustering. In Proc. of ICML ’98. San Francisco, USA, pages 91–99 (1998). 5. Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D. and Sartin, M.. Combining content-based and collaborative filters in an online newspaper , In Proc. ACM-SIGIR Workshop on Recommender Systems: Algorithms and Evaluation (1999). 6. C.C. Lee. Fuzzy logic in control systems: fuzzy logic controller-part I , IEEE Trans. On Systems, Man, and Cybernetics, 20 (2) ,pp408-418(1990). 7. Delgado, J., Ishii, N. and Ura, T.. Content-based Collaborative Information Filtering: Actively Learning to Classify and Recommend Documents, In Proc. Second Int. Workshop, CIA’98, pp.206-215 (1998). 8. Duda, Richard O., Hart,Peter E. and Stork,David G.. Pattern Classfication, WileyInterscience Publication, New York, pp.528-530 (2000). 9. Han, J., and Kamber, M.. Data mining: Concepts and Techniques. New York: Morgan-Kaufman (2000). 10. Herlocker, J., Konstan, J., Borchers A., and Riedl, J.. An algorithmic framework for performing collaborative Filtering, In Proc. ACM-SIGIR Conf., 1999, pp. 230-237 (1999). 11. McClave, J. T. and Dietrich, F. H.. Statistics. San Francisco: Ellen Publishing Company (1998). 12. Oard, D.W. and Marchionini, G.. A conceptual framework for text filtering, Technical Report EE-TR-96-25, CAR-TR-830, CS-TR3643 (1996). 13. Popescul, A., Ungar, L. H., Pennock, D. M. and Lawrence,S. Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments. UAI-2001. Seattle, WA, USA.(2001). 14. Qing Li, Byeong Man Kim. Clustering Approach for Hybrid Recommender System, In Proc. of the 2003 IEEE/WIC International Conference on Web Intelligence (WI 2003), pp. 33-38 (2003). 15. Resnick, P., Iacovou, N., Suchak, M., Bergstorm, P. and Riedl, J.. GroupLens: An open architecture for collaborative filtering of Netnews, In Proc. ACM Conf. on Computer-Supported Cooperative Work. pp.175-186 (1994). 16. Ricardo Baeza-Yates, Berthier Riberio-Neto. Modern Information Retrieval. New York:Addison-Wesley Publishers (1999). 17. Sarwar, B. M., Karypis, G., Konstan, J. A. and Riedl, J.. Item-based Collaborative Filtering Recommendation Algorithms, In Proc. Tenth Int. WWW Conf. 2001, pp. 285-295 (2001). 18. Shardanand U. and Maes. Social information filtering: Algorithms for automating ”word of mouth”, Proceedings of CHI’95 – Human Factors in Computing Systems, pp 210-217 (1995). 19. Wittenburg, K., Das, D., Hill, W. and Stead, L.. Group Asynchronous Browsing on the World Wide Web. Proc. of Fourth International World Wide Web Conference, Boston, Massachusetts, USA, pp 51-62 (1995)