DistributedandParallelDatabases,14,173-192,2003 C 2003 Kluwer Academic Publishers. Manufactured in The Netherlands An Adaptive Recommendation System without Explicit Acquisition of User Relevance Feedback CYRUS SHAHABI wahabi@usc.edu YI-SHIN CHE yishinc@ usc.edu Integrated Media Systems Center and Computer Science Department, University of Southern California. Los Angeles, CA 90089-2564, USA Recommended by: Boualem Benatallah opted in e-commerce businesses for helping customers locate products they would like to purchase. In an earlier work, we introduced a recommendation system, termed Yoda, hich employs a hybrid approach that combines collaborative filtering( CF)and content-based querying to achieve higher accuracy for large-scale Web-based applications. To reduce the complexity of the hybrid approach, Yoda is structured as a tunable model that is trained off-line and employed for real-time recommendation on-line. The h-line process benefits from an optimized aggregation function with low complexity that allows the real-time gregation based on confidence values of an active user to pre-defined sets of recommendations In this paper extend Yoda to include more recommendation sets. The recommendation sets can be obtained from differen sources, such as human experts, web navigation pattens, and clusters of user evaluations. Moreover, the extended Yoda can learn the confidence values automatically by utilizing implicit users'relevance feedback through web navigations using genetic algorithms( GA). Our end-to-end experiments show while Yoda's complexity is low and remains constant as the number of users and/or items grow, its accuracy surpasses that of the basic nearest-neighbor ethod by a wide margin(in most cases more than 100%0). The experimental results also indicate that the retrieval uracy is significantly increased by using the GA-based learning mechanism. Keywords: e-commerce, recommendation systems, genetic algorithm, relevance feedbacl 1. ntroduction As the amount of available products in e-commerce businesses is burgeoning, searching for desired products among enormous offerings is becoming increasingly difficult. As a result, e-commerce users frequently suffer from information overload. To alleviate this problem, recommendation systems are being widely adopted to help customers locate products they would like to purchase. In essence, these systems apply data analysis techniques to provide a list of recommended products for each online customer. The most famous example in e-commercebusinessesisthe"customerswhobought"featureusedinAmazon.com which is basically applied to every product page on its websites. With the help of th featuresAmazon.comssystemrecommendssimilarproductstothecurrentbuyerbased on the purchase histories of previous customers who bought the same product oh- l ozog arpa and usaa u bde at geats be 30602 39 4052 4 Rnd an es ited cas, ifts from Microsoft, NCR, and Okawa Foundations
Distributed and Parallel Databases, 14, 173–192, 2003 c 2003 Kluwer Academic Publishers. Manufactured in The Netherlands. An Adaptive Recommendation System without Explicit Acquisition of User Relevance Feedback∗ CYRUS SHAHABI shahabi@usc.edu YI-SHIN CHEN yishinc@usc.edu Integrated Media Systems Center and Computer Science Department, University of Southern California, Los Angeles, CA 90089-2564, USA Recommended by: Boualem Benatallah Abstract. Recommendation systems are widely adopted in e-commerce businesses for helping customers locate products they would like to purchase. In an earlier work, we introduced a recommendation system, termed Yoda, which employs a hybrid approach that combines collaborative filtering (CF) and content-based querying to achieve higher accuracy for large-scale Web-based applications. To reduce the complexity of the hybrid approach, Yoda is structured as a tunable model that is trained off-line and employed for real-time recommendation on-line. The on-line process benefits from an optimized aggregation function with low complexity that allows the real-time aggregation based on confidence values of an active user to pre-defined sets of recommendations. In this paper, we extend Yoda to include more recommendation sets. The recommendation sets can be obtained from different sources, such as human experts, web navigation patterns, and clusters of user evaluations. Moreover, the extended Yoda can learn the confidence values automatically by utilizing implicit users’ relevance feedback through web navigations using genetic algorithms (GA). Our end-to-end experiments show while Yoda’s complexity is low and remains constant as the number of users and/or items grow, its accuracy surpasses that of the basic nearest-neighbor method by a wide margin (in most cases more than 100%). The experimental results also indicate that the retrieval accuracy is significantly increased by using the GA-based learning mechanism. Keywords: e-commerce, recommendation systems, genetic algorithm, relevance feedback 1. Introduction As the amount of available products in e-commerce businesses is burgeoning, searching for desired products among enormous offerings is becoming increasingly difficult. As a result, e-commerce users frequently suffer from information overload. To alleviate this problem, recommendation systems are being widely adopted to help customers locate products they would like to purchase. In essence, these systems apply data analysis techniques to provide a list of recommended products for each online customer. The most famous example in e-commerce businesses is the “Customers who bought” feature used in Amazon.comTM, which is basically applied to every product page on its websites. With the help of this feature, Amazon.comTM’s system recommends similar products to the current buyer based on the purchase histories of previous customers who bought the same product. ∗This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and IIS-0082826, NIH-NLM R01-LM07061, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash gifts from Microsoft, NCR, and Okawa Foundations
SHAHABI AND CHEN In an earlier work [24], we introduced a hybrid recommendation system--Yoda, which simultaneously utilizes the advantages of clustering, content analysis, and collaborate fil- tering(CF)approaches. Basically, Yoda is a two-step approach recommendation system. During the offline process, Yoda maintains numerous recommendation lists obtained from typical navigation patterns analyzed by clustering and content analysis techniques. During the online process, the confidence value of an active user to each navigation-pattern cluster is estimated using the PPED distance measure [25] by comparing the user's navigation attern with centroid of each cluster. Finally, Yoda generates customized recommendations for the user by aggregating across recommendation lists(one list for each cluster) using the confidence values as the weights To expedite the aggregation step, we proposed an optimized fuzzy aggregation function that reduces the time complexity of aggregation from O(N X E)to O(N), where N is the number of recommended items and e is the number of clusters. Besides the advantage of efficiency, our aggregation function(similar to FALCON [31] can manage disjunctive queries while the traditional weighted average method cannot. For example, assume that the system knows all the information about users and user U is interested in the items the list of items recommended by both expert A and B while our aggregation metho rate recommended by expert A or B. The traditional weighted average method can only gene retrieve the expected list. In sum, the time complexity is reduced through a model-based technique, a clustering approach, and the optimized aggregation method. Additionally, due to the utilization of con tent analysis techniques, Yoda can detect the latent association between items and therefore provides better recommendations. Moreover, Yoda is able to collect information about user interests from implicit web navigation behaviors while most other recommendation systems 3, 9, 20, 23, 28] do not have this ability and therefore require explicit rating information from users. Consequently, Yoda puts less overhead on the users. Since content analysis techniques only capture certain characteristics of products, some desired products might not be included in the recommendation lists produced by analyzing the content. For example, picking wines based on brands, years, and descriptors might not be adequate if"smell"and"taste"are more important characteristics. In order to remedy for this problem, we extend Yoda to incorporate more recommendation lists than just web navigation patterns. These recommendation lists can be obtained from various experts, such as human experts and clusters of user evaluations. Meanwhile, because PPED is specially designed for measuring the similarity between two web navigation patterns including related data such as browsed items, view time, and sequences information, it can only be used for estimating confidence values to navigation- pattern clusters. Therefore, a leaming mechanism is needed for obtaining the complete confidence values of an active user toward all experts. We propose a learning mechanism that utilizes users'relevance feedback to improve confidence values automatically using To the best of our knowledge, only a few studies[ 18, 29]incorporate Ga for improving the user profiles. In these studies, users are directly involved in the evolution processes. Because users have to enter data for each product inquiry, they are often frustrated with this method On the contrary, in our design, users are not required to offer additional data to improve
174 SHAHABI AND CHEN In an earlier work [24], we introduced a hybrid recommendation system—Yoda, which simultaneously utilizes the advantages of clustering, content analysis, and collaborate filtering (CF) approaches. Basically, Yoda is a two-step approach recommendation system. During the offline process, Yoda maintains numerous recommendation lists obtained from typical navigation patterns analyzed by clustering and content analysis techniques. During the online process, the confidence value of an active user to each navigation-pattern cluster is estimated using the PPED distance measure [25] by comparing the user’s navigation pattern with centroid of each cluster. Finally, Yoda generates customized recommendations for the user by aggregating across recommendation lists (one list for each cluster) using the confidence values as the weights. To expedite the aggregation step, we proposed an optimized fuzzy aggregation function that reduces the time complexity of aggregation from O(N × E) to O(N), where N is the number of recommended items and E is the number of clusters. Besides the advantage of efficiency, our aggregation function (similar to FALCON [31]) can manage disjunctive queries while the traditional weighted average method cannot. For example, assume that the system knows all the information about users and user U is interested in the items recommended by expert A or B. The traditional weighted average method can only generate the list of items recommended by both expert A and B while our aggregation method can retrieve the expected list. In sum, the time complexity is reduced through a model-based technique, a clustering approach, and the optimized aggregation method. Additionally, due to the utilization of content analysis techniques, Yoda can detect the latent association between items and therefore provides better recommendations. Moreover, Yoda is able to collect information about user interests from implicit web navigation behaviors while most other recommendation systems [3, 9, 20, 23, 28] do not have this ability and therefore require explicit rating information from users. Consequently, Yoda puts less overhead on the users. Since content analysis techniques only capture certain characteristics of products, some desired products might not be included in the recommendation lists produced by analyzing the content. For example, picking wines based on brands, years, and descriptors might not be adequate if “smell” and “taste” are more important characteristics. In order to remedy for this problem, we extend Yoda to incorporate more recommendation lists than just web navigation patterns. These recommendation lists can be obtained from various experts, such as human experts and clusters of user evaluations. Meanwhile, because PPED is specially designed for measuring the similarity between two web navigation patterns including related data such as browsed items, view time, and sequences information, it can only be used for estimating confidence values to navigationpattern clusters. Therefore, a learning mechanism is needed for obtaining the complete confidence values of an active user toward all experts. We propose a learning mechanism that utilizes users’ relevance feedback to improve confidence values automatically using genetic algorithms (GA) [7]. To the best of our knowledge, only a few studies [18, 29] incorporate GA for improving the user profiles. In these studies, users are directly involved in the evolution processes. Because users have to enter data for each product inquiry, they are often frustrated with this method. On the contrary, in our design, users are not required to offer additional data to improve
AN ADAPTIVE RECOMMENDATION SYSTEM 175 the confidence values. These confidence values are corrected by the gA-based learning mechanisms using users' future navigation behaviors. Our experimental results indicate a significant increase in the accuracy of recommendation results due to the integration of the proposed learning mechanism. The remainder of this paper is organized as follows. Section 2 summarizes the related work. In Section 3, we provide an overview on the functionality of Yoda In Section 4, we discuss the detailed design of Yoda. Section 5 depicts the learning mechanism of Yoda. Section 6 describes the results of our evaluations as well as the details of the system implementation and our benchmarking method Section 7 concludes the paper 2. Related work Recommendation systems are designed either based on content-based filtering or collab- orative filtering. Both types of systems have inherent strengths and weaknesses, where content-based approaches directly exploit the product information, and the collaboration filtering approaches utilize specific user rating information Content-based filtering approaches are derived from the concepts introduced by the In- formation Retrieval (IR)community. Content-based filtering systems are usually criticized or two weaknesses 1. Content limitation: IR methods can only be applied to a few kinds of content, such as text and image, and the extracted features can only capture certain aspects of the content Over-specialization: Content-based recommendation system provides recommendations merely based on user profiles. Therefore, users have no chance of exploring new items at are not similar to those items included in their profiles The collaborative filtering(CF) approach remedies for these two problems. Typically, CF-based recommendation systems do not use the actual content of the items for recommen- dation. Moreover, since other user profiles are also considered, user can explore new items The nearest-neighbor algorithm is the earliest CF-based technique used in recommendation systems [20, 28]. With this algorithm, the similarity between users is evaluated based on their ratings of products, and the recommendation is generated considering the items visited by nearest neighbors of the user. In its original form, the nearest-neighbor algorithm uses a two-dimensional user-item matrix to represent the user profiles. This original form of CF-based recommendation systems suffers from three problems 1. Scalability: The time complexity of executing the nearest-neighbor algorithm grows linearly with the number of items and the number of users. Thus, the recommendation system cannot support large-scale applications such as Amazon. com, which provides more than 18 million unique items for over 20 million users 2. Sparsity: Due to large number of items and user reluctance to rate the items, usually the profile matrix is sparse. Therefore, the system cannot provide recommendations for some users, and the generated recommendations are not accurate
AN ADAPTIVE RECOMMENDATION SYSTEM 175 the confidence values. These confidence values are corrected by the GA-based learning mechanisms using users’ future navigation behaviors. Our experimental results indicate a significant increase in the accuracy of recommendation results due to the integration of the proposed learning mechanism. The remainder of this paper is organized as follows. Section 2 summarizes the related work. In Section 3, we provide an overview on the functionality of Yoda. In Section 4, we discuss the detailed design of Yoda . Section 5 depicts the learning mechanism of Yoda. Section 6 describes the results of our evaluations as well as the details of the system implementation and our benchmarking method. Section 7 concludes the paper. 2. Related work Recommendation systems are designed either based on content-based filtering or collaborative filtering. Both types of systems have inherent strengths and weaknesses, where content-based approaches directly exploit the product information, and the collaboration filtering approaches utilize specific user rating information. Content-based filtering approaches are derived from the concepts introduced by the Information Retrieval (IR) community. Content-based filtering systems are usually criticized for two weaknesses: 1. Content limitation: IR methods can only be applied to a few kinds of content, such as text and image, and the extracted features can only capture certain aspects of the content. 2. Over-specialization: Content-based recommendation system provides recommendations merely based on user profiles. Therefore, users have no chance of exploring new items that are not similar to those items included in their profiles. The collaborative filtering (CF) approach remedies for these two problems. Typically, CF-based recommendation systems do not use the actual content of the items for recommendation. Moreover, since other user profiles are also considered, user can explore new items. The nearest-neighbor algorithm is the earliest CF-based technique used in recommendation systems [20, 28]. With this algorithm, the similarity between users is evaluated based on their ratings of products, and the recommendation is generated considering the items visited by nearest neighbors of the user. In its original form, the nearest-neighbor algorithm uses a two-dimensional user-item matrix to represent the user profiles. This original form of CF-based recommendation systems suffers from three problems: 1. Scalability: The time complexity of executing the nearest-neighbor algorithm grows linearly with the number of items and the number of users. Thus, the recommendation system cannot support large-scale applications such as Amazon.comTM, which provides more than 18 million unique items for over 20 million users. 2. Sparsity: Due to large number of items and user reluctance to rate the items, usually the profile matrix is sparse. Therefore, the system cannot provide recommendations for some users, and the generated recommendations are not accurate
SHAHABI AND CHEN 3. Synonymy: Since contents of the items are completely ignored, latent association between items is not considered for recommendations. Thus, as long as new items are not rated they are not recommended; hence, false negatives are introduced In order to solve these problems, a variety of different techniques have been proposed Some of techniques, such as dimensionality reduction [22, 23], clustering[16], and Bayesian Network [3, 9], mainly are remedies for the scalability problem. These techniques extract characteristics(patterns)from the original dataset in an offline process andemploy only these patterns to generate the recommendation lists in the online process. Although this approach reduce the online processing cost, it often reduces the accuracy of the recommending results. Moreover, the online computation complexity keeps increasing with the number of patterns. Some other techniques, such as association rules [17, 23], content analysis [1, 2, 151, categorization [6, 12), are emphasized on alleviating the sparsity and synonymy problems Basically, these techniques analyze the Web usage data(from Web server logs )to capture the latent association between items. Subsequently, based on both item association information and user ratings, the recommendation systems can thus generate better recommendation to users. However, the online computation time concurrently increases, as more data incorporated into the recommendation progress. Additionally, because Web usage data from the server side are not reliable [26], the item association generated from Web server logs might be wrong. ith Yoda [24], we introduced a hybrid recommendation model, which consists of two steps During the offline process, Yoda generates cluster recommendation lists based on the Web usage data from the client-side through clustering and content analysis techniques This approach not only can address the scalability problem by the preprocessing work, but also can alleviate the sparsity and synonymy problems by discovering latent assoc tion between items. Since the Web usage data from the client-side can capture real user navigation behaviors, the item association discovered by the Yoda system would be more accurate. Beside the cluster recommendation lists. Yoda also maintains numerous recom- mendation lists obtained from different experts, such as human experts of the Website domain, and the cluster representatives of the user ratings. By these additional recom- mendation lists, Yoda is less impacted by the preprocessing work as compared to other systems During the online process, for each user who is using the system, Yoda estimates his/her confidence values to each expert, who provides the recommendation list, based on his/her current navigation behaviors through the PPed distance measure [25] and our GA-based learning mechanism. Subsequently, Yoda generates customized recommendations for the user by aggregating across recommendation lists using the confidence value as the weight. In order to expedite the aggregation step, Yoda employs an optimized fuzzy aggregation function that reduces the time computation complexity of aggregation from O(N X E to O(N), where N is the number of recommended items in the final recommendation list to users and E is the number of recommendation lists maintained in the system. Conse- quently, the online computation complexity of Yoda remains the same even if number of ndation lists in
176 SHAHABI AND CHEN 3. Synonymy: Since contents of the items are completely ignored, latent association between items is not considered for recommendations. Thus, as long as new items are not rated, they are not recommended; hence, false negatives are introduced. In order to solve these problems, a variety of different techniques have been proposed. Some of techniques, such as dimensionality reduction [22, 23], clustering [16], and Bayesian Network [3, 9], mainly are remedies for the scalability problem. These techniques extract characteristics (patterns) from the original dataset in an offline process and employ only these patterns to generate the recommendation lists in the online process. Although this approach can reduce the online processing cost, it often reduces the accuracy of the recommending results. Moreover, the online computation complexity keeps increasing with the number of patterns. Some other techniques, such as association rules [17, 23], content analysis [1, 2, 15], categorization [6, 12], are emphasized on alleviating the sparsity and synonymy problems. Basically, these techniques analyze the Web usage data (from Web server logs) to capture the latent association between items. Subsequently, based on both item association information and user ratings, the recommendation systems can thus generate better recommendation to users. However, the online computation time concurrently increases, as more data are incorporated into the recommendation progress. Additionally, because Web usage data from the server side are not reliable [26], the item association generated from Web server logs might be wrong. With Yoda [24], we introduced a hybrid recommendation model, which consists of two steps. During the offline process, Yoda generates cluster recommendation lists based on the Web usage data from the client-side through clustering and content analysis techniques. This approach not only can address the scalability problem by the preprocessing work, but also can alleviate the sparsity and synonymy problems by discovering latent association between items. Since the Web usage data from the client-side can capture real user navigation behaviors, the item association discovered by the Yoda system would be more accurate. Beside the cluster recommendation lists, Yoda also maintains numerous recommendation lists obtained from different experts, such as human experts of the Website domain, and the cluster representatives of the user ratings. By these additional recommendation lists, Yoda is less impacted by the preprocessing work as compared to other systems. During the online process, for each user who is using the system, Yoda estimates his/her confidence values to each expert, who provides the recommendation list, based on his/her current navigation behaviors through the PPED distance measure [25] and our GA-based learning mechanism. Subsequently, Yoda generates customized recommendations for the user by aggregating across recommendation lists using the confidence value as the weight. In order to expedite the aggregation step, Yoda employs an optimized fuzzy aggregation function that reduces the time computation complexity of aggregation from O(N × E) to O(N), where N is the number of recommended items in the final recommendation list to users and E is the number of recommendation lists maintained in the system. Consequently, the online computation complexity of Yoda remains the same even if number of recommendation lists increases
AN ADAPTIVE RECOMMENDATION SYSTEM 177 3. Overview The primary objective of a web-based recommendation system can be stated as follows Problem 1. Suppose the item-set /=[i l i is an item presented in a web-site) and u is a user interactively navigating the Web-site. The recommendation problem is to obtain the ut's wish-list I, e I. which is a list of items that are ranked based on us interests In general, to acquire a wish-list for a user, a recommendation process goes through three 1. Obtaining user perceptions: Data about user perceptions such as navigation behaviors are collected. In some systems [9, 22], these data need further processing for inferring information which is used in the later phases 2. Ranking the items: The inferred user interests are utilized to provide the predicted user wish-list 3. Adjusting user settings: The system acquires relevance feedback(or follow-up navigation behaviors)from the user and employs it to refine the user settings, which represent the user perceptions. On occasion, this phase is integrated into phase one Figure 1 illustrates the processing flow of Yoda. Suppose that music CDs are the recom- mending items in the Yoda web-site. The objective of Yoda is to provide each active user. who is using the system, a satisfactory-and-customized recommendation list of CDs by Experts wish-lists Other Experts' wish-lists Cluste Aggregat⊥on Learning List Cluster Centroi Update Confidence values User Navigation Figure 1. Processing flow of Yoda
AN ADAPTIVE RECOMMENDATION SYSTEM 177 3. Overview The primary objective of a web-based recommendation system can be stated as follows: Problem 1. Suppose the item-set I = {i | i is an item presented in a web-site} and u is a user interactively navigating the Web-site. The recommendation problem is to obtain the u’s wish-list Iu ∈ I, which is a list of items that are ranked based on u’s interests. In general, to acquire a wish-list for a user, a recommendation process goes through three steps/phases: 1. Obtaining user perceptions: Data about user perceptions such as navigation behaviors are collected. In some systems [9, 22], these data need further processing for inferring information which is used in the later phases. 2. Ranking the items: The inferred user interests are utilized to provide the predicted user wish-list. 3. Adjusting user settings: The system acquires relevance feedback (or follow-up navigation behaviors) from the user and employs it to refine the user settings, which represent the user perceptions. On occasion, this phase is integrated into phase one. Figure 1 illustrates the processing flow of Yoda. Suppose that music CDs are the recommending items in the Yoda web-site. The objective of Yoda is to provide each active user, who is using the system, a satisfactory-and-customized recommendation list of CDs by Figure 1. Processing flow of Yoda
SHAHABI AND CHEN analyzing his/her navigation behaviors. To accomplish this goal, Yoda performs a two-step process. During the offline process, Yoda clusters the collected Web usage data from the browser side and generates the corresponding cluster recommendation list(termed experts wish-lists)and the centroid for each navigation-pattern cluster Moreover, Yoda also main- tains other experts' wish-lists obtained from different experts, such as human experts, and the cluster representatives of user ratings Later, during the on-line recommendation process, the system first acquires the initial confidence values by softly classifying'the active user into navigation-pattern clusters Note that the confidence values between the user and other experts are not estimated at this step because the estimation process is comparably time consuming. Subsequently, Yoda uses these confidence values to generate the customized recommendation(termed user wish list) by weighted aggregation of the experts'wish-lists. After the user receives the user wish-list and continue to navigate the website. Yoda will correct the confidence values in a background process by utilizing the follow-up user navigation behaviors and the GA-based 4. System design In this section, we provide a detailed description of Yoda's components. Since phase I is based on our previous work [25], here we elaborate more on phase II and Ill of Yoda. The phase Ill is described in Section 5 4.1. Phase 1-Obtaining user perception Yoda uses the client-side tracking mechanism proposed in [27] to capture view-time, hit count, and sequence of visiting the web-pages (items) within a web-site. These features reflect users'interests on items. To analyze these features and infer the user interests, Yoda employs the Feature Matrices(FM)model, which we introduced in [25]. FMis aset of hyper- cube data structures that can represent various aggregated access features with any required precision. With FM, the patterns of both a single user and a cluster of users are modeled. Here, Yoda uses FM to model the navigation patterns of the active users individually, and then the aggregated navigation pattern of each cluster is generated by clustering a collection of user navigation behaviors. Yoda also applies a similarity measure, termed Projected Pure Euclidean Distance(PPED)[25], to evaluate the similarity of a user navigation to a cluster navigation pattern Thus, Yoda can quantify the confidence value of a user to each navigation-pattern cluster However, because PPED can only apply to the navigation pattern behaviors modeled by FM model, Yoda cannot acquire the confidence values of a user's interests to the recor lists of other experts at this step 4.2. Phase ll-Ranking the items Two types of work in Yoda involve ranking the items. The first type is generating the experts recommendations, which are lists of ranked items produced by either human experts, clusters
178 SHAHABI AND CHEN analyzing his/her navigation behaviors. To accomplish this goal, Yoda performs a two-step process. During the offline process, Yoda clusters the collected Web usage data from the browser side and generates the corresponding cluster recommendation list (termed experts’ wish-lists) and the centroid for each navigation-pattern cluster. Moreover, Yoda also maintains other experts’ wish-lists obtained from different experts, such as human experts, and the cluster representatives of user ratings. Later, during the on-line recommendation process, the system first acquires the initial confidence values by softly classifying1 the active user into navigation-pattern clusters. Note that the confidence values between the user and other experts are not estimated at this step because the estimation process is comparably time consuming. Subsequently, Yoda uses these confidence values to generate the customized recommendation (termed user wishlist) by weighted aggregation of the experts’ wish-lists. After the user receives the user wish-list and continue to navigate the website, Yoda will correct the confidence values in a background process by utilizing the follow-up user navigation behaviors and the GA-based learning mechanism. 4. System design In this section, we provide a detailed description of Yoda’s components. Since phase I is based on our previous work [25], here we elaborate more on phase II and III of Yoda. The phase III is described in Section 5. 4.1. Phase I—Obtaining user perception Yoda uses the client-side tracking mechanism proposed in [27] to capture view-time, hitcount, and sequence of visiting the web-pages (items) within a web-site. These features reflect users’ interests on items. To analyze these features and infer the user interests, Yoda employs theFeature Matrices(FM) model, which we introduced in [25]. FM is a set of hypercube data structures that can represent various aggregated access features with any required precision. With FM, the patterns of both a single user and a cluster of users are modeled. Here, Yoda uses FM to model the navigation patterns of the active users individually, and then the aggregated navigation pattern of each cluster is generated by clustering a collection of user navigation behaviors. Yoda also applies a similarity measure, termed Projected Pure Euclidean Distance (PPED) [25], to evaluate the similarity of a user navigation to a cluster navigation pattern. Thus, Yoda can quantify the confidence value of a user to each navigation-pattern cluster. However, because PPED can only apply to the navigation pattern behaviors modeled by FM model, Yoda cannot acquire the confidence values of a user’s interests to the recommendation lists of other experts at this step. 4.2. Phase II—Ranking the items Two types of work in Yoda involve ranking the items. The first type is generating the experts’ recommendations, which are lists of ranked items produced by either human experts, clusters
AN ADAPTIVE RECOMMENDATION SYSTEM 179 of users, or clusters of navigation patterns In our previous work [24], a content analysis technique was proposed to abstract common interests from navigation patterns. With this technique, the system can generate a list of ranked items for each navigation-pattern cluster. However, for the sake of simplicity, we briefly describe this technique and only focus on another type of ranking work--generating the user wish-list online. In order to properly describe this method, we first formally define some necessary terms Definition 4.1. An item i is an instance of product, service, etc. that is presented in a reb-site. Items are described by their properties, which are abstract perceptual features, and the corresponding degrees of the properties. ((p, pi)I p is a property E P, Pi is the degree of i to P l or example, for a music CD as an item, "styles"of the music, "ratings", and"popularity can be considered as properties of the item. Since most of properties are perceptual and asking for precise values to rate these percep- tual properties is inappropriate, we use limited fuzzy-sets(f E F) to evaluate proprieties. nis design, we can also ease the difficulties of the content analysis in the later process Definition 4.2. A wish-list, Ir, for user/expert x is defined I =l(i, v(i)I i is an item, U(i)E[0, 11) where the preference value U(i) measures the probability of item i be of interest to user/expert x Definition 43. A cluster browse-list, Bk, for navigation-pattern cluster k is a list of items visited by all users in this cluster. Definition 4. 4. A confidence value I for a user u to an expert e is formally defined as: T: u,eeE- bu.e, where E denotes a set of experts in the system and U represents the set of users who have assigned reference confidence values to experts. Note that the value of bu, e is represented as a fuzzy term for two reasons. First, using fuzzy terms to describe forms of human judgment is more appropriate. Second, by the limited set of fuzzy terms, the online computation process can be expedited 4.2.1. Generating navigation-Pattern cluster wish-lists. Yoda represents the aggregated interests of the users in each cluster by a set of property values(Pvs), termed favorite PVs of the cluster. These favorite PVs indicate the emphasized degree of the properties by the majority of users in this cluster. In order to extract these values from the navigation data in this cluster, we design a voting procedure defined as: Definition 4.5. The favorite PV, Fp(k), identifies likelihood of the cluster k being inter ested in property p of the items and is extracted from the cluster browse-list Bk through
AN ADAPTIVE RECOMMENDATION SYSTEM 179 of users, or clusters of navigation patterns. In our previous work [24], a content analysis technique was proposed to abstract common interests from navigation patterns. With this technique, the system can generate a list of ranked items for each navigation-pattern cluster. However, for the sake of simplicity, we briefly describe this technique and only focus on another type of ranking work—generating the user wish-list online. In order to properly describe this method, we first formally define some necessary terms. Definition 4.1. An item i is an instance of product, service, etc. that is presented in a web-site. Items are described by their properties, which are abstract perceptual features, and the corresponding degrees of the properties. i = {(p, p˜i) | p is a property ∈ P, p˜i is the degree of i to p} ∈ I (1) For example, for a music CD as an item, “styles” of the music, “ratings”, and “popularity” can be considered as properties of the item. Since most of properties are perceptual and asking for precise values to rate these perceptual properties is inappropriate, we use limited fuzzy-sets ( f ∈ F) to evaluate proprieties. By this design, we can also ease the difficulties of the content analysis in the later process. Definition 4.2. A wish-list, Ix , for user/expert x is defined as: Ix = {(i, vx (i)) | i is an item, vx (i) ∈ [0, 1]} (2) where the preference value vx (i) measures the probability of item i be of interest to user/expert x. Definition 4.3. A cluster browse-list, Bk , for navigation-pattern cluster k is a list of items visited by all users in this cluster. Definition 4.4. A confidence value π for a user u to an expert e is formally defined as: π : u, e ∈ E → bu,e, where E denotes a set of experts in the system and U represents the set of users who have assigned reference confidence values to experts. Note that the value of bu,e is represented as a fuzzy term for two reasons. First, using fuzzy terms to describe forms of human judgment is more appropriate. Second, by the limited set of fuzzy terms, the online computation process can be expedited. 4.2.1. Generating navigation-pattern cluster wish-lists. Yoda represents the aggregated interests of the users in each cluster by a set of property values (PVs), termed favorite PVs of the cluster. These favorite PVs indicate the emphasized degree of the properties by the majority of users in this cluster. In order to extract these values from the navigation data in this cluster, we design a voting procedure defined as: Definition 4.5. The favorite PV, Fp(k), identifies likelihood of the cluster k being interested in property p of the items and is extracted from the cluster browse-list Bk through
SHAHABI AND CHEN Cp,f(k)=‖ili∈Bk,=f Fp(k)=maxf E F, Cp, (k)=max (Cp, P(k)) Example 4.1. Suppose the browse-list of cluster K is A,B, G,K, Y, Z, and the values of property"Rock"for the corresponding CDs are (A, high),(B, high),(G, low),(K, medium), (Y, high), (Z, high). Because"high"has the maximum vote, the favorite PV of cluster K, FRock(K), is"high Based on these extracted favorite PVs of the cluster k, Yoda can evaluate uk(i), preference value of an item i for cluster k, by quantifying the similarity between favorite PVs and property values associated with item i. The aggregation function used to compute ux(i)is k()=max{Fp(k)×历 Example 4.2. Suppose the favorite PV of cluster K is(Rock, high), (Pop, high), (Vocal medium),(Soundtrack, medium),( Classic, low), and the item i is defined as (Rock, low), DJ. According to the above, the preference value ux(i)=max(high x low),(high x low), (medium x low), (medium x high), (low x low))=(medium x high)=0.75 4.2.2. Generating user wish-lists. During the on-line recommendation process, Yoda ag gregates the experts'wish-lists to generate the predicted user wish-list for the active user l. A fuzzy aggregation function is employed to measure and quantify the preference value vu(i)of each item i for the user u based on the user profile of user u. We use an optimized aggregation function with a triangular norm[4]. A triangular norm aggregation function g satisfy the following properties Monotonicity:g(x,y)≤g(x’,y’)ifx≤x'andy≤y Commutativity: g(r, y)=g(, x) ity: g(g(r, y).2)=g(r,g(, z)) with these properties, the query optimizer can replace the original query with a logically equivalent one and still obtain the exact same result. The optimized aggregation function we propose for Yoda is: Definition 4.6. First, experts are grouped based on their reference confidence values assigned by user G(u)=le l f is a fuzzy set F,Tue=fy Then, the preference value u(i) for item i is computed as: EB,f(i)=f×maxe()le∈Gr() ()=max{En,f()wf∈F}
180 SHAHABI AND CHEN Eq. (4). Cp, f (k) = {i | i ∈ Bk , p˜i = f } (3) Fp(k) = max f | f ∈ F,Cp, f (k) = max ∀ f ∈F {Cp, f (k)} Example 4.1. Suppose the browse-list of cluster K is {A, B, G, K, Y, Z}, and the values of property “Rock” for the corresponding CDs are {(A, high), (B, high), (G, low), (K, medium), (Y, high), (Z, high)}. Because “high” has the maximum vote, the favorite PV of cluster K, FRock(K), is “high”. Based on these extracted favorite PVs of the cluster k, Yoda can evaluate vk (i), preference value of an item i for cluster k, by quantifying the similarity between favorite PVs and property values associated with item i. The aggregation function used to compute vk (i) is: vk (i) = max{Fp(k) × p˜i} (4) Example 4.2. Suppose the favorite PV of cluster K is (Rock, high), (Pop, high), (Vocal, medium), (Soundtrack, medium), (Classic, low), and the item i is defined as {(Rock, low), (Pop, low), (Vocal, low), (Soundtrack, high), (Classic, low)}. According to the equations above, the preference value vK (i) = max {(high × low), (high × low), (medium × low), (medium × high), (low × low)} = (medium × high) = 0.75. 4.2.2. Generating user wish-lists. During the on-line recommendation process, Yoda aggregates the experts’ wish-lists to generate the predicted user wish-list for the active user u. A fuzzy aggregation function is employed to measure and quantify the preference value vu(i) of each item i for the user u based on the user profile of user u. We use an optimized aggregation function with a triangular norm [4]. A triangular norm aggregation function g satisfy the following properties: Monotonicity: g(x, y) ≤ g(x , y ) if x ≤ x and y ≤ y Commuatativity: g(x, y) = g(y, x) Associativity: g(g(x, y),z) = g(x, g(y,z)) With these properties, the query optimizer can replace the original query with a logically equivalent one and still obtain the exact same result. The optimized aggregation function we propose for Yoda is: Definition 4.6. First, experts are grouped based on their reference confidence values assigned by user u. G f (u) = {e | f is a fuzzy set ∈ F, πu,e = f } (5) Then, the preference value vu(i) for item i is computed as: Eu, f (i) = f × max{ve(i) | e ∈ G f (u)} vu(i) = max{Eu, f (i) | ∀ f ∈ F} (6)
AN ADAPTIVE RECOMMENDATION SYSTEM Basically, this aggregation function partitions the preference values into Fll different subgroups according to the confidence values of the expert e. Subsequently, the system maintains a list of maximum preference values for all subgroups. Finally, the system com- putes the preferences of all items in the user wish-list by iterating through all subgroups. As compared to a naive weighted aggregation function with time complexity O( x/D) (where Ell is the number of experts in the system) the complexity of the proposed ag gregation function is O(F‖×‖/)=O(‖/D), where‖F‖ is a small constant number representing the number of fuzzy terms To reduce the time complexity of generating the user wish-lists further, we apply a cut-off point on the expert wish-lists. Each shorten wish-list includes the N best-ranked items according to their preference values for the corresponding expert In Fagin [4], Fagin has proposed an optimized algorithm, the Ao algorithm, to retrieve N best items from a collection of subsets of items with time complexity proportional to N rather than total lumber of items. However, the Ao algorithm can only be employed by the aggregation function that satisfy triangular norm form Here, by takin lbgroups of items(as described above) as the subsets, the Ao algorithm can be incorporated into Yoda. Applying the Ao algorithm to generate a user wish-list with cut- off point M, we reduce the time complexity to O(‖F‖×‖N)=O(‖N here‖‖<‖l 5. The learning mechanism of Yoda In this section, we provide a detailed description of the learning mechanism. We first describe the background knowledge of GA. Next, we elaborate the details on phase Ill of Yoda in Section 5.2 5.1. Background on genetic algorithms Genetic algorithms (GAs), which were introduced by Holland [7], are iterative search techniques based on the spirit of natural evolution. By emulating biological selection and reproduction, GAs can efficiently search through the solution space of complex problems asically, a ga operates on a population of candidate solutions called chromosomes. A chromosome, which is composed of numerous genes, represents an encoding of the problem and associates it with a fitness value evaluated by the fitness function. This fitness value determines the goodness and the survivability of the chromosome Generally, GA starts by initializing the population andevaluating its corresponding fitness values Before it terminates, GA produces newer generations iteratively. At each generation, a portion of the chromosomes is selected according to the survivability for reproducing used for replacing some chromosomes in the population with a probability consistent with their fitness values. In other words, with the help of the fitness function to point out the orrect direction, GA could construct better and better chromosomes from the best partial genes of past samplings. Please see reference [5] for matl
AN ADAPTIVE RECOMMENDATION SYSTEM 181 Basically, this aggregation function partitions the preference values into F different subgroups according to the confidence values of the expert e. Subsequently, the system maintains a list of maximum preference values for all subgroups. Finally, the system computes the preferences of all items in the user wish-list by iterating through all subgroups. As compared to a naive weighted aggregation function with time complexity O( E × I ) (where E is the number of experts in the system) the complexity of the proposed aggregation function is O( F × I ) = O( I ), where F is a small constant number representing the number of fuzzy terms. To reduce the time complexity of generating the user wish-lists further, we apply a cut-off point on the expert wish-lists. Each shorten wish-list includes the N best-ranked items according to their preference values for the corresponding expert. In Fagin [4], Fagin has proposed an optimized algorithm, the A0 algorithm, to retrieve N best items from a collection of subsets of items with time complexity proportional to N rather than total number of items. However, the A0 algorithm can only be employed by the aggregation function that satisfy triangular norm form. Here, by taking the subgroups of items (as described above) as the subsets, the A0 algorithm can be incorporated into Yoda.2 Applying the A0 algorithm to generate a user wish-list with cut-off point N, we reduce the time complexity to O( F × N ) = O( N ), where N I . 5. The learning mechanism of Yoda In this section, we provide a detailed description of the learning mechanism. We first describe the background knowledge of GA. Next, we elaborate the details on phase III of Yoda in Section 5.2. 5.1. Background on genetic algorithms Genetic algorithms (GAs), which were introduced by Holland [7], are iterative search techniques based on the spirit of natural evolution. By emulating biological selection and reproduction, GAs can efficiently search through the solution space of complex problems. Basically, a GA operates on a population of candidate solutions called chromosomes. A chromosome, which is composed of numerous genes, represents an encoding of the problem and associates it with a fitness value evaluated by the fitness function. This fitness value determines the goodness and the survivability of the chromosome. Generally, GA starts by initializing the population and evaluating its corresponding fitness values. Before it terminates, GA produces newer generations iteratively. At each generation, a portion of the chromosomes is selected according to the survivability for reproducing offspring. The offspring are generated through crossover and mutation processes and are used for replacing some chromosomes in the population with a probability consistent with their fitness values. In other words, with the help of the fitness function to point out the correct direction, GA could construct better and better chromosomes from the best partial genes of past samplings. Please see reference [5] for mathematical foundations
SHAHABI AND CHEN In summary, GA is composed of a fitness function, a population of chromosomes and three operators--selection, crossover and mutation. The parameter settings of the operators ca be chosen depending on the applications or remain unchanged even when the applications are varied. However, the fitness function and the coding method are required to be specially designed for each problem. The design of fitness function and encoding method for Yoda will be described in Section 5.2 Comparing to other learning techniques, GA has four major differences [5] The learning processes of Ga do not perform on the parameters directly but on a coding of the parameter set. As a result, unlike other learning techniques, GA can perform learnin procedures for all types of parameters without limiting the learning ability. For example, the solution space can be discrete or continuous in GA, while the Rocchio-based learning algorithms [21]only allow continuous solution space. GA searches the optimal solution from a population of points. Since these points ca simultaneously seek the characteristics of the optimal solution, the learning process is more efficient than other common learning techniques, such as neural network and reinforcement learning The learning processes of GA are based on the payoffinformation from the fitness function only. Contrasting to GA, many learning techniques require auxiliary information during learning. For example, derivatives are essential for gradient-based learning techniques, and the environment information is necessary for dynamic programming method of the reinforcement learning techniques. In relevance feedback problems, since Ga does not directly employ the relevance feedback but employ the payoff information as an objective direction, the GA-based learning mechanism can tolerant incorrect relevance feedback. Moreover, GA can also learn from the relevance feedback where the majority of items receive negative scores Ga uses probabilistic transition rules to lead the search direction as well. Hence, Ga ha less chance of stopping in local optimal answers Overall, given the facts that: 1)the solution space is discrete in Yoda, 2) the relevance feedback data are usually imperfect, 3)the majority of items may indeed receive negative cores,4)the learning processes should be efficient, and 5) the chance of finding local optimal should be minimized, the GA-based learning technique is more appropriate for Yoda than other learning techniques. 5.2. Phase lll--Adjusting user settings Yoda's learning mechanism is a background process performed at the same time that the users navigate the web-site. It employs GA for improving the list of confidence values by decoding the best chromosome to replace existing one in the system after its evolution Users are not required to make additional efforts to improve these confidence values. Yoda collects users'follow-up navigation behaviors and employs these behaviors as the goal of Ga prior to the beginning of evolution. Note that the learning mechanism is only triggered after receiving enough navigation data. In our implen n. it is activated
182 SHAHABI AND CHEN In summary, GA is composed of a fitness function, a population of chromosomes and three operators—selection, crossover and mutation. The parameter settings of the operators can be chosen depending on the applications or remain unchanged even when the applications are varied. However, the fitness function and the coding method are required to be specially designed for each problem. The design of fitness function and encoding method for Yoda will be described in Section 5.2. Comparing to other learning techniques, GA has four major differences [5]: – The learning processes of GA do not perform on the parameters directly but on a coding of the parameter set. As a result, unlike other learning techniques, GA can perform learning procedures for all types of parameters without limiting the learning ability. For example, the solution space can be discrete or continuous in GA, while the Rocchio-based learning algorithms [21] only allow continuous solution space. – GA searches the optimal solution from a population of points. Since these points can simultaneously seek the characteristics of the optimal solution, the learning process is more efficient than other common learning techniques, such as neural network and reinforcement learning. – The learning processes of GA are based on the payoff information from the fitness function only. Contrasting to GA, many learning techniques require auxiliary information during learning. For example, derivatives are essential for gradient-based learning techniques, and the environment information is necessary for dynamic programming method of the reinforcement learning techniques. In relevance feedback problems, since GA does not directly employ the relevance feedback but employ the payoff information as an objective direction, the GA-based learning mechanism can tolerant incorrect relevance feedback. Moreover, GA can also learn from the relevance feedback where the majority of items receive negative scores. – GA uses probabilistic transition rules to lead the search direction as well. Hence, GA has less chance of stopping in local optimal answers. Overall, given the facts that: 1) the solution space is discrete in Yoda, 2) the relevance feedback data are usually imperfect, 3) the majority of items may indeed receive negative scores, 4) the learning processes should be efficient, and 5) the chance of finding local optimal should be minimized, the GA-based learning technique is more appropriate for Yoda than other learning techniques. 5.2. Phase III—Adjusting user settings Yoda’s learning mechanism is a background process performed at the same time that the users navigate the web-site. It employs GA for improving the list of confidence values by decoding the best chromosome to replace existing one in the system after its evolution. Users are not required to make additional efforts to improve these confidence values. Yoda collects users’ follow-up navigation behaviors and employs these behaviors as the goal of GA prior to the beginning of evolution.3 Note that the learning mechanism is only triggered after receiving enough navigation data. In our implementation, it is activated