C 2001 Kluwer Academic Publishers. Manufactured in The Netherlands Expert-Driven Validation of Rule-Based User Models in Personalization Applications GEDIMINAS ADOMAVICIUS adomavic acs myu. edu Computer Science Department, Courant Institute of Mathematical Sciences. New York University, 25/ Mercer Street. New York. NY 10012. USA ALEXANDER TUZHILIN Information Systems Department, Stern School of Business, Ne York University, 44 West 4th Street, Room 9-78 New York. NY 10012. US/A Editors: Ron Kohavi and Foster provost act. e-commerce a om dynamic Web e geting, to individual recommendations to the customers, it is important to build personalized profiles of lual users from their transactional histories. These profiles constitute models of individual user behavior and be specified with sets of rules learned from user transactional histories using various data mining technique Since many discovered rules can be spurious, irrelevant, or trivial, one of the main problems is how to perform post-analysis of the discovered rules, i.e., how to validate user profiles by separating"good"rules from the"bad. This validation process should be done with an explicit participation of the human expert. However, complications may arise because there can be very large numbers of rules discovered in the applications that deal with many users, and the expert cannot perform the validation on a rule-by-rule basis in a reasonable period of time paper presents a framework for building behavioral profiles of individual users. It also introduces a new approach to expert-driven validation of a very large number of rules pertaining to these users. In particular, it presents several types of validation operators, including rule grouping, filtering, browsing, and redundant rule operators, that allow a human expert validate many individual rules at a time. By iteratively applying such the human expert can validate a significant part of initially discovered rules in an acceptable tim These validation operate plemented as a part of a one-to-one profiling system. The paper also presents a case study of using this system for validating individual user rules discovered in a marketing application Keywords: personalization, profiling, rule discovery, post-analysis, validation 1. Introduction In various e-commerce applications, ranging from dynamic Web content presentation, to personalized ad targeting, to individual recommendations to the customers, personalization has become an important business problem(Peppers and Rogers, 1993; Personalization Summit, 1999). For example, the personalized version of Yahoo(my Yahoo) provides This paper substantially augments and improves the preliminary version that appeared as a poster paper in the Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99)(Adomavicius and Tuzhilin, 1999)
Data Mining and Knowledge Discovery, 5, 33–58, 2001 °c 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. Expert-Driven Validation of Rule-Based User Models in Personalization Applications∗ GEDIMINAS ADOMAVICIUS adomavic@cs.nyu.edu Computer Science Department, Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA ALEXANDER TUZHILIN atuzhili@stern.nyu.edu Information Systems Department, Stern School of Business, New York University, 44 West 4th Street, Room 9-78, New York, NY 10012, USA Editors: Ron Kohavi and Foster Provost Abstract. In many e-commerce applications, ranging from dynamic Web content presentation, to personalized ad targeting, to individual recommendations to the customers, it is important to build personalized profiles of individual users from their transactional histories. These profiles constitute models of individual user behavior and can be specified with sets of rules learned from user transactional histories using various data mining techniques. Since many discovered rules can be spurious, irrelevant, or trivial, one of the main problems is how to perform post-analysis of the discovered rules, i.e., how to validate user profiles by separating “good” rules from the “bad.” This validation process should be done with an explicit participation of the human expert. However, complications may arise because there can be very large numbers of rules discovered in the applications that deal with many users, and the expert cannot perform the validation on a rule-by-rule basis in a reasonable period of time. This paper presents a framework for building behavioral profiles of individual users. It also introduces a new approach to expert-driven validation of a very large number of rules pertaining to these users. In particular, it presents several types of validation operators, including rule grouping, filtering, browsing, and redundant rule elimination operators, that allow a human expert validate many individual rules at a time. By iteratively applying such operators, the human expert can validate a significant part of all the initially discovered rules in an acceptable time period. These validation operators were implemented as a part of a one-to-one profiling system. The paper also presents a case study of using this system for validating individual user rules discovered in a marketing application. Keywords: personalization, profiling, rule discovery, post-analysis, validation 1. Introduction In various e-commerce applications, ranging from dynamic Web content presentation, to personalized ad targeting, to individual recommendations to the customers, personalization has become an important business problem (Peppers and Rogers, 1993; Personalization Summit, 1999). For example, the personalized version of Yahoo (myYahoo) provides to ∗This paper substantially augments and improves the preliminary version that appeared as a poster paper in the Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-99) (Adomavicius and Tuzhilin, 1999)
ADOMAVICIUS AND TUZHILIN customers personalized content, such as local weather or interesting events in the here the customer lives. As another example, Amazon. com and Moviecritic com provide recommendations on what books to read and movies to see respectively. In general, there is a very strong interest in the industry in personalized (one-to-one)marketing applications (Peppers and Rogers, 1993; Allen et al., 1998)and in recommender systems(CACM, 1997, Kautz, 1998; Baudisch, 1999, Soboroff et al., 1999)that provide personal recommendations to individual users for products and services that might be of interest to them. The advantages of these personalized approaches over more traditional segmentation methods are well documented in the literature(Peppers and Rogers, 1993; Personalization Summit, 1999; Allen et al., 1998) One of the key issues in developing such e-commerce applications is the problem of constructing accurate and comprehensive profiles of individual customers that provide the most important information describing who the customers are and how they behave. This problem is so important for building successful e-commerce applications that some authors propose that companies treat customer profiles as key economic assets in addition to more traditional assets such as plant, equipment and human assets(Hagel, 1999; Hagel and Singer, 1999). Although some work on how to construct personal user profiles has been published in the academic literature(and we will review it below), most of the work has been done in the industry so far There are two main approaches to addressing the profiling problem developed by dif- ferent companies. In the first approach, taken by such companies as Engage Technologies Iwww.engage.com]andPersonify[www.personify.com],profilesareconstructedfromthe customers' demographic and transactional data and contain important factual information about the customers. Examples of such factual information include(a) demographic at- tributes, such as age, address, income and a shoe size of a customer, and (b) certain facts extracted from his or her transactional data, such as that the average and maximal purchase amounts of that customer over the last year were $23 and $127 respectively, or that the favorite newspaper of a particular Travelocity customer is the New York Times and her fa vorite vacation destination is Almond Beach Club in Barbados. This factual data comprises the profile of a customer and is typically stored in a relational table According to the other approach, taken by such companies as Art Technology Group Iwww.atg.com]andBroadVision[www.broadvision.com],customerprofilescontainnot only factual information but also rules that describe on-line behavioral activities of the customers. However, these rules are defined by experts(e.g, a marketing manager work- ing on a particular marketing application). For example, a manager may specify that if a customer of a certain type visits the Web site of the on-line groceries shopping com- ShopTillUStop com on Sunday should be shown the discour coupons for diapers. This approach differs from the previous approach in that the profiles contain behavioral rules in addition to the factual information about the customer. however these behavioral rules are not constructed in a truly one-to-one manner since these rules are specified by the expert rather than learned from the data and are applicable only to groups In addition to the developments in the industry, the profiling problem was also studied in the data mining academic community by Fawcett and Provost(1996, 1997), Aggarwal
34 ADOMAVICIUS AND TUZHILIN its customers personalized content, such as local weather or interesting events in the area where the customer lives. As another example, Amazon.com and Moviecritic.com provide recommendations on what books to read and movies to see respectively. In general, there is a very strong interest in the industry in personalized (one-to-one) marketing applications (Peppers and Rogers, 1993; Allen et al., 1998) and in recommender systems (CACM, 1997; Kautz, 1998; Baudisch, 1999; Soboroff et al., 1999) that provide personal recommendations to individual users for products and services that might be of interest to them. The advantages of these personalized approaches over more traditional segmentation methods are well documented in the literature (Peppers and Rogers, 1993; Personalization Summit, 1999; Allen et al., 1998). One of the key issues in developing such e-commerce applications is the problem of constructing accurate and comprehensive profiles of individual customers that provide the most important information describing who the customers are and how they behave. This problem is so important for building successful e-commerce applications that some authors propose that companies treat customer profiles as key economic assets in addition to more traditional assets such as plant, equipment and human assets (Hagel, 1999; Hagel and Singer, 1999). Although some work on how to construct personal user profiles has been published in the academic literature (and we will review it below), most of the work has been done in the industry so far. There are two main approaches to addressing the profiling problem developed by different companies. In the first approach, taken by such companies as Engage Technologies [www.engage.com] and Personify [www.personify.com], profiles are constructed from the customers’ demographic and transactional data and contain important factual information about the customers. Examples of such factual information include (a) demographic attributes, such as age, address, income and a shoe size of a customer, and (b) certain facts extracted from his or her transactional data, such as that the average and maximal purchase amounts of that customer over the last year were $23 and $127 respectively, or that the favorite newspaper of a particular Travelocity customer is the New York Times and her favorite vacation destination is Almond Beach Club in Barbados. This factual data comprises the profile of a customer and is typically stored in a relational table. According to the other approach, taken by such companies as Art Technology Group [www.atg.com] and BroadVision [www.broadvision.com], customer profiles contain not only factual information but also rules that describe on-line behavioral activities of the customers. However, these rules are defined by experts (e.g., a marketing manager working on a particular marketing application). For example, a manager may specify that if a customer of a certain type visits the Web site of the on-line groceries shopping company ShopTillUStop.com on Sunday evenings, that customer should be shown the discount coupons for diapers. This approach differs from the previous approach in that the profiles contain behavioral rules in addition to the factual information about the customer. However, these behavioral rules are not constructed in a truly one-to-one manner since these rules are specified by the expert rather than learned from the data and are applicable only to groups of customers. In addition to the developments in the industry, the profiling problem was also studied in the data mining academic community by Fawcett and Provost (1996, 1997), Aggarwal
RULE-BASED USER MODELS et al.(1998), Adomavicius and Tuzhilin(1999), and Chan (1999). In particular, Fawcett and Provost(1996, 1997)studied this problem within the context of fraud detection in the cellular phone industry. This was done by learning rules pertaining to individual customers from the cellular phone usage data using the rule learning system RL(Clearwater and Provost, 1990). However, these discovered rules were used not for the purpose of understanding the personal behavior of individual customers, but rather to instantiate generalized profilers that are applicable to several customer accounts for the purpose of learning fraud conditions Aggarwal et al. (1998)study the problem of on-line mining of customer profiles specified with association rules, where the body of a rule refers to the demographic information of a user, such as age and salary, and the head of a rule refers to transactional information, such as purchasing characteristics. Moreover, Aggarwal et al. present a multidimensional indexing structure for mining such rules. The proposed method provides a new approach to deriving association rules that segment users based on their transactional characteristics However, it does not derive behavior of an individual user in a one-to-one fashion(Peppers and rogers, 1993) Still another approach to the profiling problem was presented by Chan(1999)in the context of providing personalized Web search. In this approach the user profile consists of a Web Access Graph summarizing Web access patterns by the user, and a Page Interest Estimator characterizing interests of the user in various Web pages. Although the approach presented by Chan goes beyond building simple factual profiles, these profiles are spe- cialized to be used in specific Web-related applications, i.e., to provide personalized Web search. This means that they do not attempt to capture all aspects of the on-line behavior of individual users. One specific consequence of this specialization is that Chan does not use behavioral rules as a part of a user profile In(Adomavicius and Tuzhilin, 1999), we presented an initial approach to the profiling problem that we expand and improve in this paper. In particular, in this paper we present a framework for building behavioral profiles of individual users. These behavioral profiles contain not only factual information about the users, but also capture more comprehen- sive behavioral information using conjunctive rules that are learned from user transactional histories using various data mining methods. However, there are caveats to this approach due to the nature of personalization applications. In particular, as will be explained in he paper, the behavioral rules learned about individual users can be unreliable, irrelevant bvious. Therefore, post-analysis, including rule validation, becomes an important sue for building accurate personalized profiles of users. The second contribution of this paper lies in developing a new approach to validating the discovered rules during the post- analysis stage of the data mining process. This validation process is performed by the domain expert who can iteratively apply various rule validation operators. In particular, we describe different validation operators and demonstrate how these operators are integrated into a unifying framework. Development of specific validation operators, in particular, rule grouping method based on attribute hierarchies, constitutes the third contribution of this paper. Finally, the paper describes a case study of testing the proposed validation method on a marketing application The"quality" of rules stored in user profiles can be defined in several ways. In particular, rules can be" good" because they are(1)statistically valid, (2)acceptable to a human expert
RULE-BASED USER MODELS 35 et al. (1998), Adomavicius and Tuzhilin (1999), and Chan (1999). In particular, Fawcett and Provost (1996, 1997) studied this problem within the context of fraud detection in the cellular phone industry. This was done by learning rules pertaining to individual customers from the cellular phone usage data using the rule learning system RL (Clearwater and Provost, 1990). However, these discovered rules were used not for the purpose of understanding the personal behavior of individual customers, but rather to instantiate generalized profilers that are applicable to several customer accounts for the purpose of learning fraud conditions. Aggarwal et al. (1998) study the problem of on-line mining of customer profiles specified with association rules, where the body of a rule refers to the demographic information of a user, such as age and salary, and the head of a rule refers to transactional information, such as purchasing characteristics. Moreover, Aggarwal et al. present a multidimensional indexing structure for mining such rules. The proposed method provides a new approach to deriving association rules that segment users based on their transactional characteristics. However, it does not derive behavior of an individual user in a one-to-one fashion (Peppers and Rogers, 1993). Still another approach to the profiling problem was presented by Chan (1999) in the context of providing personalized Web search. In this approach the user profile consists of a Web Access Graph summarizing Web access patterns by the user, and a Page Interest Estimator characterizing interests of the user in various Web pages. Although the approach presented by Chan goes beyond building simple factual profiles, these profiles are specialized to be used in specific Web-related applications, i.e., to provide personalized Web search. This means that they do not attempt to capture all aspects of the on-line behavior of individual users. One specific consequence of this specialization is that Chan does not use behavioral rules as a part of a user profile. In (Adomavicius and Tuzhilin, 1999), we presented an initial approach to the profiling problem that we expand and improve in this paper. In particular, in this paper we present a framework for building behavioral profiles of individual users. These behavioral profiles contain not only factual information about the users, but also capture more comprehensive behavioral information using conjunctive rules that are learned from user transactional histories using various data mining methods. However, there are caveats to this approach due to the nature of personalization applications. In particular, as will be explained in the paper, the behavioral rules learned about individual users can be unreliable, irrelevant, or obvious. Therefore, post-analysis, including rule validation, becomes an important issue for building accurate personalized profiles of users. The second contribution of this paper lies in developing a new approach to validating the discovered rules during the postanalysis stage of the data mining process. This validation process is performed by the domain expert who can iteratively apply various rule validation operators. In particular, we describe different validation operators and demonstrate how these operators are integrated into a unifying framework. Development of specific validation operators, in particular, rule grouping method based on attribute hierarchies, constitutes the third contribution of this paper. Finally, the paper describes a case study of testing the proposed validation method on a marketing application. The “quality” of rules stored in user profiles can be defined in several ways. In particular, rules can be “good” because they are (1) statistically valid, (2) acceptable to a human expert
ADOMAVICIUS AND TUZHILIN in a given application, (3)"effective"in the sense that they result in certain benefits obtained an application. In this paper, we focus on the first two aspects, i.e., statistical validity and acceptability to an expert. The third aspect of rule quality is a more complex issue, an we do not address it in this paper, leaving it as a topic for future research The rule validation problem in the post-analysis stage of the data mining process has been addressed before in the data mining community. In particular, there has been work done on ifying filtering constraints that select only certain types of rules from the set of all the discovered rules; examples of this research include(Klemettinen et al., 1994; Liu and Hsu, 1996: Liu et al., 1999). In these approaches the user specifies constraints but does not do it teratively. In contrast to this, it has been observed by several researchers, e.g. Brachman and Anand (1996), Fayyad et al. (1996), Silberschatz and Tuzhilin(1996a), Provost and Jensen(1998), Lee et al. (1998), Adomavicius and Tuzhilin(1999), Sahar(1999), that knowledge discovery should be an iterative process that involves an explicit participation of the domain expert, and we apply this point of view to the rule validation process The rest of the paper is organized as follows. In Section 2, we present our approach to profiles and profile construction. The profile validation process is described in Section 3 and specific validation operators are presented in Section 4. In Section 5 we describe how to do incremental validation. In Section 6 we describe the case study of using our profiling system in a market research application. Finally, we discuss additional issues related to the profile construction problem in Section 7 2. A proposed approach to profiling 1. Defining user profi In order to explain what user profiles are and how they can be constructed, we first focus on the data that is used for constructing these profiles Data model. Various e-commerce personalization applications can contain different types of data about individual users. However, this data can be classified in many applications into two basic types-factual and transactional, where the factual data describes who the user is and the transactional data describes what the user does For example, in a marketing application based on purchasing histories of users, the factual data would be the demographic data of users, such as name, gender, birth date, and salary. The transactional data would consist of records of purchases that the user made over a period of time. a purchase record would include such attributes as the date of purchase, product purchased, product characteristics, amount of money spent, use or no use of a coupon, value of a coupon if used, discount applied, etc Profile model. a profile is a collection of information that describes a user. One of the open issues in the profile construction process is what information should be included in a user profile. In their simplest form, user profiles contain factual information that can be described as a set of individual facts that, for example, can be stored in a record of a relational database table. These facts may include demographic information about the user,such as name, address, date of birth, and gender, that are usually taken from the user
36 ADOMAVICIUS AND TUZHILIN in a given application, (3) “effective” in the sense that they result in certain benefits obtained in an application. In this paper, we focus on the first two aspects, i.e., statistical validity and acceptability to an expert. The third aspect of rule quality is a more complex issue, and we do not address it in this paper, leaving it as a topic for future research. The rule validation problem in the post-analysis stage of the data mining process has been addressed before in the data mining community. In particular, there has been work done on specifying filtering constraints that select only certain types of rules from the set of all the discovered rules; examples of this research include (Klemettinen et al., 1994; Liu and Hsu, 1996; Liu et al., 1999). In these approaches the user specifies constraints but does not do it iteratively. In contrast to this, it has been observed by several researchers, e.g. Brachman and Anand (1996), Fayyad et al. (1996), Silberschatz and Tuzhilin (1996a), Provost and Jensen (1998), Lee et al. (1998), Adomavicius and Tuzhilin (1999), Sahar (1999), that knowledge discovery should be an iterative process that involves an explicit participation of the domain expert, and we apply this point of view to the rule validation process. The rest of the paper is organized as follows. In Section 2, we present our approach to profiles and profile construction. The profile validation process is described in Section 3, and specific validation operators are presented in Section 4. In Section 5 we describe how to do incremental validation. In Section 6 we describe the case study of using our profiling system in a market research application. Finally, we discuss additional issues related to the profile construction problem in Section 7. 2. A proposed approach to profiling 2.1. Defining user profiles In order to explain what user profiles are and how they can be constructed, we first focus on the data that is used for constructing these profiles. Data model. Various e-commerce personalization applications can contain different types of data about individual users. However, this data can be classified in many applications into two basic types—factual and transactional, where the factual data describes who the user is and the transactional data describes what the user does. For example, in a marketing application based on purchasing histories of users, the factual data would be the demographic data of users, such as name, gender, birth date, and salary. The transactional data would consist of records of purchases that the user made over a period of time. A purchase record would include such attributes as the date of purchase, product purchased, product characteristics, amount of money spent, use or no use of a coupon, value of a coupon if used, discount applied, etc. Profile model. A profile is a collection of information that describes a user. One of the open issues in the profile construction process is what information should be included in a user profile. In their simplest form, user profiles contain factual information that can be described as a set of individual facts that, for example, can be stored in a record of a relational database table. These facts may include demographic information about the user, such as name, address, date of birth, and gender, that are usually taken from the user
RULE- BASED USER MODELS description data. The facts can also be derived from the transactional and item description data. Examples of such facts are"the favorite beer of user ALW392 is Heineken", the biggest purchase made by ALW392 was for $237, the favorite movie star of ALw392 is Harrison Ford. The construction of factual profiles is a relatively simple and well- understood problem, and keyword-based factual profiles have been extensively used recommender systems A user profile can also contain a behavioral component that describes behavior of the user learned from his or her transactional history. One way to define user behavior is with a set of conjunctive rules, such as association(Agrawal et al., 1996)or classification rules(Breiman et al., 1984). Examples of rules describing user behavior are:"when user ALW392 comes to the Web site y from site Z, she usually returns back to site Z immediately,"when shopping on the NetGrocercom Web site on weekends, user ALW392 usually spends more than $100 on groceries,"whenever user ALW392 goes on a business trip to Los Angeles, she stays there in expensive hotels. The use of rules profiles provides an intuitive, declarative and modular way to describe user beha and was advocated in( Fawcett and Provost, 1997, Adomavicius and Tuzhilin, 1999) These rules can either be defined by domain experts, as is done in systems developed by Broad Vision and Art Technology Group, or derived from the transactional data of a user using various data mining methods. We describe this derivation process in the next 2. Profile construction Since we focus on personalization applications, rule discovery methods should be applied individually to the transactional data of every user, thus, capturing truly personal behavior of each user Such rules can be discovered using various data mining algorithms. For example, to discover association rules, we can use Apriori(Agrawal et al., 1996)and its numerous variations. Similarly, to discover classification rules, we can use CART (Breiman et al to point out that our approach is not restricted to any specific representation of data mining rules and their discovery methods One of the serious problems with many rule discovery methods is that they tend to gen- erate large numbers of patterns, and often many of them, while being statistically accept- able, are trivial, spurious, or just not relevant to the application at hand ( Piatetsky-Shapiro and Matheus, 1994, Silberschatz and Tuzhilin, 1996b; Liu and Hsu, 1996; Brin et al 997, Stedman, 1997, Padmanabhan and Tuzhilin, 1998, 1999). Therefore, post-analysis of discovered rules becomes an important issue, since there is a need to validate the discov ered rules. For example, assume that a data mining method discovered the rule stating that, whenever customer ALW392 goes on a business trip to Los Angeles, she mostly stays in expensive hotels there. In particular, assume that ALW392 went to Los Angeles 7 times over the past 2 years and 5 out of 7 times stayed in expensive hotels. Before this rule can be placed into ALW392's profile, it needs to be validated, since it may not be immediately clear whether this rule really captures the behavior of ALW392, or whether it constitutes
RULE-BASED USER MODELS 37 description data. The facts can also be derived from the transactional and item description data. Examples of such facts are “the favorite beer of user ALW392 is Heineken”, the biggest purchase made by ALW392 was for $237”, “the favorite movie star of ALW392 is Harrison Ford.” The construction of factual profiles is a relatively simple and wellunderstood problem, and keyword-based factual profiles have been extensively used in recommender systems. A user profile can also contain a behavioral component that describes behavior of the user learned from his or her transactional history. One way to define user behavior is with a set of conjunctive rules, such as association (Agrawal et al., 1996) or classification rules (Breiman et al., 1984). Examples of rules describing user behavior are: “when user ALW392 comes to the Web site Y from site Z, she usually returns back to site Z immediately”, “when shopping on the NetGrocer.com Web site on weekends, user ALW392 usually spends more than $100 on groceries”, “whenever user ALW392 goes on a business trip to Los Angeles, she stays there in expensive hotels.” The use of rules in profiles provides an intuitive, declarative and modular way to describe user behavior and was advocated in (Fawcett and Provost, 1997; Adomavicius and Tuzhilin, 1999). These rules can either be defined by domain experts, as is done in systems developed by BroadVision and Art Technology Group, or derived from the transactional data of a user using various data mining methods. We describe this derivation process in the next section. 2.2. Profile construction Since we focus on personalization applications, rule discovery methods should be applied individually to the transactional data of every user, thus, capturing truly personal behavior of each user. Such rules can be discovered using various data mining algorithms. For example, to discover association rules, we can use Apriori (Agrawal et al., 1996) and its numerous variations. Similarly, to discover classification rules, we can use CART (Breiman et al., 1984), C4.5 (Quinlan, 1993), or other classification rule discovery methods. We would like to point out that our approach is not restricted to any specific representation of data mining rules and their discovery methods. One of the serious problems with many rule discovery methods is that they tend to generate large numbers of patterns, and often many of them, while being statistically acceptable, are trivial, spurious, or just not relevant to the application at hand (Piatetsky-Shapiro and Matheus, 1994; Silberschatz and Tuzhilin, 1996b; Liu and Hsu, 1996; Brin et al., 1997; Stedman, 1997; Padmanabhan and Tuzhilin, 1998, 1999). Therefore, post-analysis of discovered rules becomes an important issue, since there is a need to validate the discovered rules. For example, assume that a data mining method discovered the rule stating that, whenever customer ALW392 goes on a business trip to Los Angeles, she mostly stays in expensive hotels there. In particular, assume that ALW392 went to Los Angeles 7 times over the past 2 years and 5 out of 7 times stayed in expensive hotels. Before this rule can be placed into ALW392’s profile, it needs to be validated, since it may not be immediately clear whether this rule really captures the behavior of ALW392, or whether it constitutes a
ADOMAVICIUS AND TUZHILIN spurious correlation or is simply not relevant to the application at hand. In the next section we present methods for validating behavioral rules in user profiles 3. Validation of user profiles A common way to perform the post-analysis of data mining results is to let the domain expert perform this task, and several data mining systems support this capability. For Mine Set(Brunk et al., 1997) provides a wide range of visual ization techniques the end-user to examine visually the results discovered by its data mining tools and thus evaluate the quality of these results In our approach, individual rules discovered during the data mining stage are validated by the expert, and, depending on how well they represent the actual behaviors of the users, some rules are"accepted"and some"rejected"by the expert. Then the accepted rules form the behavioral profiles of users One of the main issues about validating individual rules of users by a human expert is salability. In many e-commerce personalization applications the number of users tends to be very large. For example, the number of registered users at major Web sites is measured in millions. If we discover a hundred rules per customer on average, then the total number of rules for such sites would be measured in hundreds of millions Therefore. it would be impossible for a human expert to validate all the discovered rules on a one-by-one basis in uch applications We address this problem by providing a framework allowing the human expert validate large numbers of rules(instead of individual rules )at a time with relatively little input from the expert. This is done by applying different rule validation operators that are described in Section 4. Then rule validation becomes an iterative process and is described in figure I In particular, the profile building activity is divided into two phases. In Phase I, the data mining phase, rules describing behaviors of individual users are generated from the users transactional data as was described in Section 2.2 Phase II constitutes the rule validation process. Rule validation, unlike rule discovery (Phase D), is not performed separately for each user, but rather for all users at once. The Validation operator t Phase l1 Figure 1. The profile building process
38 ADOMAVICIUS AND TUZHILIN spurious correlation or is simply not relevant to the application at hand. In the next section we present methods for validating behavioral rules in user profiles. 3. Validation of user profiles A common way to perform the post-analysis of data mining results is to let the domain expert perform this task, and several data mining systems support this capability. For example, MineSet (Brunk et al., 1997) provides a wide range of visualization techniques allowing the end-user to examine visually the results discovered by its data mining tools and thus evaluate the quality of these results. In our approach, individual rules discovered during the data mining stage are validated by the expert, and, depending on how well they represent the actual behaviors of the users, some rules are “accepted” and some “rejected” by the expert. Then the accepted rules form the behavioral profiles of users. One of the main issues about validating individual rules of users by a human expert is scalability. In many e-commerce personalization applications the number of users tends to be very large. For example, the number of registered users at major Web sites is measured in millions. If we discover a hundred rules per customer on average, then the total number of rules for such sites would be measured in hundreds of millions. Therefore, it would be impossible for a human expert to validate all the discovered rules on a one-by-one basis in such applications. We address this problem by providing a framework allowing the human expert validate large numbers of rules (instead of individual rules) at a time with relatively little input from the expert. This is done by applying different rule validation operators that are described in Section 4. Then rule validation becomes an iterative process and is described in figure 1. In particular, the profile building activity is divided into two phases. In Phase I, the data mining phase, rules describing behaviors of individual users are generated from the users’ transactional data as was described in Section 2.2. Phase II constitutes the rule validation process. Rule validation, unlike rule discovery (Phase I), is not performed separately for each user, but rather for all users at once. The Figure 1. The profile building process
RULE-BASED USER MODELS reason we propose performing rule validation collectively (rather than individually)for all users is that there are usually many similar or even identical rules across different users For example, the rule" when on the Netgrocer com Web sil 1.sel ALl392 usually spends more than $100 on groceries "can be common to many users. I addition, although rules"when user ALW392 comes to our Web site from site Y, she usually returns back to site y immediately, and"when user KTL158 comes to our Web site from site Z, she usually returns back to site Z immediately, are not identical, they are quite"similar and can be examined by the expert together. The collective rule validation allows one to deal with such common rules once, thus significantly reducing validation effort. Therefore in the beginning of Phase Il, rules from all the users are collected into one set. Each rule is tagged with the Id of the user to which it belongs, so that each accepted rule could be put into the profile of that user at the end of the validation phase After rules from all users are collected into one set, the rule validation process is performed as a second part of Phase Il. This process is described in figure 2. All rules discovered during Phase I(denoted by Rall in figure 2)are considered unvalidated. The human expert selects various validation operators and applies them successively to the set of unvalidated rules The application of each validation operator results in validation of some of the rules. In particular, some rules get accepted and some rejected(sets Oacc and Ore in figure 2).Then he next validation operator would be applied to the set of the remaining unvalidated rules (set Rum ). This validation process stops when the Terminate ValidationProcess condition is met. This condition is set by the human expert and is discussed later in this sectio After the validation process is stopped, the set of all the discovered rules (rall)is split into three disjoint sets: accepted rules(racc), rejected rules(rrej), and possibly some remaining unvalidated rules(Rum ). At the end of Phase Il all the accepted rules are put into the behavioral profiles of their respective users. This is possible, because all the rules have been tagged with the user id in the beginning of phase ll as described above As was already stated above and shown in figure 2, various validation operators are successively applied to the set of the unvalidated rules until the stopping criterion Terminate dationProcess is reached. The stopping criterion can be specified by the expert and may include such conditions as(a)only few rules remain unvalidated, (b)only few rules are being validated at a time by one or several validation operators, and(c)the total elapsed validation time exceeds the predetermined validation time mput: Set of all discovered rules Ral Output: Mutually disjoint sets of rules Race, Rref, Rumm such that Rall= Roce U Rre U rum 1)Rwr: Rall, Race: =o, Rref: =w (2) while(not Terminate Validation Processo)begin Expert selects a validation operator (say, O) from the set of perators (4) O is applied to Rwr. Result: disjoint sets Oace and Or Figure 2. An algorithm for the rule validation process
RULE-BASED USER MODELS 39 reason we propose performing rule validation collectively (rather than individually) for all users is that there are usually many similar or even identical rules across different users. For example, the rule “when shopping on the NetGrocer.com Web site on weekends, user ALW392 usually spends more than $100 on groceries” can be common to many users. In addition, although rules “when user ALW392 comes to our Web site from site Y, she usually returns back to site Y immediately,” and “when user KTL158 comes to our Web site from site Z, she usually returns back to site Z immediately,” are not identical, they are quite “similar” and can be examined by the expert together. The collective rule validation allows one to deal with such common rules once, thus significantly reducing validation effort. Therefore, in the beginning of Phase II, rules from all the users are collected into one set. Each rule is tagged with the ID of the user to which it belongs, so that each accepted rule could be put into the profile of that user at the end of the validation phase. After rules from all users are collected into one set, the rule validation process is performed as a second part of Phase II. This process is described in figure 2. All rules discovered during Phase I (denoted by Rall in figure 2) are considered unvalidated. The human expert selects various validation operators and applies them successively to the set of unvalidated rules. The application of each validation operator results in validation of some of the rules. In particular, some rules get accepted and some rejected (sets Oacc and Orej in figure 2). Then the next validation operator would be applied to the set of the remaining unvalidated rules (set Runv). This validation process stops when the Terminate ValidationProcess condition is met. This condition is set by the human expert and is discussed later in this section. After the validation process is stopped, the set of all the discovered rules (Rall) is split into three disjoint sets: accepted rules (Racc), rejected rules (Rrej), and possibly some remaining unvalidated rules (Runv). At the end of Phase II all the accepted rules are put into the behavioral profiles of their respective users. This is possible, because all the rules have been tagged with the user ID in the beginning of Phase II as described above. As was already stated above and shown in figure 2, various validation operators are successively applied to the set of the unvalidated rules until the stopping criterion Terminate ValidationProcessis reached. The stopping criterion can be specified by the expert and may include such conditions as (a) only few rules remain unvalidated, (b) only few rules are being validated at a time by one or several validation operators, and (c) the total elapsed validation time exceeds the predetermined validation time. Input: Set of all discovered rules Rall. Output: Mutually disjoint sets of rules Racc, Rrej, Runv, such that Rall = Racc ∪ Rrej ∪ Runv. (1) Runv := Rall, Racc := ∅, Rrej := ∅. (2) while (not TerminateValidationProcess()) begin (3) Expert selects a validation operator (say, O) from the set of available validation operators. (4) O is applied to Runv. Result: disjoint sets Oacc and Orej. (5) Runv := Runv − Oacc − Orej, Racc := Racc ∪ Oacc, Rrej := Rrej ∪ Orej. (6) end Figure 2. An algorithm for the rule validation process
ADOMAVICIUS AND TUZHILIN In this section we described the overall validation process. We present the detailed des- cription of various specific validation operators in the next section 4. Validation operators As stated in Section 3, validation operators provide a way for the domain expert to examine multiple rules at a time. This examination process can be performed in the following two ways. First, the expert may already know some types of rules that he or she wants to amine and accept or reject based on the prior experience. Therefore, it is important to provide capabilities allowing him or her to specify such types of rules in advance. In this section,we present template- and interestingness-based filtering operators that serve this purpose. Second, the expert may not know all the relevant types of rules in advance, and it is important to provide methods that group discovered rules into classes that he or she an subsequently examine and validate. In this section we also present the similarity-based ule grouping operator that serves this purpose. In addition, we describe other operators that can be used in the validation process, including visualization, statistical analysis, and browsing op Although our validation methods are general and can be applied to several forms of conjunctive rules, we focus mainly on association rules with discrete values in this paper 4. 1. Similarity-based rule grouping As pointed out in Section 3, there can be many"similar"rules among all the discovered rules, and it would be useful for the domain expert to evaluate all these similar rules together rather than individually. In order to do this, some similarity measure that would allow grouping similar rules together needs to be specified In this paper, we propose a method to specify such a similarity measure using attribute hierarchies. An attribute hierarchy is organized as a tree by the human expert in the beginning of the validation process. The leaves of the tree consist of all the attributes of the data set to which rule discovery methods were applied, i.e, all the attributes that can potentially be present in the discovered rules. The non-leaf nodes in the tree are specified by the human expert and are obtained by combining several lower-level nodes into one parent node. For instance, figure 3 presents an example of such a hierarchy, where nodes Al and A2 are combined into node a6 and nodes a3. a4 and a5 into node a7. and then nodes a6 Al A2 A3 A4 As Al A2 A3 A4 As Al 42/A3 A Al A2 A3 A4 A5 Figure 3. An example of an attribute hierarchy for similarity-based grouping
40 ADOMAVICIUS AND TUZHILIN In this section we described the overall validation process. We present the detailed description of various specific validation operators in the next section. 4. Validation operators As stated in Section 3, validation operators provide a way for the domain expert to examine multiple rules at a time. This examination process can be performed in the following two ways. First, the expert may already know some types of rules that he or she wants to examine and accept or reject based on the prior experience. Therefore, it is important to provide capabilities allowing him or her to specify such types of rules in advance. In this section, we present template- and interestingness-based filtering operators that serve this purpose. Second, the expert may not know all the relevant types of rules in advance, and it is important to provide methods that group discovered rules into classes that he or she can subsequently examine and validate. In this section we also present the similarity-based rule grouping operator that serves this purpose. In addition, we describe other operators that can be used in the validation process, including visualization, statistical analysis, and browsing operators. Although our validation methods are general and can be applied to several forms of conjunctive rules, we focus mainly on association rules with discrete values in this paper. 4.1. Similarity-based rule grouping As pointed out in Section 3, there can be many “similar” rules among all the discovered rules, and it would be useful for the domain expert to evaluate all these similar rules together rather than individually. In order to do this, some similarity measure that would allow grouping similar rules together needs to be specified. In this paper, we propose a method to specify such a similarity measure using attribute hierarchies. An attribute hierarchy is organized as a tree by the human expert in the beginning of the validation process.1 The leaves of the tree consist of all the attributes of the data set to which rule discovery methods were applied, i.e., all the attributes that can potentially be present in the discovered rules. The non-leaf nodes in the tree are specified by the human expert and are obtained by combining several lower-level nodes into one parent node. For instance, figure 3 presents an example of such a hierarchy, where nodes A1 and A2 are combined into node A6 and nodes A3, A4 and A5 into node A7, and then nodes A6 Figure 3. An example of an attribute hierarchy for similarity-based grouping
RULE-BASED USER MODELS and A7 are combined into node A8. Another example of an attribute hierarchy is presented in figure 9. We call non-leaf nodes of an attribute hierarchy aggregated attributes The attribute hierarchy is used for determining similar rules and grouping them together More specifically, the semantics of the similarity-based grouping operator is defined as follows 1. Specifying rule aggregation level. Rules are grouped by specifying the level of rule aggregation in the attribute hierarchy which is provided by the human expert. Such a specification is called a cut, and it forms a subset of all the nodes of the tree(leaf and non-leaf), such that for every path from a leaf node to the root, exactly one node on such path belongs to this subset. Therefore, given a cut, every leaf node has its corresponding cut node. Given a cut C, we define for any leaf node Xi its corresponding cut node cutc(i) as follows cutc(Xi= cutc(parent(Xi)), otherwise Figure 3 presents several different cuts of an attribute hierarchy that are represented by shaded regions. For example, for the cut from figure 3(c), cut3c(A2)= A2 and cut3c(A3)= A7. Moreover, the cut node of any leaf node can be calculated in constant time by implementing a straightforward lookup table for that cut 2. Aggregating rules. Given a cut C, a rule x1A…∧k→Xk+1A…^ X, is aggregated by performing the following syntactic transformation cltC(1A…∧X→X+1A…A cutc(1)A…Actc(Xk)→culc(Xk+1)A… A cuIc(Y where cutc(Xi) maps each leaf node of the attribute hierarchy into its corresponding cut node as described in Step l above. The resulting rule is called an aggregated rule Since several different leaf nodes can have the same cut node, sometimes after aggre gating a rule we can get multiple instances of the same aggregated attribute in the body or in the head of the rule. In this case we simply eliminate those extra instances of an attribute. Consider, for example, the rule A2AA3A A4= A5. By applying cut(c)from fgue3 to this rule, we will get the aggregated rule a2∧47∧7→A7, and by re- moving duplicate terms A7 in the body of the rule we finally get A2 A A7= A7.2 Given a cut, the computational complexity of a single rule aggregation is linearly proportional to the size of the rule (i.e, total number of attributes in the rule), as will be 3. Grouping rules. Given a cut C, we can group a set of rules S into groups by applying C to every rule in S as described in Step 2 above. When a cut is applied to a set of rules different rules can be mapped into the same aggregated rule. For example, consider rules a2∧A3∧A4→A5andA2∧A5→A3. After applying cut(c) from figure3to both of them, they are mapped into the same rule A2 A A7= A7. More generally, we can group a set of rules based on the cut C as follows. Two rules R, and R2 belong to
RULE-BASED USER MODELS 41 and A7 are combined into node A8. Another example of an attribute hierarchy is presented in figure 9. We call non-leaf nodes of an attribute hierarchy aggregated attributes. The attribute hierarchy is used for determining similar rules and grouping them together. More specifically, the semantics of the similarity-based grouping operator is defined as follows. 1. Specifying rule aggregation level. Rules are grouped by specifying the level of rule aggregation in the attribute hierarchy which is provided by the human expert. Such a specification is called a cut, and it forms a subset of all the nodes of the tree (leaf and non-leaf), such that for every path from a leaf node to the root, exactly one node on such path belongs to this subset. Therefore, given a cut, every leaf node has its corresponding cut node. Given a cut C, we define for any leaf node Xi its corresponding cut node cutC(Xi) as follows: cutC(Xi) = ( Xi, if Xi ∈ C cutC(parent(Xi)), otherwise Figure 3 presents several different cuts of an attribute hierarchy that are represented by shaded regions. For example, for the cut from figure 3(c), cut3c(A2) = A2 and cut3c(A3) = A7. Moreover, the cut node of any leaf node can be calculated in constant time by implementing a straightforward lookup table for that cut. 2. Aggregating rules. Given a cut C, a rule X1 ∧···∧ Xk ⇒ Xk+1 ∧···∧ Xl is aggregated by performing the following syntactic transformation: cutC(X1 ∧···∧ Xk ⇒ Xk + 1 ∧···∧ Xl) = cutC(X1) ∧···∧ cutC(Xk) ⇒ cutC(Xk + 1) ∧···∧ cutC(Xl) where cutC(Xi) maps each leaf node of the attribute hierarchy into its corresponding cut node as described in Step 1 above. The resulting rule is called an aggregated rule. Since several different leaf nodes can have the same cut node, sometimes after aggregating a rule we can get multiple instances of the same aggregated attribute in the body or in the head of the rule. In this case we simply eliminate those extra instances of an attribute. Consider, for example, the rule A2∧ A3∧ A4 ⇒ A5. By applying cut (c) from figure 3 to this rule, we will get the aggregated rule A2 ∧ A7 ∧ A7 ⇒ A7, and by removing duplicate terms A7 in the body of the rule we finally get A2 ∧ A7 ⇒ A7.2 Given a cut, the computational complexity of a single rule aggregation is linearly proportional to the size of the rule (i.e., total number of attributes in the rule), as will be described later. 3. Grouping rules. Given a cut C, we can group a set of rules S into groups by applying C to every rule in S as described in Step 2 above. When a cut is applied to a set of rules, different rules can be mapped into the same aggregated rule. For example, consider rules A2 ∧ A3 ∧ A4 ⇒ A5 and A2 ∧ A5 ⇒ A3. After applying cut (c) from figure 3 to both of them, they are mapped into the same rule A2 ∧ A7 ⇒ A7. More generally, we can group a set of rules based on the cut C as follows. Two rules R1 and R2 belong to
ADOMAVICIUS AND TUZHILIN nitial Rule groups obtained from rule set s using cuts rule set s cut 3(b) cut 3(c) cut 3(d) A1→A3 A6→A3 (3)47→A7 (2)A6→A7 A3∧A6→A5(2)42AA7→A1(2)46A7→47(3) A1AA2AA3→A5143→45 (1)A2A47→7(2)46A47→A6(2) A2∧A3→A4 A3∧A5→4(1)A1→A7 (1)47→A7 A2∧A3→A5 A3∧A6→4(1)A2→A 44∧A6→A6(1)A1AA2→A7 A2∧44→A1 A5AA6→A6 A1∧A2A47→A7(1) A2∧A5→A1 A3∧A5→ Figure 4. Grouping a set of rules using several different cuts from figure 3(the number of rules in groups is specified in parenthes the same group if and only if cutc(R1)= cutc(R2). Naturally, two different aggregated rules represent two disjoint groups of rules. As an example, figure 4 presents the results of grouping a set of rules based on the attribute hierarchy and several different cuts shown in figure 3 The grouping operator described above allows the user to group rules into sets of similar rules, where similarity is defined by the expert who selects a specific cut of the attribute hierarchy. Moreover, instead of examining and validating individual rules inside each group the user can examine the group of these rules as a whole based on the aggregated rule( that is common for all the rules in the group) and decide whether to accept or reject all the rules in that group at once based on this aggregated rule So far, we assumed that the leaves in the attribute hierarchies are specified by the attributes of the data set. however we also consider the case when attribute hierarchies include values and aggregated values of attributes from the data set. For example, assume that a data set has attribute Month. Then figure 5 presents an attribute hierarchy with 12 values as the leaves representing specific months of the year that are grouped together into four aggregated inter, spring, summer, and fall StoreStore Manul Attributes YES NO YES NO YES NO 1212345678910 Figure 5. A fragment of attribute hierarchy which includes attribute values
42 ADOMAVICIUS AND TUZHILIN Initial Rule groups obtained from rule set S using cuts: rule set S cut 3(b) cut 3(c) cut 3(d) A1 ⇒ A3 A6 ⇒ A3 (3) A7 ⇒ A7 (2) A6 ⇒ A7 (3) A1 ∧ A2 ⇒ A3 A3 ∧ A6 ⇒ A5 (2) A2 ∧ A7 ⇒ A1 (2) A6 ∧ A7 ⇒ A7 (3) A1 ∧ A2 ∧ A3 ⇒ A5 A3 ⇒ A5 (1) A2 ∧ A7 ⇒ A7 (2) A6 ∧ A7 ⇒ A6 (2) A2 ∧ A3 ⇒ A4 A3 ∧ A5 ⇒ A4 (1) A1 ⇒ A7 (1) A7 ⇒ A7 (2) A2 ∧ A3 ⇒ A5 A3 ∧ A6 ⇒ A4 (1) A2 ⇒ A7 (1) A2 ⇒ A3 A4 ∧ A6 ⇒ A6 (1) A1 ∧ A2 ⇒ A7 (1) A2 ∧ A4 ⇒ A1 A5 ∧ A6 ⇒ A6 (1) A1 ∧ A2 ∧ A7 ⇒ A7 (1) A3 ⇒ A5 A2 ∧ A5 ⇒ A1 A3 ∧ A5 ⇒ A4 Figure 4. Grouping a set of rules using several different cuts from figure 3 (the number of rules in groups is specified in parentheses). the same group if and only if cutC(R1) = cutC(R2). Naturally, two different aggregated rules represent two disjoint groups of rules. As an example, figure 4 presents the results of grouping a set of rules based on the attribute hierarchy and several different cuts shown in figure 3. The grouping operator described above allows the user to group rules into sets of similar rules, where similarity is defined by the expert who selects a specific cut of the attribute hierarchy. Moreover, instead of examining and validating individual rules inside each group, the user can examine the group of these rules as a whole based on the aggregated rule (that is common for all the rules in the group) and decide whether to accept or reject all the rules in that group at once based on this aggregated rule. So far, we assumed that the leaves in the attribute hierarchies are specified by the attributes of the data set. However, we also consider the case when attribute hierarchies include values and aggregated values of attributes from the data set. For example, assume that a data set has attribute Month. Then figure 5 presents an attribute hierarchy with 12 values as the leaves representing specific months of the year that are grouped together into four aggregated values: winter, spring, summer, and fall. Figure 5. A fragment of attribute hierarchy which includes attribute values