Y. Jiang et al. / Decision Support Systems 48 (2010) 470–479

The content-based filtering (CBF) method applies content analysis to target items. Target items are described by their attributes, such as color, shape, and material. The user's profile is constructed by analyzing his/her responses to questionnaires, his/her ratings of products, and navigation history. The recommendation system proposes items that have high correlations with a user's profile. However, a pure CBF system also has its shortcomings. One is that users can only receive recommendations similar to their earlier experiences. The other is that some items, such as music, photographs, and multimedia, are hard to analyze [10]. Based on CF and CBF, new data mining techniques employing decision trees, association rules, regression models, and Markov chains have been introduced to recommend movies and books [2], support one-to-one online marketing [21], and attract customers for the tourism industry [20].
Unlike those recommending products based on likelihood of purchase, Bodapati [3] argued that the recommendation decision should also examine a customer's sensitivity to such a recommendation. He built a model to measure the role of recommendation systems in modifying customers' purchase behavior relative to what the customers would have done without such recommendation interventions. Although the extant recommendation systems may recommend acceptable products to customers, they share a common view that the act of purchase itself equates to the customer's satisfaction, which could be far from the truth, as evidenced by James' example earlier.

2.2. Associative classification and the combination strategy for multi-class classification

Classification is an important management task. Many methods, such as the agent-based approach [34], decision tree [30], and data envelopment analysis (developed by Professor Cooper [14]), have been proposed to solve decision analysis problems in various fields. Associative classification is a relatively new classification method whose aim is to apply the Apriori algorithm [1] to mine association rules and construct associative classifiers [26]. Rule mining finds the associations between attributes (rule preconditions) and ratings (results). In associative classification, the support degree is defined as the ratio of the number of objects satisfying a specific rule precondition and having a specific rating result over the total number of objects in the database. The confidence degree is similar to the support degree except that the number of all objects satisfying the specific rule precondition is used as the denominator. The discovered rules are pruned to attain a minimal rule set necessary to cover the training data and achieve sufficient accuracy [37]. Although associative classification methods may derive more accurate classification results than other methods, they have a few drawbacks [27].
The first relates to multi-class classification: the associative classification methods available today do not have enough multi-class information to build multi-class classifiers because all conflicting rules are removed [31]. For example, P1→c1 and P1→c2 are two conflicting rules having the same precondition P1 but different classifications, c1 and c2, with confidence degrees of 51% and 49%, respectively. Traditional associative classification methods will delete rule P1→c2 because its confidence level is lower than that of P1→c1. When conflicts occur, only the rule with the highest confidence level is retained; the competing rules having lower confidence are all removed. Another flaw is that they cannot easily identify an optimal rule when classifying a new case [24]. An optimal rule is the one that maximizes a chosen measure, such as the support degree, confidence degree, or interesting degree as defined by the user. The difficulty with the traditional methods is that different measures may result in different optimal rules.

To overcome the above weaknesses, Liu and his colleagues [27] have proposed a combination strategy for multi-class classification (CSMC). CSMC retains most of the conflicting rules and employs multiple association rules to construct classifiers for new cases. After acquiring conflicting rules and calculating their weights, the evidential reasoning approach proposed by Yang and colleagues [40] is employed to combine the classification results of distinct rules. CSMC is useful since it applies multiple rules, especially conflicting ones, to multi-class classification. However, it has deficiencies as well. Even though CSMC retains multi-class information in the process of rule acquisition, much of it is lost in rule pruning. Consequently, the evidence bodies used in the rating classification are often inaccurate and classification results suffer. Moreover, CSMC employs a rough set method to derive evidence weights.
Although using evidence weights may significantly improve classification accuracy, its computation is very time-consuming and negatively affects the efficiency of classification.

In this research, we propose a new algorithm to address the rating classification problem. Compared with CSMC, the proposed algorithm can preserve most of the useful multi-class information after pruning, and it can derive attribute weights much more efficiently. Details of our algorithm are discussed next.

3. The proposed methodology

3.1. The solution framework

To construct an efficient and powerful rating classifier for a specific product, we follow three phases as outlined in Fig. 2. The first phase is to mine need-rating rules, where retaining multi-class information is the main task. Since conflicts and ambiguities are often present in the need-rating database, the rules discovered must be able to deal with contradictory facts and uncertainties, and to arrive at multi-class information. The second phase calculates the weight for each rule. A weight measures the importance of a need-rating rule in the rating classification problem. In this phase, computational efficiency is crucial, due to the presence of various rules useful for classifier construction. The last phase is to develop the rating classifier, whose goal is to recommend products that yield high customer satisfaction. In this phase, all factors having influences on the potential customers' satisfaction levels are considered. Since customers of the same need may voice very different opinions for the same product, it is important to make multi-class ratings available. Predicting after-use ratings along with corresponding likelihoods provides potential customers with a valuable purchase guideline, which significantly enhances the odds of customer satisfaction.
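As an illustration of why multi-class ratings matter, the following minimal sketch reuses the 51% / 49% example from Section 2.2 and contrasts keeping only the highest-confidence rule with keeping the full set of conflicting rules. It is not the algorithm proposed in this paper: the function names, the `rules` dictionary layout, and the `min_confidence` threshold are assumptions introduced purely for illustration.

```python
# Confidence degrees for two conflicting rules that share precondition P1,
# taken from the 51% / 49% example in Section 2.2.
rules = {"P1": {"c1": 0.51, "c2": 0.49}}


def prune_to_single_rule(rules):
    """Traditional strategy: keep only the highest-confidence class per precondition."""
    return {pre: max(dist, key=dist.get) for pre, dist in rules.items()}


def keep_distribution(rules, min_confidence=0.0):
    """Keep every conflicting rule whose confidence reaches a (hypothetical) threshold."""
    return {
        pre: {c: conf for c, conf in dist.items() if conf >= min_confidence}
        for pre, dist in rules.items()
    }


single = prune_to_single_rule(rules)   # only P1 -> c1 survives
multi = keep_distribution(rules)       # both P1 -> c1 and P1 -> c2 survive
print(single)
print(multi)
```

With single-rule pruning, a customer matching P1 is told only "c1", even though almost half of the comparable customers gave rating c2; the retained distribution keeps that information available for a later combination step.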
Although traditional associative classification methods could generate reasonable classification results, they suffer from three weak points when classifying customers' ratings. First, they do not accommodate conflicting ratings. Customers with the same needs and preferences may have very different opinions (ratings) for the same product. As discussed in Section 2.2, traditional associative classification methods resolve the situation by retaining only the rule that has the highest confidence level. As a result, not enough multi-class information is preserved to handle such conflicts. The second weakness is the prediction tactic. To predict accurately, a classifier needs to consider a customer's needs, preferences, and demographics. However, traditional prediction relies on only one optimal rule, which is not enough to attain accurate classification. Third, traditional methods classify each case without explanations; that is, they simply list the applicable rule. Such classification is somewhat arbitrary, ambiguous, and irrational. Since customers of the same need may give different ratings to the same product, a classification method must be capable of predicting the probabilities of attaining different customer ratings. The proposed classification algorithm aims to overcome these deficiencies. Details are described next.

3.2. Mine need-rating rules

The need-rating data is first organized into a data table $I = (O, A \cup C)$, where $O$ is the set of objects (customers) and $|O|$ is the number of customers in $I$. $A$ is the set of attributes, $A = \{A_1, \ldots, A_h, \ldots, A_{|A|}\}$, where $|A|$ is the number of attributes in $A$ and each $A_h$, $h = 1, 2, \ldots, |A|$, is a factor/criterion or a customer characteristic. $C$ corresponds to a set of class labels, $C = \{c_1, \ldots, c_g, \ldots, c_{|C|}\}$, where $|C|$ is the number of classes in $I$ and each $c_g$, $g = 1, 2, \ldots, |C|$, denotes a rating grade.
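The data table and the support and confidence degrees defined in Section 2.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's mining procedure: the attribute names (`usage`, `budget`), the rating grades, the five sample customers, and the helper functions are all hypothetical.

```python
# Hypothetical need-rating table I = (O, A ∪ C): each row is one customer (object),
# given as (attribute values from A, rating grade from C). All values are made up.
table = [
    ({"usage": "travel", "budget": "low"}, "good"),
    ({"usage": "travel", "budget": "low"}, "poor"),
    ({"usage": "travel", "budget": "low"}, "good"),
    ({"usage": "office", "budget": "high"}, "good"),
    ({"usage": "office", "budget": "low"}, "poor"),
]


def matches(attrs, precondition):
    """True if the customer satisfies every attribute-value pair of the precondition."""
    return all(attrs.get(name) == value for name, value in precondition.items())


def support(table, precondition, rating):
    """Objects satisfying the precondition AND having the rating, over all objects."""
    hits = sum(1 for attrs, c in table if matches(attrs, precondition) and c == rating)
    return hits / len(table)


def confidence(table, precondition, rating):
    """Same numerator as support, but divided by the objects satisfying the precondition."""
    covered = [c for attrs, c in table if matches(attrs, precondition)]
    if not covered:
        return 0.0
    return covered.count(rating) / len(covered)


precondition = {"usage": "travel", "budget": "low"}
print(support(table, precondition, "good"))      # 2 of 5 customers
print(confidence(table, precondition, "good"))   # 2 of the 3 matching customers
```

In this toy table the same precondition also leads to the rating "poor" with confidence 1/3, which is exactly the kind of conflicting need-rating rule that the first phase is meant to retain.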