3.2.2. Classification expert pruning through an improved algorithm

The goal of pruning classification experts is to generate a minimal set of classification experts which can cover all customers in data table I. Before presenting the pruning algorithm, we define an order relationship on classification experts.

For two classification experts Ri and Rj, Rj is said to be inferior to Ri, that is, Ri ≻ Rj, if

1. Pi ⊂ Pj;
2. for ∀ (cj,n, confj,n) ∈ Ej, ∃ (ci,m, confi,m) ∈ Ei such that ci,m is the same as cj,n and confj,n = confi,m.

The relationship Ri ≻ Rj implies that classification experts Ri and Rj provide the same classification outcome, but Ri gives the classification information based on a simpler precondition. The classification expert Rj is more restrictive but not more powerful relative to Ri. Hence, Rj is redundant and should be removed to improve the quality of the classifier. Fig. 3 shows the pruning algorithm used to improve the quality of classification experts.

The first step prunes inferior classification experts: a classification expert Rj should be pruned if ∃ Ri ∈ CES, Ri ≻ Rj. That is, if a classification expert has a more complex precondition but does not provide added information, then delete it. A simpler precondition is favorable since it provides a more powerful classification capability. The next pruning step uses the data covering method [26]. Classification expert Rs is necessary for customer c if Rs is a matching classification expert for c and there is no other matching classification expert whose precondition includes that of Rs. Furthermore, we remove the classification experts which do not meet the support or confidence threshold.

    Input: the classification expert set R and data table I
    Output: the final classification expert set CES

    1. For each classification expert Rj in CES
           If ∃ Ri ∈ CES, Ri ≻ Rj
               CES = CES \ {Rj}
           End if
       End for
    2. For each classification expert Rs in CES
           For each customer c in data table I
               If Rs is necessary for c
                   count_s ← count_s + 1
               End if
           End for
       End for
       Delete the classification experts in CES which dissatisfy the constraint: count_s > threshold

Fig. 3. The classification expert pruning algorithm.
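To make the two pruning steps concrete, here is a minimal Python sketch. The `ClassificationExpert` container, the (attribute, value) encoding, and the single `threshold` parameter are our own illustrative assumptions (the paper checks both support and confidence thresholds); this is a sketch of the idea in Fig. 3, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClassificationExpert:
    # Hypothetical encoding: precondition is a frozenset of (attribute, value)
    # pairs, e.g. frozenset({("Age", "Middle"), ("CPU", "Average")});
    # evidence is a frozenset of (rating_class, confidence) pairs.
    precondition: frozenset
    evidence: frozenset

def inferior(ri, rj):
    """Ri ≻ Rj: Pi is a strict subset of Pj and every (class, confidence)
    pair in Ej also appears in Ei (conditions 1 and 2 above)."""
    return ri.precondition < rj.precondition and rj.evidence <= ri.evidence

def matches(expert, customer):
    """An expert matches a customer (a dict of attribute values) when every
    (attribute, value) pair of its precondition holds for the customer."""
    return all(customer.get(a) == v for a, v in expert.precondition)

def prune(ces, customers, threshold):
    """Two-step pruning as in Fig. 3."""
    # Step 1: remove every Rj for which some Ri in CES satisfies Ri ≻ Rj.
    ces = [rj for rj in ces
           if not any(inferior(ri, rj) for ri in ces if ri is not rj)]
    # Step 2 (data covering): Rs is necessary for c if it matches c and no
    # other matching expert's precondition includes that of Rs.
    count = {e: 0 for e in ces}
    for c in customers:
        matching = [e for e in ces if matches(e, c)]
        for e in matching:
            if not any(e.precondition < o.precondition for o in matching):
                count[e] += 1
    # Keep only experts that are necessary for enough customers.
    return [e for e in ces if count[e] > threshold]
```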
3.3. Measure the importance of evidence bodies

After the classification experts are generated, the next step is to determine the weights of the evidence bodies given by the classification experts. In traditional evidence theory, evidence weights are consistent with human experts' knowledge and experience. If a human expert is authoritative and familiar with the decision problem, the evidence body given by her/him will be reliable. In this research, the evidence bodies are derived from different attribute sets. Thus we use the weights of attributes to infer the importance of evidence bodies.

Many methods, such as support vector machines (SVM) [33] and information theory [9], have been used to measure attribute weights. The neural network is one of the most efficient methods for deriving factor weights [19,22]. For a rating classification problem, the number of possible attribute sets in rule preconditions totals 2^|A| − 1, which is the number of times traditional data mining techniques would need to train on the dataset in order to obtain a weight for each evidence body. This is very time-consuming and computationally prohibitive for real-time recommendation systems. In this paper, we employ an NN-based method, which requires training only once with the need-rating data to derive the evidence weights.

For need-rating data I, we first find a trained neural network N with the entire set of attributes A, A = {A1, …, Ah, …, A|A|}, as its input. Then the output ratings for the |O| customers in data table I, denoted as NC = {nc1, …, nck, …, nc|O|}, nck ∈ C, k = 1, 2, …, |O|, can be calculated. Suppose the real ratings of the customers are RC = {rc1, …, rck, …, rc|O|}, where rck ∈ C is the rating of the kth customer in O. The network's accuracy, denoted d0, can be calculated as follows:

\[
d_0 = \sum_{k=1}^{|O|} v_k, \qquad
v_k = \begin{cases} 1 & \text{if } nc_k \text{ is the same as } rc_k, \\ 0 & \text{otherwise.} \end{cases}
\]
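In code, d0 is simply a count of correct predictions over the data table. A one-line Python rendering of the formula above (variable names assumed):

```python
def network_accuracy(nc, rc):
    """d0 from the formula above: the number of customers whose predicted
    rating nc_k equals the real rating rc_k (the sum of the v_k indicators)."""
    return sum(1 for predicted, real in zip(nc, rc) if predicted == real)
```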
The weights of the evidence bodies can then be computed according to the classification experts in CES. Suppose the precondition of classification expert Ri consists of Bi, Bi = {Ai,1, Ai,2, …, Ai,|Bi|}. The accuracy di without the attributes in Bi is computed by simply setting the connection weights from the input attributes {Ai,1, Ai,2, …, Ai,|Bi|} of the trained network to zero. Finally, the difference between d0 and di is used to measure the influence of the attribute set {Ai,1, Ai,2, …, Ai,|Bi|} on the classification. The greater the influence of the attribute set on the classification, the bigger its weight.

The key steps of the NN algorithm are outlined below; a small sketch follows the list.

(1) Let A = {A1, …, Ah, …, A|A|} be the set of all input attributes and RC be the real ratings.
(2) Train the neural network N with A as input to maximize the network accuracy d0, so that the network is as accurate as possible.
(3) For i = 1, 2, …, |CES|, let Ni be a network whose weights are set as follows:
    (a) for all inputs except {Ai,1, Ai,2, …, Ai,|Bi|}, set the weights of Ni equal to the weights of N;
    (b) set the weights from the inputs {Ai,1, Ai,2, …, Ai,|Bi|} to zero.
    Compute the output of network Ni, denoted as NCi = {nci,1, …, nci,k, …, nci,|O|}, and the accuracy of Ni, denoted as di.
(4) Compute the influence of Bi on the network accuracy: wi = d0 − di.
(5) If i ≥ |CES|, go to (6); otherwise, set i = i + 1 and go to (3).
(6) The derived {w1, …, wi, …, w|CES|} are the weights of the evidence bodies, where wi is the weight of evidence body Ei.
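The sketch below renders steps (3)-(6) in Python with NumPy. The one-hidden-layer architecture, the tanh activation, and the assumption that training (step (2)) has already produced weights W1, b1, W2, b2 and a numerically encoded input matrix X are all our own illustrative choices, not details from the paper:

```python
import numpy as np

def predict(W1, b1, W2, b2, X):
    """Forward pass of a one-hidden-layer network: returns the arg-max
    rating class index for each customer (each row of X)."""
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

def accuracy(W1, b1, W2, b2, X, rc):
    """Count of customers whose predicted rating equals the real one (d)."""
    return int(np.sum(predict(W1, b1, W2, b2, X) == rc))

def evidence_weights(W1, b1, W2, b2, X, rc, preconditions):
    """Steps (3)-(6): for each expert, zero the input-to-hidden weights of
    its precondition attributes and record the drop in accuracy.
    `preconditions` holds one array of input-column indices per expert,
    i.e. the columns encoding {A_i,1, ..., A_i,|Bi|}."""
    d0 = accuracy(W1, b1, W2, b2, X, rc)   # accuracy of the trained net N
    weights = []
    for attr_idx in preconditions:
        W1_i = W1.copy()                   # (a) copy the weights of N
        W1_i[attr_idx, :] = 0.0            # (b) cut the expert's inputs
        d_i = accuracy(W1_i, b1, W2, b2, X, rc)
        weights.append(d0 - d_i)           # (4): w_i = d0 - d_i
    return weights                         # (6): evidence-body weights
```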
The proposed NN method makes it possible to derive the weights quickly and efficiently without falling into the trap of a local minimum. It therefore attains more accurate evidence weights and improves the efficiency of the rating classification model.

3.4. Construct rating classifier for potential customers

Given the classification experts, evidence bodies, and evidence weights, the recommendation system is ready to predict the ratings of customers. For a potential customer c, the system first identifies the necessary classification experts. For example, Ri and Rj are two matching classification experts whose preconditions are 'Age = Middle' ∧ 'CPU = Average' ∧ 'Battery = Good' for Ri, and 'Age = Middle' ∧ 'CPU = Average' for Rj. Rj could be extracted from customers whose preconditions satisfy 'Age = Middle' ∧ 'CPU = Average' ∧ 'Battery = Good' or …
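The example above is cut off at the page break, but the matching step it illustrates can be sketched by reusing the hypothetical `ClassificationExpert` and `matches` helpers from the pruning sketch, with necessity taken in the sense of Section 3.2.2:

```python
def necessary_experts(ces, customer):
    """Experts matching the customer, minus those dominated by another
    matching expert whose precondition includes theirs (Section 3.2.2)."""
    matching = [e for e in ces if matches(e, customer)]
    return [e for e in matching
            if not any(e.precondition < o.precondition for o in matching)]

# The potential customer from the Ri / Rj example:
customer = {"Age": "Middle", "CPU": "Average", "Battery": "Good"}
```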