Y. Jiang et al. / Decision Support Systems 48 (2010) 470–479
‘Age=Middle’ ∧ ‘CPU=Average’ ∧ ‘Battery≠Good’, while $R_i$ is extracted from customers whose preconditions satisfy ‘Age=Middle’ ∧ ‘CPU=Average’ ∧ ‘Battery=Good’. We found that $R_i$ contains more specific information and is more precise to use; thus $R_j$ is not necessary for predicting the ratings of customer c.

To combine evidence bodies, we apply the evidential reasoning method proposed by Yang and colleagues [40]. The evidential reasoning method first transforms the evidence bodies into basic probability masses by combining the normalized evidence weights and the belief degrees. Then, the basic probability masses are combined into an aggregated basic probability assignment. Finally, the aggregated probability assignments are normalized.

For customer c, we assume that the matching classification expert set consists of S classification experts, $R_c = (R_{c,1}, \dots, R_{c,u}, \dots, R_{c,S})$, $R_{c,u} \in CES$, $u = 1, 2, \dots, S$. The corresponding evidence bodies and weights are $ES_c = (E_{c,1}, \dots, E_{c,u}, \dots, E_{c,S})$ and $W_c = (w_{c,1}, \dots, w_{c,u}, \dots, w_{c,S})$.

Rating classification of a new customer is supported by matching the customer's characteristics against the classification experts. The matching may lead to one of three situations: (a) customer c matches one classification expert, (b) customer c matches more than one classification expert, or (c) customer c does not match any classification expert. When no classification expert matches the new customer, we do not classify the new customer into any class; that is, the proposed model is unable to predict the ratings of customer c. The recommendation system then communicates with the customer and asks for more detailed inputs. In case (a), S = 1 and only one expert, $R_{c,1}$, can be used, so we employ $R_{c,1}$ to predict the ratings of customer c.
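The matching step and its three outcomes, together with the case (b) procedure of normalizing the weights and combining the evidence bodies, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `Expert`, `matches`, `er_combine` and `predict` are hypothetical names; each expert's evidence body is assumed to be a set of belief degrees over the ratings P, A and G of Table 2; matching is equality-only (so conditions such as ‘Battery≠Good’ and the specificity preference of $R_i$ over $R_j$ are not modelled); and the combination step follows the recursive evidential reasoning algorithm as commonly stated by Yang and Xu, which is our assumption about the formula cited as [40], not the paper's own code.

```python
from dataclasses import dataclass
from typing import Optional

RATINGS = ("P", "A", "G")  # rating grades from Table 2


@dataclass
class Expert:
    """Hypothetical classification expert (illustrative, not the paper's data structure)."""
    precondition: dict[str, str]   # e.g. {"Expertise": "G", "CPU": "A"}
    beliefs: dict[str, float]      # evidence body: belief degree per rating
    weight: float                  # evidence weight before normalization


def matches(expert: Expert, customer: dict[str, str]) -> bool:
    # Equality-only matching (assumption); negated conditions are not modelled.
    return all(customer.get(a) == v for a, v in expert.precondition.items())


def er_combine(bodies: list[dict[str, float]], weights: list[float]) -> dict[str, float]:
    """Recursive evidential reasoning combination (Yang-Xu form, assumed).

    `weights` must already be normalized so that they sum to 1."""
    def masses(body, w):
        m = [w * body.get(r, 0.0) for r in RATINGS]            # basic probability masses
        m_bar = 1.0 - w                                         # mass left unassigned by the weight
        m_tilde = w * (1.0 - sum(body.get(r, 0.0) for r in RATINGS))  # incompleteness of the body
        return m, m_bar, m_tilde

    n = len(RATINGS)
    m, m_bar, m_tilde = masses(bodies[0], weights[0])
    for body, w in zip(bodies[1:], weights[1:]):
        m2, m_bar2, m_tilde2 = masses(body, w)
        mh, mh2 = m_bar + m_tilde, m_bar2 + m_tilde2
        conflict = sum(m[i] * m2[j] for i in range(n) for j in range(n) if i != j)
        k = 1.0 / (1.0 - conflict)                              # normalizing factor
        m = [k * (m[i] * m2[i] + m[i] * mh2 + mh * m2[i]) for i in range(n)]
        m_tilde = k * (m_tilde * m_tilde2 + m_tilde * m_bar2 + m_bar * m_tilde2)
        m_bar = k * m_bar * m_bar2
    # Final normalization: redistribute the weight-induced unassigned mass.
    return {r: m[i] / (1.0 - m_bar) for i, r in enumerate(RATINGS)}


def predict(customer: dict[str, str], experts: list[Expert]) -> Optional[dict[str, float]]:
    matched = [e for e in experts if matches(e, customer)]
    if not matched:                      # case (c): no prediction; ask for more input
        return None
    if len(matched) == 1:                # case (a): S = 1, use R_{c,1} directly
        return dict(matched[0].beliefs)
    total = sum(e.weight for e in matched)
    normalized = [e.weight / total for e in matched]   # w'_{c,u} = w_{c,u} / sum_l w_{c,l}
    return er_combine([e.beliefs for e in matched], normalized)  # case (b)
```

In this sketch, two equally weighted matching experts that fully disagree (one assigning all belief to G, the other all belief to P) yield a probability of 0.5 for each of G and P, while agreeing experts reinforce each other.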
In case (b), $R_c$ consists of more than one expert, and we use the following procedure to obtain classification information. First, the weights of the evidence bodies are normalized to form $W'_c = (w'_{c,1}, \dots, w'_{c,u}, \dots, w'_{c,S})$ by the following unitary function:

$$w'_{c,u} = \frac{w_{c,u}}{\sum_{l=1}^{S} w_{c,l}}, \qquad u = 1, 2, \dots, S$$

Second, calculate the basic probability masses by multiplying $W'_c$ by the evidence bodies in $ES_c$. Third, calculate the aggregated probability assignment by combining the basic probability masses using the formula proposed by Yang and colleagues [40].

The integrated strategy has two main advantages. First, it ensures that all of the essential multi-class information about the new customer is taken into account; the comprehensive use of multi-class information plays an important role in constructing an accurate recommendation system. Second, the integrated probability assignment provides the probabilities of the possible ratings a customer would give after consumption. The ratings, together with their respective probabilities, allow online stores to make more flexible recommendations.

4. Experimental study

4.1. Data

The raw data in our experiment come from online stores that sell Inspiron 1525 laptops. For each customer, we first collect his or her rating of the laptop, and then retrieve the customer's demographic profile as well as his or her need and preference statements. In the process of knowledge discovery from online reviews, the existence of fake reviews is an intractable problem [15,29]. Fake reviews are often entirely negative or entirely positive. Therefore, to avoid the impact of fake reviews on the accuracy of rating classification, each of the co-authors individually identifies the reviews with completely positive or completely negative comments, and the co-authors then collectively decide whether to remove such reviews from the data set. The attributes of age and expertise (computer knowledge) on laptops form customers' demographic profile.
Customers' needs and preferences are characterized by CPU speed, battery, audio card, and video card. The attributes and ratings, together with their possible values, are presented in Table 2.

Customers' needs and preferences are extracted by inverse analysis. For example, if a customer claims that the battery life is insufficient or that he/she prefers a laptop with long battery life, we infer that the customer needs a laptop with good battery life. If the customer mentions nothing or claims that the battery life is acceptable, we assume that a laptop with average battery life will meet his/her needs. Ratings represent customers' evaluation of the product after use. From Dell.com and Bestbuy.com we collected 507 need-rating records, of which 405 were randomly selected to create the training data; the remaining 102 records form the testing data. Only the training data are used to construct the rating classifier for the laptop. Examples of the need-rating data are shown in Table 3.

4.2. Mine need-rating rules

To mine classification rules from data, the support and confidence thresholds are usually set to small values, since this removes ineffective rules and improves the quality of the classifiers. In the rating classification problem, the support and confidence thresholds depend on the uncertainty of the need-rating data. This paper sets the two thresholds to 0.05 and 0.1, respectively, which are the values normally used in most associative classification methods [37]. Using the proposed method, we discover 377 rules from the need-rating data. Among all extracted rules, we found that 87.3% conflict with one another. Twenty-two of the 377 rules are presented in Table 4, which will be used to illustrate the construction of the rating classifier.

Most traditional associative classification methods are designed to find only the rules with the highest confidence level and use them to classify new objects.
However, in our case the conflicting rules are retained through the classification expert pruning method. For example, in Table 4, r12, r13, and r14 are three conflicting rules that share the same precondition but have different rating results: ‘Expertise=G’ ∧ ‘CPU=A’ ∧ ‘Battery=G’ → ‘Rating=G’; ‘Expertise=G’ ∧ ‘CPU=A’ ∧ ‘Battery=G’ → ‘Rating=P’; ‘Expertise=G’ ∧ ‘CPU=A’ ∧ ‘Battery=G’ → ‘Rating=A’. Traditional methods will select only r14 as the classification rule because of its high confidence level. In this research, we take three phases to derive the multi-class information from these rules. The first step is to combine the conflicting rules to form classification experts and the corresponding evidence bodies. For example,

Table 2
Attribute description.

| Attribute group | Attribute | Attribute value |
|---|---|---|
| Profile | Age | ≤24 (Y), 24–34 (M), ≥35 (O) |
| Profile | Expertise | Average (A), Good (G) |
| Customer need | CPU | Average, Good |
| Customer need | Battery | Average, Good |
| Customer need | Audio | Average, Good |
| Customer need | Video | Average, Good |
| Rating | Rating | Poor (P), Average, Good |

Table 3
Examples of the need-rating data.

| Customer | Age | Expertise | CPU | Battery | Audio | Video | Rating |
|---|---|---|---|---|---|---|---|
| c1 | Y | G | A | G | A | A | G |
| c2 | O | A | A | A | A | G | P |
| c3 | M | G | A | A | G | A | A |
| c4 | M | G | A | A | G | A | G |
| … | … | … | … | … | … | … | … |
| c505 | O | A | G | A | A | G | G |
| c506 | Y | A | A | A | A | G | P |
| c507 | M | G | A | G | G | A | G |
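The support and confidence thresholds and the notion of conflicting rules described in this section can be illustrated with a brute-force Python sketch over the seven example rows of Table 3. It assumes the standard associative-classification definitions (support: the fraction of records matching both the precondition and the rating; confidence: that count divided by the number of records matching the precondition). The function names and the `max_len` cap on precondition length are hypothetical, and this is not the paper's own mining method; the 377 rules reported above come from the full 405-record training data, not from this excerpt.

```python
from collections import defaultdict
from itertools import combinations

ATTRS = ("Age", "Expertise", "CPU", "Battery", "Audio", "Video")

# The seven example rows shown in Table 3 (an excerpt, not the training data).
ROWS = [
    ("Y", "G", "A", "G", "A", "A", "G"),  # c1
    ("O", "A", "A", "A", "A", "G", "P"),  # c2
    ("M", "G", "A", "A", "G", "A", "A"),  # c3
    ("M", "G", "A", "A", "G", "A", "G"),  # c4
    ("O", "A", "G", "A", "A", "G", "G"),  # c505
    ("Y", "A", "A", "A", "A", "G", "P"),  # c506
    ("M", "G", "A", "G", "G", "A", "G"),  # c507
]
RECORDS = [(dict(zip(ATTRS, row[:6])), row[6]) for row in ROWS]


def mine_rules(records, min_sup=0.05, min_conf=0.1, max_len=3):
    """Return (precondition, rating, support, confidence) for every rule meeting both thresholds."""
    n = len(records)
    cond_count = defaultdict(int)   # records matching the precondition
    rule_count = defaultdict(int)   # records matching the precondition and the rating
    for needs, rating in records:
        items = sorted(needs.items())
        for k in range(1, max_len + 1):
            for cond in combinations(items, k):
                cond_count[cond] += 1
                rule_count[(cond, rating)] += 1
    rules = []
    for (cond, rating), count in rule_count.items():
        support, confidence = count / n, count / cond_count[cond]
        if support >= min_sup and confidence >= min_conf:
            rules.append((cond, rating, support, confidence))
    return rules


def conflicting_groups(rules):
    """Group rules that share a precondition but predict different ratings."""
    groups = defaultdict(list)
    for cond, rating, support, confidence in rules:
        groups[cond].append((rating, support, confidence))
    return {cond: g for cond, g in groups.items() if len(g) > 1}


def highest_confidence(group):
    """What a traditional method keeps from a conflicting group: only the top-confidence rule."""
    return max(group, key=lambda r: r[2])
```

On this excerpt, the precondition ‘Expertise=G’ ∧ ‘CPU=A’ ∧ ‘Battery=A’ (customers c3 and c4) yields two conflicting rules, → ‘Rating=A’ and → ‘Rating=G’, each with support 1/7 and confidence 0.5. A traditional method would keep only one of them, whereas the approach described here retains both as input to a classification expert.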