Computers & Operations Research (2010), article in press
Contents lists available at ScienceDirect
Journal homepage: www.elsevier.com/locate/caor
0305-0548/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cor.2010.03.011

A strategy-oriented operation module for recommender systems in E-commerce

Hsiao-Fan Wang, Cheng-Ting Wu
Department of Industrial Engineering and Engineering Management, National Tsing Hua University, No. 101, Section 2 Kuang Fu Road, Hsinchu, Taiwan 30013, ROC
Corresponding author. Tel.: +886 3 5742654x42654; fax: +886 3 5722685. E-mail address: hfwang@ie.nthu.edu.tw (H.-F. Wang).

Please cite this article as: Wang H-F, Wu C-T. A strategy-oriented operation module for recommender systems in E-commerce. Computers and Operations Research (2010), doi:10.1016/j.cor.2010.03.011

Keywords: Electronic commerce; Recommender system; Marketing strategy; Clique-effects collaborative filtering

Abstract

Electronic commerce (EC) has become an important support for business and is regarded as an efficient system that connects suppliers with online users. Among the applications of EC, the recommender system (RS), which aims to make the best recommendations to users, is undoubtedly a popular topic. Although many approaches have been proposed to improve recommendation, a comprehensive module comprising the essential sub-modules of input profiles, a recommendation scheme, and an output interface of recommendations in the RS is still lacking. Besides, the fundamental issue of profit consideration for an EC company is generally not stressed. Therefore, this study aims to construct an RS with a strategy-oriented operation module that addresses the above aspects. Within this module, an approach named clique-effects collaborative filtering (CECF) is proposed for predicting the consumer's purchase behavior. Finally, we applied the proposed module to a 3C retailer in Taiwan, and promising results were obtained.

Scope and Purpose: This study aims to construct a comprehensive module for recommender systems. The proposed strategy-oriented operation module comprises the essential parts of a recommender system. By combining the proposed module with marketing strategies and an effective on-line interface scheme, the recommender system can emphasize not only the customer's satisfaction, as conventional recommender systems do, but also the supplier's profit, which is an important issue for an E-commerce company. Thus, a better recommendation environment can be provided.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Electronic commerce (EC) has been widely used by online users to perform different daily activities through the Internet. Online shopping is one of the popular applications among these activities. Instead of conventional shopping, EC provides alternative ways for users to get information on products such as price, availability, suppliers, substitutes, and even manufacturing process [39,54]. To stay competitive, EC companies need to develop higher business interoperability on their electronic marketplaces by improving the electronic market functions [52,53]. The enhancement of electronic market functions could lead to an overall reduction of interaction cost for business interoperation on all types of electronic marketplaces [15]. However, because the numerous EC functions provide so much information, it is difficult for online users to make quick and effective decisions [48]. Facing fierce market competition and impatient users, an EC company urgently needs a personalized decision support system. By providing more helpful information to users, faster and more satisfactory decisions can be made, and thus the opportunities of retaining customers and gaining profits are higher.

Many EC suppliers use recommender systems (RSs) to find out the preferences of target users so that the right products can be suggested [45]. A well-established RS can add value to an EC company in several ways: (1) users can retrieve product information easily, (2) cross-selling for users can be enhanced, and (3) users' loyalty can be sustained by good service. There are numerous studies in the fields of social networks [34] and information filtering techniques [42]. In social networks, people with similar characteristics tend to associate with each other. The use of social network structure generally allows the EC company to identify the products of likely interest to the target users based on information provided by the members of the network [19,28]. On the other hand, information filtering techniques analyze users' preferences and help EC Web sites achieve accurate product selection. By filtering the information provided by the users, the techniques aim to track the purchase behavior of users and recommend proper products. Among information filtering techniques, collaborative filtering (CF) [25,45,46] is one of the
most commonly adopted methods. The concept of CF is closely related to the social network. The CF technique uses collaborative information from "neighbors," which are defined as users with similar behavior to the target user. CF is also regarded as the most effective method for the RS. However, CF's drawback is that no recommendation can be made if a user's related data are sparse [26]. On the other hand, excessive emphasis on recommendation performance could lead to the neglect of profit, which is also an essential concern for an EC company. Aside from this, although there are different approaches to retrieving the information needed for recommendation, a systematic and comprehensive decision module is still lacking. Therefore, the time spent on data retrieval can be long, and the recommended products may not match the users' desires. In particular, without a structured module, documenting the recommending procedure becomes difficult, and achieving the goal of "the right goods for the right person" becomes impossible.

With these concerns, we propose a strategy-oriented operation module that can be comprehensively applied to EC Web sites as a decision support mechanism, so that various marketing strategies that consider the benefit of both suppliers and users can be developed. In addition, under the framework of the proposed recommender module, we also propose a clique-effects collaborative filtering (CECF) technique to predict users' purchase behavior. In particular, this paper presents a modeling perspective on an e-service system, namely the recommender system. The proposed RS module aims to benefit both the customers and the suppliers; the final stage of product selection is described as a linear bi-objective model, of which all required arguments are derived from the offline database and the CECF.

The paper is organized as follows.
Section 2 discusses the literature related to the framework, issues, and further development of an RS. The strategy-oriented operation module applied to an RS is developed along with the proposed CECF in Section 3. We then apply the proposed RS to a 3C retailer as a case study in Section 4. Finally, concluding remarks are given in Section 5, with suggestions for further research.

2. Literature review of the infrastructure of recommender systems

Schafer et al. [45,46] and Montaner et al. [37] have investigated the infrastructure of an RS in the framework of three sub-modules: (1) input sources of the users' profiles, (2) output of recommendations, and (3) recommendation methods as the interface between the two. In this section, we briefly review the current developments with respect to these three sub-modules.

2.1. Input sources

Usually, input sources include users' individual profiles, which can be used to gather preferences for specific items, item attributes, ratings, and keywords, or even purchase history [46]. Schafer et al. have classified input sources into two types [46]: (1) a single user's profiles, that is, the preferences of the target user for whom we are recommending, and (2) the community's opinions, that is, input regarding the general community of other users, in which the target user is represented by the community. The two types of inputs allow the RS to make suggestions for different reasons. For a target user, the individual profiles are input to the recommender agent to provide personalized information, whereas the input profiles of the community are fed into the RS to reflect opinions from multiple individuals as a whole. Therefore, these two types can be applied at different levels of personalization. In particular, the community's opinions as input are helpful in reinforcing or complementing the information that can be retrieved from a single user's profiles.
This is illustrated by the well-known "new user" problem, which is one case of the "ramp-up" problem [27]. Recommendation for new users is challenging because, in a start-up company, neighbors are hard to identify when the new users' profiles are lacking. When this phenomenon is translated into a user-item relation matrix, the matrix will be sparse. In particular, if a high-dimensional database is developed for an RS, identifying the neighborhood from the sparse user-item relation matrix becomes a severe problem.

To solve the problems of sparse data or missing values, many approaches based on CF have been proposed. The issues of sparse matrices or missing values are often tackled with dimensionality reduction techniques [7,14,24,43]. Several dimensionality reduction techniques have been developed and applied to the Jester, MovieLens, and EachMovie datasets. In Eigentaste, Goldberg et al. [14] divided the recommendation process into two stages: offline and online operations. In the offline stage, the authors used principal component analysis (PCA) for dimensionality reduction, so that users' profiles, which are formed by rating the gauge set, are projected onto an eigenplane. In the online stage, the target user is then asked to rate the gauge set to receive recommendations. An alternative approach to estimating the missing values and reducing the dimensionality of the user-item relation matrix is singular value decomposition (SVD), which has been exploited by Sarwar et al. [43]. SVD is a common matrix factorization method that yields the best lower-rank approximations of the user-item relation matrix; however, Sarwar et al. suggested that the SVD-based method yields better results on dense datasets, which a start-up company does not possess.
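The low-rank idea behind the SVD-based methods above can be illustrated with a minimal sketch. It is not the procedure of Sarwar et al. or Goldberg et al.: it assumes missing ratings are first imputed with item means, and it computes only the leading singular triplet (a rank-1 approximation) by power iteration in plain Python. The function names and the toy ratings are hypothetical.

```python
import math


def impute_item_means(ratings):
    """Replace missing ratings (None) with the mean of the observed ratings of that item."""
    n_items = len(ratings[0])
    means = []
    for j in range(n_items):
        seen = [row[j] for row in ratings if row[j] is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if row[j] is None else float(row[j]) for j in range(n_items)]
            for row in ratings]


def rank1_approximation(matrix, iters=200):
    """Leading singular triplet (sigma, u, v) of a dense matrix by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    u, sigma = [0.0] * n_rows, 0.0
    for _ in range(iters):
        # u <- M v, normalized
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            return 0.0, u, v
        u = [x / norm_u for x in u]
        # v <- M^T u, whose length is the current estimate of sigma
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in w))
        if sigma == 0.0:
            return 0.0, u, v
        v = [x / sigma for x in w]
    return sigma, u, v


def predict(sigma, u, v, user, item):
    """Rank-1 estimate of the rating that `user` would give to `item`."""
    return sigma * u[user] * v[item]


if __name__ == "__main__":
    # Toy user-item ratings; None marks a rating the user has not provided.
    toy = [[5, 4, None], [4, None, 1], [1, 1, 5], [None, 1, 4]]
    sigma, u, v = rank1_approximation(impute_item_means(toy))
    print(round(predict(sigma, u, v, 0, 2), 2))
```

A single singular triplet is far too coarse for real data; the sketch only shows where imputation enters the pipeline, which is the step the later criticism of imputed ratings refers to.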
Kim and Yum [24] further suggested an evolved PCA-iterative method, in which SVD is performed iteratively to improve the accuracy of imputed values based on prior results. Nevertheless, to accommodate dimensionality reduction in the recommendation process, the new user is usually required to rate a specifically designated item set, for example the gauge set, which could contain items that the new user does not know. Besides, the size of the designated item set should be carefully controlled so as not to drive impatient customers out of the system. As indicated by Herlocker et al. [18] and Linden et al. [31], using PCA- or SVD-based techniques for dimensionality reduction would lower the recommendation quality, since recommendations for items are more restricted to specific subjects; when a small sample such as the gauge set is examined, the chosen neighborhoods are less similar to the target user. Moreover, Bell et al. [5] argued that methods using imputed ratings, which significantly outnumber the original ratings, are exposed to imputation risk, and that inaccurate imputation would distort the data.

To understand a user's purchase behavior, the information revealed by the user's profiles is often investigated. Generally, two kinds of user profiles are commonly searched and collected: the user's ratings [45] and market basket data [35]. A user's ratings are the scores the user gives to item attributes, and they are often analyzed to define preference. Market basket data, on the other hand, contain a user's purchase history and possibly demographic features. Specifically, each item in a user's basket data takes the value "1" if the item was purchased and "0" if it was not. Market baskets always contain a large number of transaction records; hence, these input profiles should be managed so that they are easy to maintain and retrieve.
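The binary market basket representation described above can be sketched as follows. The user identifiers, item names, and helper functions are hypothetical illustrations, not part of the original study.

```python
def build_basket_matrix(purchases, items):
    """Map each user to a 0/1 vector over `items` (1 = purchased, 0 = not purchased)."""
    return {user: [1 if item in bought else 0 for item in items]
            for user, bought in purchases.items()}


def sparsity(matrix):
    """Fraction of zero cells in the user-item matrix."""
    cells = [cell for row in matrix.values() for cell in row]
    return 1.0 - sum(cells) / len(cells)


if __name__ == "__main__":
    items = ["camera", "laptop", "phone", "printer"]
    purchases = {
        "u1": {"camera", "phone"},
        "u2": {"laptop"},
        "u3": set(),  # a new user with no purchase history yet
    }
    matrix = build_basket_matrix(purchases, items)
    for user, row in matrix.items():
        print(user, row)
    print("sparsity:", sparsity(matrix))
```

The all-zero row of a new user is the situation the "new user" problem refers to: the row carries no information from which neighbors could be found.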
The usual techniques used to maintain users' profiles are the history-based model [37] and the vector space model [11,40]. A history-based model lists purchase records, navigation history, or the contents of e-mail boxes to define users' profiles. In the vector space model, items are represented by a vector of features or attributes, usually words or concepts (such as a binary column to denote the purchased state or a column to denote the attribute value of an item), each with an associated value. The vector space model is more efficient for computation, so it is often used for large amounts of data. For this reason, it is also the model adopted in this paper to maintain the database.

2.2. Output of recommendations

In general, the output is a suggestion of product(s) containing information on item type, quantity, and appearance [46]. The simplest form of a suggestion is the recommendation of a single item. A single item increases the chance that a user will seriously consider it desirable. More commonly, an RS provides an ordered or unordered recommendation list for a user [38]. Some advertising strategies can also be embedded in the recommendation by displaying bundled items, which could help enhance cross-selling and up-selling. Compared with a recommendation list, bundled items may include products that are not related to the users' preferences, since they are generated for promotions. In contrast, a recommendation list shows a set of products that satisfies users' preferences to a certain degree.

2.3. Recommendation methods

Recommendation methods are concerned with the accuracy and efficiency of predicting and presenting the recommended items according to users' input sources. For an RS, it is critical to know users' preferences systematically. An essential concept is to use a relational database which is constructed offline.
Then, by mapping a new user to the database, a product that has been purchased by the same type of historical users can easily be picked for the target user [29].

Clustering analysis is the technique that groups users or items with similar characteristics or properties into one group. Clustering reduces the search dimensionality, which speeds up the mapping process. Clustering techniques have been used in a wide range of applications, one of which is to predict the behavior of unknown users based on the group they belong to [49]. By analyzing the properties of the groups, we can learn about the characteristics of new users by identifying the group they belong to, and thus provide them with the items that the same group has mostly bought. Besides, clustering analysis is also a very useful tool for finding the "neighbors" in information filtering technology. That is, the users called neighbors are chosen by certain methods, such as clustering techniques, to support the prediction [6].

Information filtering technology has the ability to define user preferences with little effort. It is divided into two main categories [26]: collaborative filtering (CF) and content-based filtering (CBF). CF is the most popular approach to predicting the probability that a user will purchase a specific item based on other users' preferences [21]. A CF method functions by matching people with similar interests and then making recommendations. However, in the initial state of an RS, the main problem is that users' profiles are insufficient to sustain the prediction basis when CF is used. Consequently, the drawback of CF is that it requires some relevant rating data given by the target user. Usually, by clustering users into groups before predicting, recommendation methods can use group influences on the target user to prevent poor prediction due to scarce relevant information [44].
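A minimal sketch of neighbor-based CF on binary basket data may make the mechanism and its drawback concrete. The choices below (Jaccard similarity, the k most similar users as neighbors, and a similarity-weighted average as the purchase score) are assumptions made for illustration and are not the CECF method of this paper; all names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two 0/1 purchase vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def predict_scores(target, others, k=2):
    """Score each item the target has not bought by the weighted votes of k neighbors."""
    neighbors = sorted(others.values(),
                       key=lambda row: jaccard(target, row),
                       reverse=True)[:k]
    weights = [jaccard(target, row) for row in neighbors]
    total = sum(weights)
    scores = []
    for j, owned in enumerate(target):
        if owned or total == 0.0:
            # Already purchased, or no similar neighbor exists: nothing to recommend.
            scores.append(0.0)
        else:
            scores.append(sum(w * row[j] for w, row in zip(weights, neighbors)) / total)
    return scores


if __name__ == "__main__":
    others = {"a": [1, 1, 1, 0], "b": [1, 0, 0, 1], "c": [0, 0, 0, 1]}
    print(predict_scores([1, 1, 0, 0], others))  # an established user
    print(predict_scores([0, 0, 0, 0], others))  # a new user with an empty basket
```

For the new user every similarity is zero, so every score is zero. This is the sparse-data drawback noted above, and it is the gap that group or community information is meant to fill.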
Furthermore, because the conventional CF approach uses the preferences of neighbors to make a prediction for a target user, it leaves the additional influences of non-neighbors out of consideration. As a result, research tends to distinguish the impacts of neighbors from those of non-neighbors [23]; by integrating the effects caused by the two sources, better performance can also be expected.

CBF is the technology of analysis based on terms in the content, such as texts or documents on the Web site. It considers term frequency in the content and its relation to the user's preference. However, with other media such as music or movies, its performance is not as good as with text content, because these objects are not easily indexed. In addition, the maintenance of numerous heterogeneous electronic product catalogues on the Internet is still a tough task [16]. CF therefore remains the most commonly used technique, since it is flexible and easily adaptable to an EC's RS [7]. Accordingly, in this paper, we incorporate the concept of CF into our system as the basic recommendation mechanism.

In addition to CF and CBF, another technique requires the private information of a user. Demographic filtering (DF) characterizes users by their personal demographics [17]. A DF approach uses descriptions of people to learn which type of person is most likely to prefer an item. Therefore, this method leads to the same recommendation for users with similar personal data. However, the DF approach requires more information regarding a user's privacy; therefore, DF is confronted with the problem that users' demographic descriptions are not easy to collect. Consequently, the DF method needs to be combined with other methods such as CF or CBF [37].

Besides the aforementioned filtering techniques, rules derived from market basket analysis between items in large databases are also used in an RS.
Market basket analysis has been a popular approach for finding correlations among baskets [2,41]. One of its techniques is the well-known association rules method, which was first introduced by Agrawal et al. [3]. Association rules have been used to find patterns in the probability of buying a specific product when another product is purchased. In such a recommending environment, many rules have been developed on how the different purchase behaviors of users can be treated [20]. Accordingly, Sarwar et al. proposed a method of association-rule-based recommendation (ABR) in 2000 [42]. However, with a huge amount of transaction data, there may be many biased rules that affect the precision of the recommendation. Therefore, market basket analysis should be conducted with the aid of filtering techniques such as CF; the common concept of the CF method has been adapted to binary market basket data by Mild and Reutterer [36].

2.4. Roles with their goals in a recommender system

In current RSs, there are three common roles involved: the supplier, the system developer, and the user. In Table 1, we list possible considerations for constructing an RS. In the field of EC trading, Li and Wang proposed a multi-agent-based model with a win-win negotiation approach in which the agents seek to strike a fair deal that also maximizes the payoff for everyone involved [30]. However, this kind of win-win negotiation mechanism has not been discussed for RSs on a more comprehensive scale. In the existing research, the "performance of recommendation" is an attribute that benefits users. Therefore, when "more is better" is stressed, only the number of products sold is maximized, but not necessarily the profit. In other words, an RS is usually constructed from a user's standpoint. Only a few RSs could be regarded as built from a supplier's perspective. For instance, Liu and Shih developed
ARTICLE IN PRESS H -F Wang, C-T. Wu/Computers 8 Operations Research i(am)Il-l Table Roles and resolution in recommender systems. User Supplier System developer Objective Constraints qu&Cs) Problem types zation problem Maximization problem Multi-objective problem Maximization problem Note: (u): user:(s): supplier. o fulfill the demands of oneself, o(s)objective of the supplier: maximize profit or products sold. C(u constraint of the user: budgets in hand and as) constraint of the supplier: fulfill demands of users. a weighted RFM-based method for an RS 32,33. where RFM With the above concerns, in this study, we propose a strategy- means recency, frequency, and monetary: it considers the user's oriented operation module for the rs comprising(1)an offline lifetime value which is helpful in extending market share in the database, (2)CECF, and (3)the analytical model. An offline long run. However, for an RS constructed from the viewpoint of database that could be mathematically supported for the rs is system developer, issues should be considered that not only to developed. The database consists of three parts-user-group data fulfill the user's needs(preferences, budgets)but also to raise the item-group data, and the relations in between. The offline supplier's profit. Changchien et al. discussed sales promotion database is designed with the two characteristics: (1)the users based on businesses'marketing strategies, pricing strategies, and and the items are classified into groups according to their respective features/attributes(see Sections 3. 1.1 and 3. 1.2).As win situation[9]. However, the study prioritized the probability of suggested in the literatures, PCA- or SVD-based approaches may an inequitable supplier so that it may be difficult to keep a users lose prediction accuracy due to excessively restricted dataset loyalty. Therefore, it is also necessary to construct an RS that from which the neighborhood is formed. 
Thus we adopt the allows both parties to justify their priorities. classification technique for dimensionality reduction. We regard any individual in a group as an information provider, which is 2.5. Summary and discussion especially important to a start-up RS with rare data, (2)the group effects are much easier to be retrieved By bringing out additional From the brief review of the recent RSs, some aspects could be effects from the groups of users and items, we aim to dilute the there is no complete manipulated module that supports all inconsistent imputed data like average scores sub-modules of input module, output module, and recommenda- tion interface an RS. The researchers also realized that through over prediction, the priority of group effects shall be well quantitative measurement, the performance of the system can be error e, under the proposed offline database, we general applications in an RS From the viewpoint of managing an group's effects, CECF is likely helpful in solving the situation of EC site and its RS, it is more robust and convenient if an analytical sparse data and the so-called"ramp up"problem. In addition,we model comprising the three sub-modules can be imported also introduce an analytical model proposed by wang and wu facilitate the product selection process. With this regard, [51]. The analytical model could allow the system developer to developing a comprehensive module that can achieve the actively adjust the priority between the supplier's profit and the transparent requirements of the decision-support process and user's satisfaction level. 
Therefore, in the next section, we shall provide a good solution for recommendation purposes is propose the strategy-oriented operation module whose cores necessary and would be presented in this study consist of ceCf and the cal model: the module aims to Second, we found merits and deficiencies in each of the describe the recommendat ocess and provide better recom- existing recommendation approaches. Since RSs have different mendation performance types of input sources such as users ratings or market basket lata, the corresponding recommendation method will be a key ub-module that determines the success of an RS. as the 3. the proposed recommendation module applications in CE, personal profiles of target users are first used to match their neighbors: the purchase behaviors of the Based on the issues specified in Section 2. 4 that an RS shall neighbors are then exploited to predict target users' choices. provide three roles to be switched and the summary in Section However, for an EC Web site that is a start-up or is selling 2.5, we propose an RS( Fig. 1)with the recommendation module products with high prices, it would be confronted with the composed of three sub-modules-input, the recommendation problem that not enough basket data support the market basket method, and output. The input sub-module deals with the input recommendation performance would be very poor. Since the new system would be the demographic information, the binary basket ser with few personalized information is difficult to categorize data, and the target user's requests of the desired satisfaction the communitys opinions could be adopted to complement the level and budget limit. The output sub-module would provide the insufficient information. For a user whose personal profiles are recommended items from the result of the recommendation identity 2n. the community s opinions reinforce tne users online o perations. 
Please cite this article as: Wang H-F, Wu C-T. A strategy-oriented operation module for recommender systems in E-commerce. Computers and Operations Research (2010), doi:10.1016/j.cor.2010.03.011
a weighted RFM-based method for an RS [32,33], where RFM stands for recency, frequency, and monetary value; it considers the user's lifetime value, which helps extend market share in the long run. However, an RS constructed from the system developer's viewpoint should not only fulfill the user's needs (preferences, budgets) but also raise the supplier's profit. Changchien et al. discussed sales promotion based on businesses' marketing strategies, pricing strategies, and users' purchasing behavior, which could potentially be a win-win situation [9]. However, the study gave inequitable priority to the supplier's side, so it may be difficult to keep a user's loyalty. Therefore, it is also necessary to construct an RS that allows both parties to justify their priorities.

2.5. Summary and discussion

From the brief review of recent RSs, some aspects could be emphasized to improve an RS. First, it should be noted that so far there is no complete operation module that supports all the sub-modules of an RS: the input module, the output module, and the recommendation interface. Researchers have also realized that, through quantitative measurement, the performance of the system can be better controlled and evaluated. This motivates the main goal of this study: to develop an operation module for systematic analysis and general applications in an RS. From the viewpoint of managing an EC site and its RS, it is more robust and convenient if an analytical model comprising the three sub-modules can be imported to facilitate the product selection process. In this regard, developing a comprehensive module that meets the transparency requirements of the decision-support process and provides a good solution for recommendation purposes is necessary, and such a module is presented in this study.

Second, we found merits and deficiencies in each of the existing recommendation approaches.
Since RSs have different types of input sources, such as users' ratings or market basket data, the corresponding recommendation method is a key sub-module that determines the success of an RS. In CF applications, the personal profiles of target users are first used to match their neighbors; the purchase behaviors of the neighbors are then exploited to predict the target users' choices. However, an EC Web site that is a start-up or sells high-priced products is confronted with the problem that there are not enough basket data to support market basket analysis (the dataset is sparse or has missing values); the recommendation performance would therefore be very poor. Since a new user with little personalized information is difficult to categorize, the community's opinions could be adopted to complement the insufficient information. For a user whose personal profiles are already known, the community's opinions reinforce the user's identity [23].

With the above concerns, in this study we propose a strategy-oriented operation module for the RS comprising (1) an offline database, (2) CECF, and (3) the analytical model. An offline database that can be mathematically supported for the RS is developed. The database consists of three parts: user-group data, item-group data, and the relations in between. The offline database is designed with two characteristics. (1) The users and the items are classified into groups according to their respective features/attributes (see Sections 3.1.1 and 3.1.2). As suggested in the literature, PCA- or SVD-based approaches may lose prediction accuracy because the dataset from which the neighborhood is formed is excessively restricted. Thus, we adopt the classification technique for dimensionality reduction. We regard any individual in a group as an information provider, which is especially important to a start-up RS with rare data. (2) The group effects are much easier to retrieve.
By bringing out additional effects from the groups of users and items, we aim to dilute the imprecise prediction caused by rare data and to prevent inconsistent imputed data such as average scores. However, to keep the imputed group effects from dominating the prediction, the priority of group effects must be well arranged. Therefore, under the proposed offline database, we build on CF to propose a clique-effects approach, namely CECF. With its scheme of adjustable weights between the individual's and the group's effects, CECF is likely to help with sparse data and the so-called "ramp up" problem. In addition, we also introduce an analytical model proposed by Wang and Wu [51]. The analytical model allows the system developer to actively adjust the priority between the supplier's profit and the user's satisfaction level. Therefore, in the next section, we propose the strategy-oriented operation module whose cores consist of CECF and the analytical model; the module aims to describe the recommendation process and provide better recommendation performance for the RS.

3. The proposed recommendation module

Based on the issues specified in Section 2.4, namely that an RS shall provide three roles to be switched, and on the summary in Section 2.5, we propose an RS (Fig. 1) with the recommendation module composed of three sub-modules: input, the recommendation method, and output. The input sub-module deals with the input profiles of a target user; the types of profiles considered in the system are the demographic information, the binary basket data, and the target user's requests for the desired satisfaction level and budget limit. The output sub-module provides the recommended items from the result of the recommendation method. Both input and output sub-modules are categorized as online operations.

Table 1
Roles and resolution in recommender systems.

               | User                 | Supplier             | System developer (win-win strategy) | System developer (maximal profit strategy)
Objective      | O(u)                 | O(s)                 | O(u) & O(s)                         | O(s)
Constraints    | C(u)                 | C(s)                 | C(u) & C(s)                         | C(u) & C(s)
Problem types  | Maximization problem | Maximization problem | Multi-objective problem             | Maximization problem

Note: (u): user; (s): supplier. O(u), objective of the user: fulfill one's own demands; O(s), objective of the supplier: maximize profit or products sold; C(u), constraint of the user: budget in hand; C(s), constraint of the supplier: fulfill the demands of users.

The recommendation method, which is the core
of the recommending module, functions with an online analytical model under the offline database constructed from three parts: the user-group end, the item-group end, and the relations in between. Exploiting the proposed CECF approach, the offline database provides the required information, namely the target user's purchase probability measure on each item. The analytical model is then run on metadata composed of the target user's requests and what has been retrieved from the offline database. In particular, the analytical model uses a bi-objective function that allows a choice between the win-win strategy and the maximal profit strategy, which were proposed by Wang and Wu [51]. The win-win strategy not only matches the user's taste but also enhances the supplier's profit, whereas the maximal profit strategy recommends products based on maximization of profit.

This section is organized as follows. First, we specify the construction of the offline database, including the user-group and item-group data. Then the proposed clique-effects approach based on CF (CECF) is presented in Section 3.2. Finally, we clarify online and offline operations and present the analytical model in Section 3.3.

3.1. Offline operations

In this section, we specify the construction of the offline database, including the user-group data and the item-group data.

3.1.1. Item-groups with their properties

Let $D$ be the number of items in the market basket, with each item denoted as $p_d$, where $d = 1, \ldots, D$. Define $\Psi_{p_d} = [a_1, a_2, \ldots, a_k, \ldots, a_K]_{p_d}$ to be the attribute vector of $p_d$; then the set of items in the database is $P = \{p_d(a_k) \mid d = 1, 2, \ldots, D\}$. All items in the database are further classified into mutually exclusive item-groups $P_i = \{p_{d_i}(a_k) \mid d_i = 1_i, 2_i, \ldots, D_i\}$, $i = 1, 2, \ldots, I$, each with $|P_i| = D_i$, and thus $\bigcup_{i=1}^{I} P_i = P$ and $\sum_{i=1}^{I} D_i = D$. In particular, we classify the items with respect to the item attributes. A threshold is given for each attribute value; each item is assigned to the group that corresponds to the set of attributes whose values lie above their thresholds. The item-groups correspond to the power set of the $K$ attributes, so $2^K$ item-groups are generated.

Fig. 1. The proposed recommendation module. (Flowchart. Online operations: target user browses in the interface; identify target user's profiles; user's requests; metadata of the user; analytical model; the recommendation list; satisfied? (Y/N); modify target user's profiles. Off-line operations: off-line database; retrieve relational data; data retrieved; new basket database; update periodically.)

Table 2
Classification rules when $K = 3$.

Class | Attribute labels
1     | None
2     | $\{a_1\} \setminus \{a_2, a_3\}$
3     | $\{a_2\} \setminus \{a_1, a_3\}$
4     | $\{a_3\} \setminus \{a_1, a_2\}$
5     | $\{a_1, a_2\} \setminus \{a_3\}$
6     | $\{a_1, a_3\} \setminus \{a_2\}$
7     | $\{a_2, a_3\} \setminus \{a_1\}$
8     | $\{a_1, a_2, a_3\}$

For instance, in Table 2, the number of item-groups generated is 8 when $K$ is 3; an item is assigned to Class 5 only if its values of $a_1$ and $a_2$ are higher than the thresholds of $a_1$ and $a_2$ and its $a_3$ value is lower than the
thresholds of $a_3$, as denoted by $\{a_1, a_2\} \setminus \{a_3\}$. Thus, the classification rules provide exclusive groups, so that each item belongs to only one group. The properties of each item-group can also be easily and clearly identified by observing the attribute labels. The selling prices of items in the market basket are defined as $s = [s_{d_i} \mid d_i = 1_i, 2_i, \ldots, D_i,\ i = 1, 2, \ldots, I]$; the possible profits are defined as $c = [c_{d_i} \mid d_i = 1_i, 2_i, \ldots, D_i,\ i = 1, 2, \ldots, I]$, where $s_{d_i}$ and $c_{d_i}$ represent the corresponding price and profit of $p_{d_i}$. Therefore, the items database is stored by item-group, each with its items and specified properties.

3.1.2. User-groups with their profiles

Denote a user as $u_f$ with $f \in \mathbb{N}$. Let $U = \{u_f(\omega_g) \mid f \in \mathbb{N}\}$ be a set of users labeled by the demographic features $\omega_g \in \{\omega_1, \omega_2, \ldots, \omega_g, \ldots, \omega_G\}$. To facilitate analysis, that is, to provide solutions for the "new user" problem and to exploit clique effects, the users are classified into mutually exclusive user-groups and assumed to behave similarly, as the DF method suggests. The user-groups are formed by the following rules: assume each demographic feature $\omega_g$ can be divided/categorized into $\nu_g$ intervals/categories, denoted by $\omega_g^{\nu_g}$; we then define $U_j : U \to \omega_1^{\nu_1} \times \omega_2^{\nu_2} \times \cdots \times \omega_G^{\nu_G}$, so that $U_j = \{u_f(\omega_g) \mid \omega_g \in \omega_g^{\nu_g},\ g = 1, 2, \ldots, G\}$, $j = 1, 2, \ldots, J$. Then each user-group can be represented as $U_j = \{u_{f_j}(\omega_g) \mid f_j = 1_j, 2_j, \ldots, F_j\}$, $j = 1, 2, \ldots, J$, with $|U_j| = F_j$, and thus $\bigcup_{j=1}^{J} U_j = U$. For instance, we define the demographic features to be gender ($\omega_1$) and age ($\omega_2$); $\omega_1$ is categorized into $\nu_1 = 2$ categories, male and female; $\omega_2$ is divided into $\nu_2 = 4$ intervals, $(0, 20]$, $(20, 30]$, $(30, 40]$, and $(40, \infty)$. Then we define the user-groups as $U_j : U \to \omega_1^{2} \times \omega_2^{4}$, which yields eight user-groups $U_j$, $j = 1, 2, \ldots, 8$.

3.2. Derivation of relations among users and items: CECF

In the proposed offline database, the framework of bipartite grouping connects users and items (Fig. 2).
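The grouping rules of Sections 3.1.1 and 3.1.2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `item_class` and `user_group`, the strict `>` comparison against thresholds, the half-open age intervals, and the gender-major numbering of user-groups are all assumptions made for the example; only the ordering of item classes follows Table 2.

```python
from bisect import bisect_left
from itertools import combinations


def item_class(values, thresholds):
    """Return the 1-based item-group number, ordered as in Table 2.

    Assumption: an attribute counts as "above threshold" only when it is
    strictly greater than its threshold.
    """
    k = len(thresholds)
    above = tuple(i for i in range(k) if values[i] > thresholds[i])
    # All 2^K subsets of attribute indices, by size and then lexicographically:
    # (), (a1), (a2), (a3), (a1,a2), (a1,a3), (a2,a3), (a1,a2,a3) for K = 3.
    ordering = [c for size in range(k + 1) for c in combinations(range(k), size)]
    return ordering.index(above) + 1


def user_group(gender, age, genders=("male", "female"), age_bounds=(20, 30, 40)):
    """Return a 1-based user-group number for a gender x age-interval cell.

    Assumptions: the age intervals are (0,20], (20,30], (30,40], (40,inf),
    and groups are numbered gender-major (hypothetical numbering).
    """
    age_index = bisect_left(age_bounds, age)
    return genders.index(gender) * (len(age_bounds) + 1) + age_index + 1


if __name__ == "__main__":
    # a1 and a2 above their thresholds, a3 below: Class 5 of Table 2.
    print(item_class((5.0, 4.0, 1.0), (3.0, 3.0, 3.0)))
    print(user_group("female", 45))
```

With two genders and four age intervals the sketch produces the eight user-groups of the example in Section 3.1.2; any other consistent numbering would serve equally well.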
The relations embedded in the framework are regarded as clique effects on the purchase probability measured for a target user. The clique effects result mainly from the grouping of users. Users in the same clique as the target user (the so-called neighbors in CF) can provide collaborative information to measure purchase probability. However, users in different cliques may also provide collaborative information to the target user to a certain degree. In this respect, we propose the following concept to measure the purchase probability of the target user with respect to a predicted item:

$$P_r(\text{user}, \text{item}) = \theta \cdot P_r^{\text{in-clique}}(\text{user}, \text{item}) + (1 - \theta) \cdot P_r^{\text{out-of-clique}}(\text{user}, \text{item}), \tag{1}$$

where the probability $P_r(\text{user}, \text{item})$ is a convex combination of two distinct probabilities: one is the purchase probability predicted by the collaboration of users in the same clique as the target user (the neighbors), and the other is predicted by the collaboration of users in different cliques. The composition of the proposed probability measure is illustrated in Fig. 3.

Let us refer to Fig. 3. First, note that arrows 3 and 4 jointly represent the "in-clique" purchase probability measure used by conventional CF. The common concept of the CF method, adapted to binary market basket data [6,35], is presented as

$$P_r^{\text{in-clique}}(u_{f_j}, p_{d_i}) = \kappa_1 \sum_{u_{f_t} \in U_j} \mathrm{sim}(u_{f_j}, u_{f_t}) \cdot C_{u_{f_t}, p_{d_i}}, \tag{2}$$

where $P_r^{\text{in-clique}}(u_{f_j}, p_{d_i})$ is the probability that target user $u_{f_j}$ purchases item $p_{d_i}$, obtained from a collaboration of the neighbors' preferences; $\kappa_1$ is a normalizing factor that ensures the absolute values of the probabilities sum to unity; $\mathrm{sim}(u_{f_j}, u_{f_t})$, which refers to arrow 4, is the similarity between the target user $u_{f_j}$ and the neighbor $u_{f_t}$; and $C_{u_{f_t}, p_{d_i}}$, which refers to arrow 3, is the binary choice of whether user $u_{f_t}$ purchases $p_{d_i}$ or not. It is noteworthy that, for the similarity measure between the target user $u_{f_j}$ and the neighbors $u_{f_t}$ specified in Eq. (2), the neighbors are chosen from the user-group to which the target user belongs; this complies with the structure of our proposed RS, which assumes that users in the same demographic group tend to behave similarly.

Fig. 2. Framework of relations among user-groups and item-groups. (Bipartite diagram linking user-groups $U_1, U_2, \ldots, U_J$ to item-groups $P_1, P_2, \ldots, P_I$ through relations.)

Fig. 3. Various probability measurements of the target user on the predicted item. (Diagram. Arrow 1: purchase priority among user-groups and item-groups, $w_i^j$. Arrow 2: similarity measures among user-groups, $\mathrm{sim}(U_j, U_\tau)$. Arrow 3: the binary choice of whether a user purchases the item or not. Arrow 4: similarity measure between the target user and his/her neighbor. Arrows 1 and 2 form the out-of-clique measure; arrows 3 and 4 form the in-clique measure.)

Second, for the "out-of-clique" probability measure based on the concept of CF, two factors should be considered: (1) the similarity between the target user-group and other user-groups as well as (2) other user-groups' purchase priorities on the predicted
item-group. For the former, the similarity measures refer to arrow 2 in Fig. 3. For the latter, which refers to arrow 1 in Fig. 3, the relative purchase frequency in the binary basket analysis has been adopted as the prediction of purchase priority [10]:

$$w_i^j = \frac{C(U_j, P_i)}{S(U_j)}, \tag{3}$$

where $C(U_j, P_i)$ is the frequency with which users in $U_j$ purchase items in $P_i$, and $S(U_j)$ is the total number of market baskets for $U_j$. Therefore, the "out-of-clique" purchase probability measure can be presented as

$$P_r^{\text{out-of-clique}}(u_{f_j}, p_{d_i}) = \kappa_2 \sum_{\tau \neq j} \mathrm{sim}(U_j, U_\tau) \cdot w_i^\tau, \tag{4}$$

where $\mathrm{sim}(U_j, U_\tau)$, which refers to arrow 2, is the similarity measure between the target user-group $U_j$ and another user-group $U_\tau$, and $\kappa_2$ is a normalizing factor that ensures the absolute values of the probabilities sum to unity. Therefore, the probability measure of a target user $u_{f_j}$ purchasing item $p_{d_i}$ is represented as

$$P_r(u_{f_j}, p_{d_i}) = \wp_{p_{d_i}}^{u_{f_j}} = \theta \cdot \overbrace{\kappa_1 \sum_{u_{f_t} \in U_j} \mathrm{sim}(u_{f_j}, u_{f_t}) \cdot C_{u_{f_t}, p_{d_i}}}^{\text{in-clique}} + (1 - \theta) \cdot \overbrace{\kappa_2 \sum_{\tau \neq j} \mathrm{sim}(U_j, U_\tau) \cdot w_i^\tau}^{\text{out-of-clique}}, \tag{5}$$

where the probability measure $P_r(u_{f_j}, p_{d_i})$ is written as $\wp_{p_{d_i}}^{u_{f_j}}$ for simplicity, and $\theta$ is an adjustable weight on the in-clique probability measure. The form of the probability measure in Eq. (5) leads us to consider how to select the similarity functions. Note that the CF performance depends on the choice of similarity measures.
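To make Eqs. (2), (4), and (5) concrete, the following Python sketch computes the convex combination for one target user and one predicted item. It is an illustration under stated assumptions, not the authors' code: all function names are hypothetical, the normalizing factors $\kappa_1$ and $\kappa_2$ are assumed to be the reciprocals of the summed similarities, the target user is assumed to be excluded from his or her own neighborhood, the user similarity is taken to be the Jaccard coefficient that the paper gives as Eq. (6), and the group-level function follows the non-common form that the paper gives as Eq. (8).

```python
def jaccard(a, b):
    """Common-item-set similarity between two purchase sets (Jaccard form)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def non_common_similarity(a, b, universe):
    """Group-level similarity counting items that neither group purchased."""
    not_both = universe - (a & b)
    return len(universe - (a | b)) / len(not_both) if not_both else 0.0


def in_clique_probability(target, item, baskets, clique):
    """Eq. (2): similarity-weighted vote of the neighbors in the same clique.

    Assumption: kappa_1 = 1 / (sum of similarities to the neighbors).
    """
    sims = {u: jaccard(baskets[target], baskets[u]) for u in clique if u != target}
    total = sum(sims.values())
    if total == 0:
        return 0.0
    return sum(s for u, s in sims.items() if item in baskets[u]) / total


def out_of_clique_probability(group_sims, priorities):
    """Eq. (4): group_sims[tau] = sim(U_j, U_tau), priorities[tau] = w_i^tau.

    Assumption: kappa_2 = 1 / (sum of similarities to the other groups).
    """
    total = sum(group_sims.values())
    if total == 0:
        return 0.0
    return sum(group_sims[t] * priorities[t] for t in group_sims) / total


def cecf_probability(theta, p_in, p_out):
    """Eq. (5): convex combination with adjustable weight theta in [0, 1]."""
    return theta * p_in + (1 - theta) * p_out


if __name__ == "__main__":
    baskets = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"b"}}
    p_in = in_clique_probability("u1", "c", baskets, ["u1", "u2", "u3"])
    p_out = out_of_clique_probability({"G2": 0.5, "G3": 0.25}, {"G2": 0.4, "G3": 0.8})
    print(round(cecf_probability(0.7, p_in, p_out), 4))
```

Setting `theta` to 1 reduces the sketch to the in-clique term alone, which corresponds to the conventional CF scheme; smaller values give the out-of-clique group effects more influence.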
Conventionally, the similarity function for market basket data is based on the Jaccard coefficient [10,22,36]:

$$\mathrm{sim}(u_{f_j}, u_{f_t}) = \frac{|S(u_{f_j}) \cap S(u_{f_t})|}{|S(u_{f_j}) \cup S(u_{f_t})|} = \frac{|S(u_{f_j}) \cap S(u_{f_t})|}{|S(u_{f_j})| + |S(u_{f_t})| - |S(u_{f_j}) \cap S(u_{f_t})|}, \tag{6}$$

where $S(u_{f_j})$ is the item set purchased by user $u_{f_j}$; $S(u_{f_j}) \cap S(u_{f_t})$ is the common item set purchased by both users $u_{f_j}$ and $u_{f_t}$; and $S(u_{f_j}) \cup S(u_{f_t})$ is the item set purchased by user $u_{f_j}$ or $u_{f_t}$. However, as indicated in [36], the Jaccard coefficient misses the information carried by the items that neither of the two users chose. The non-common item set affects the similarity measure between two objects; as a result, the similarity function should take the influence of the non-common item set into consideration. Therefore, on the grounds of the effects caused by the non-common item set between users' purchase histories, we propose a similarity measure between two users that considers the non-common item set:

$$\mathrm{sim}(u_{f_j}, u_{f_t}) = \frac{|\bar{S}(u_{f_j}) \cap \bar{S}(u_{f_t})|}{|\bar{S}(u_{f_j}) \cup \bar{S}(u_{f_t})|}, \tag{7}$$

where $\bar{S}(\cdot)$ represents the non-purchased item set, that is, the complement of $S(\cdot)$. Consequently, Eq. (7) preserves the information of items that are not commonly purchased by the two compared users. However, the similarity measure of the non-common item set is not very appropriate in a large-scale database, because the value of this indicator would probably be close to one when comparing two users (see Appendix A). As a consequence, we suggest that users be compared on the grounds of group purchase behavior, which is given as

$$\mathrm{sim}(U_j, U_\tau) = \frac{\left|\bigcup_{j'=1}^{J} S(U_{j'}) - \big(S(U_j) \cup S(U_\tau)\big)\right|}{\left|\bigcup_{j'=1}^{J} S(U_{j'}) - \big(S(U_j) \cap S(U_\tau)\big)\right|}, \tag{8}$$

where $\mathrm{sim}(U_j, U_\tau)$ is the similarity measure between the target user-group $U_j$ and the other user-group $U_\tau$. Therefore, the similarity measures indicated in Appendix A can be computed as shown in Appendix B, in which the similarity measure is more appropriate.

3.2.1. Summary of the proposed CECF

In this section, we have proposed the CECF, which contains the users' purchase probability measure of Eq. (5), a convex combination of two distinct probability measures: the in-clique effects of Eq. (2) and the out-of-clique effects of Eq. (4). By classifying the other users into in-clique and out-of-clique users relative to the target user, the proposed probability measure provides insight different from that of the conventional CF method.

For the probability measure of in-clique users, we adopt the traditional CF method, whereas for the measure of out-of-clique users, we propose an alternative similarity function that incorporates the items not purchased simultaneously by each pair of compared users to find the similarity among user-groups. The proposed probability measure is then predicted from both the purchase and non-purchase behaviors of the users, which can be expected to provide more information for characterizing the users. Therefore, to facilitate flexible applications, we have two schemes in the recommendation method under the proposed CECF, namely CECF-C and CECF-NC. C and NC denote the choice of similarity function applied in computing the similarities among user-groups: C is based on the Common item set, whereas NC is based on the Non-Common item set. A hybrid of C and NC for measuring similarities among user-groups is worth discussing; we do not focus on a hybrid approach at present, since the adjustment of weights would make the module more complex to analyze. Note that the similarities between in-clique users are still measured with the common item set, since their basket sizes are much smaller. In Table 3, we list all recommendation schemes that are compared in Section 4.

Table 3
Recommendation schemes.

Scheme  | Function of user similarity (in-clique effects) | Function of user-group similarity (out-of-clique effects)
CF      | Common item set                                 | -
CECF-C  | Common item set                                 | Common item set
CECF-NC | Common item set                                 | Non-common item set

3.3. The analytical model and recommendation procedures

In this section, we discuss the analytical model proposed by Wang and Wu [51] as well as the operation procedures of the proposed module.

3.3.1. The analytical model with two marketing strategies: maximal profit strategy and win-win strategy

After the offline operations, three databases were constructed, namely the item-group database defined by $P_i = \{p_{d_i}(a_k) \mid d_i = 1_i, 2_i, \ldots, D_i\}$, $i = 1, 2, \ldots, I$; the user-group database defined by $U_j = \{u_{f_j}(\omega_g) \mid f_j = 1_j, 2_j, \ldots, F_j\}$, $j = 1, 2, \ldots, J$; and their relations constructed by CECF of Eqs. (4)–(6) and (8). When a user is online, we can identify the user's preferences through the corresponding information retrieved from the databases. The retrieved data as well as the user's requests (desired satisfaction level and budget limit) are
regarded as the user's metadata input into the analytical model, which has been proposed by Wang and Wu, as shown in Eqs. (9.1)–(9.5):

\[
\begin{aligned}
\text{Maximize} \quad & \sum_{j=1}^{J} \sum_{f_j=1}^{F_j} c\, x_{f_j}, && (9.1)\\
\text{Subject to} \quad & a_{f_j} x_{f_j} \ge b_{f_j}, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J, && (9.2)\\
& s\, x_{f_j} \le B_{f_j}, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J, && (9.3)\\
& \sum_{i=1}^{I} \sum_{d_i=1}^{D_i} x^{f_j}_{d_i} \ge 1, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J, && (9.4)\\
& x^{f_j}_{d_i} \in \{0, 1\}, && (9.5)
\end{aligned}
\]

where $x_{f_j} = [x^{f_j}_{d_i}]_{\sum_i D_i \times 1}$, $i = 1, 2, \ldots, I$, $d_i \in \{1, 2, \ldots, D_i\}$, $j = 1, 2, \ldots, J$, $f_j = 1_j, 2_j, \ldots, F_j$; $x^{f_j}_{d_i} = 1$ if item $p_{d_i}$ is recommended to $u_{f_j}$, and $x^{f_j}_{d_i} = 0$ otherwise. $c$ and $s$ are the corresponding profit and price of $p_{d_i}$. $b_{f_j}$ is the satisfactory level requested by $u_{f_j}$, and $B_{f_j}$ is the budget limit given by $u_{f_j}$. $a_{f_j} = [\wp^{u_{f_j}}_{p_{d_i}}]_{1 \times \sum_i D_i}$, where $\wp^{u_{f_j}}_{p_{d_i}}$ is the purchase probability measure of user $u_{f_j}$ on $p_{d_i}$. This model maximizes the profit of an EC company (9.1) while the items recommended to users satisfy their satisfactory level, as shown in constraint (9.2), and the total price of the items does not exceed the user's budget, as shown in constraint (9.3). Constraint (9.4) provides a tool for strategic use by allowing different numbers of items to be recommended, with at least one item recommended to a user each time.

Under the basic model, two marketing strategies can be provided: the maximal profit strategy and the win–win strategy. When the recommending process takes only the supplier's viewpoint, the goal is to maximize the profit of the goods over a set of items that satisfy the users' preferences and budgets. In this case, the reduced decision-variable vector and the corresponding coefficients are marked with a prime ($'$) to indicate that all items left for consideration are at or above the requested satisfactory level $b_{f_j}$. Model (10) reflects this strategy:

\[
\begin{aligned}
\text{Maximize} \quad & \sum_{j=1}^{J} \sum_{f_j=1}^{F_j} c'\, x'_{f_j}\\
\text{Subject to} \quad & s'\, x'_{f_j} \le B_{f_j}, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J,\\
& \sum_{i=1}^{I} \sum_{d_i'=1}^{D_i} x'^{f_j}_{d_i} \ge 1, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J,\\
& x'^{f_j}_{d_i} \in \{0, 1\}.
\end{aligned}
\tag{10}
\]

Although the maximal profit strategy brings the highest income to the suppliers, from the management viewpoint it only passively satisfies users' desires at the minimal level and is thus not a strategy for providing good service. Alternatively, we propose the win–win strategy, which actively takes both the suppliers' profit and the users' preferences into account. Model (11) realizes this strategy: the first objective function maximizes the supplier's profit as before, while the second objective function maximizes the user's satisfaction. Model (11) is a bi-objective programming model:

\[
\begin{aligned}
\text{Maximize} \quad & \sum_{j=1}^{J} \sum_{f_j=1}^{F_j} c'\, x'_{f_j}\\
\text{Maximize} \quad & \sum_{j=1}^{J} \sum_{f_j=1}^{F_j} a'_{f_j}\, x'_{f_j}\\
\text{Subject to} \quad & s'\, x'_{f_j} \le B_{f_j}, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J,\\
& \sum_{i=1}^{I} \sum_{d_i'=1}^{D_i} x'^{f_j}_{d_i} \ge 1, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J,\\
& x'^{f_j}_{d_i} \in \{0, 1\}.
\end{aligned}
\tag{11}
\]

Since bi-criterion problems of this kind are well studied in the literature [1,4,8,13,50], we do not focus on how to solve the proposed models. By taking a convex combination of the two objectives with a weighting parameter $\beta \in [0, 1]$, Model (11) can be transformed into the single-objective programming model, Model (12). Model (12) with $\beta = 1$ yields Model (10), implementing the maximal profit strategy; with $\beta = 0$ it emphasizes the users' benefit, as the best service strategy; and, depending on the marketing preference the suppliers adopt, $\beta$ can take any value between 0 and 1, as the win–win strategy. Note that in Model (12), $c''$ is $c'$ further normalized into $[0, 1]$ to match the scale of $a'_{f_j}$.

\[
\begin{aligned}
\text{Maximize} \quad & \beta \left( \sum_{j=1}^{J} \sum_{f_j=1}^{F_j} c''\, x'_{f_j} \right) + (1 - \beta) \sum_{j=1}^{J} \sum_{f_j=1}^{F_j} a'_{f_j}\, x'_{f_j}\\
\text{Subject to} \quad & s'\, x'_{f_j} \le B_{f_j}, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J,\\
& \sum_{i=1}^{I} \sum_{d_i'=1}^{D_i} x'^{f_j}_{d_i} \ge 1, \quad f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J,\\
& x'^{f_j}_{d_i} \in \{0, 1\}.
\end{aligned}
\tag{12}
\]

3.3.2. Measures of recommendation performance

To evaluate the performance of information retrieval, the three measures of recall, precision, and F1 are usually employed [12,47]. They are defined as follows and are also used to evaluate our recommendation system:

\[
\text{Recall} = |S(\text{user}) \cap \text{Rec}(\text{user})| \,/\, |\text{Rec}(\text{user})|, \tag{13}
\]

\[
\text{Precision} = |S(\text{user}) \cap \text{Rec}(\text{user})| \,/\, |S(\text{user})|, \tag{14}
\]

\[
\text{F1} = 2 \times \text{Recall} \times \text{Precision} \,/\, (\text{Recall} + \text{Precision}), \tag{15}
\]

where $S(\text{user})$ is the actual basket of the compared user and $\text{Rec}(\text{user})$ is the recommended item set. Recall is the ratio of items successfully recommended, whereas precision measures the user's degree of satisfaction. F1 balances the two when recall and precision conflict with each other.

3.3.3. Summary of offline and online operation procedures

Having introduced the individual sub-modules of the proposed RS, we now summarize its operation procedures, which are categorized into offline and online operations.

3.3.3.1. Offline operation procedures.

Step 1. Construct user-groups from users' demographic features and item-groups from item attributes to obtain $U_j = \{u_{f_j}(o_g) \mid f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J\}$ and $P_i = \{p_{d_i}(a_k) \mid d_i = 1_i, 2_i, \ldots, D_i,\ i = 1, 2, \ldots, I\}$.
Step 2. Compute the relative purchase priorities ($w^j_i$) between user-groups and item-groups by Eq. (3).
Step 3. Compute the similarity measures between user-groups, using the similarity function based on either the common item set (Eq. (6)) or the non-common item set (Eq. (8)).
Step 4. Derive the out-of-clique probability measures by Eq. (4).
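The choice in Step 3 between the common and the non-common item set can be illustrated at the user level with Eqs. (6) and (7). The following is a minimal sketch only, assuming purchase histories are held as Python sets and the full catalogue of item ids is known; the function names and the toy baskets are hypothetical and not part of the original module.

```python
def sim_common(basket_a, basket_b):
    """Eq. (6): Jaccard coefficient on the purchased item sets."""
    union = basket_a | basket_b
    if not union:
        return 0.0  # convention assumed here for two empty baskets
    return len(basket_a & basket_b) / len(union)


def sim_non_common(basket_a, basket_b, catalogue):
    """Eq. (7): the same ratio taken on the non-purchased (complement) sets."""
    not_a = catalogue - basket_a
    not_b = catalogue - basket_b
    union = not_a | not_b
    if not union:
        return 0.0  # both users bought every item in the catalogue
    return len(not_a & not_b) / len(union)


# Hypothetical item ids; 192 matches the number of items in the laptop data set.
catalogue = set(range(1, 193))
user_a = {1, 2, 3}
user_b = {3, 4}

print(sim_common(user_a, user_b))                 # 1 shared item out of 4
print(sim_non_common(user_a, user_b, catalogue))  # 188 of 191, close to one
```

With small baskets and a catalogue of 192 items, the complement-based value is already near one, which is the behavior that motivates moving the comparison to the user-group level in Eq. (8).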
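For a single user, the weighted Model (12) is a small 0-1 program, and the effect of the weighting parameter on the recommendation can be seen by brute-force enumeration. This is a rough sketch under stated assumptions: the candidate items are taken to have already passed the satisfaction-level filter, profits are positive and scaled into [0, 1] by dividing by the largest profit (the source does not specify the normalization), and the function name `recommend` and all numbers are hypothetical. A realistic catalogue would call for an integer-programming solver rather than enumeration.

```python
from itertools import combinations


def recommend(profit, price, prob, budget, beta):
    """Pick a non-empty item subset within budget that maximizes
    beta * normalized profit + (1 - beta) * purchase probability."""
    n = len(profit)
    top = max(profit)
    norm_profit = [c / top for c in profit]  # assumed scaling into [0, 1]
    best_items, best_value = None, float("-inf")
    for size in range(1, n + 1):  # at least one item must be recommended
        for items in combinations(range(n), size):
            if sum(price[i] for i in items) > budget:
                continue  # violates the budget constraint
            value = sum(
                beta * norm_profit[i] + (1 - beta) * prob[i] for i in items
            )
            if value > best_value:
                best_items, best_value = items, value
    return best_items, best_value


# Hypothetical data for three candidate items.
profit = [300, 120, 80]
price = [1500, 900, 700]
prob = [0.10, 0.60, 0.50]

print(recommend(profit, price, prob, budget=1600, beta=1.0)[0])  # (0,)
print(recommend(profit, price, prob, budget=1600, beta=0.0)[0])  # (1, 2)
```

In this toy case the maximal profit setting selects the single high-margin item, while the best service setting selects the two items the user is most likely to buy; intermediate values of the weight trade the two off.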
3.3.3.2. Online operation procedures.

Step 1. Set up the parameters for in-clique effects ($\theta$) and profit consideration ($\beta$) according to the strategy adopted:
1.1. Maximal profit strategy: set $\beta = 1$;
1.2. Win–win strategy: set $\beta \in (0, 1)$;
1.3. Best service strategy: set $\beta = 0$.
Step 2. Inquire online about the target user's profile: demographic features ($u_{f_j}(o_g)$), binary basket data ($C_{u_{f_t}, p_{d_i}}$), the desired satisfaction level ($b_{f_j}$), and the budget limit ($B_{f_j}$).
Step 3. Classify the target user into the proper user-group by $U_j = \{u_{f_j}(o_g) \mid f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, J\}$.
3.1. A historical user with basket data ($0 < \theta \le 1$): compute the purchase probabilities on each item with CECF-C (Eq. (6)) or CECF-NC (Eq. (8)).
3.2. A new user without basket data ($\theta = 0$): retrieve the out-of-clique probability measures as the purchase probability on each item.
Step 4. Derive the metadata from the purchase probabilities (Eq. (5)) and the user's request as input to Step 5.
Step 5. Run the analytical model and yield the recommendation list.

4. Case study: laptops RS of a 3C retailer

Taiwan's 3C industries are among the most technologically advanced in the world. Among various electronic products, the experiments on our proposed RS are conducted specifically with laptops, for three reasons: (1) laptop transactions are usually fewer than those of other electronic products, so introducing an RS is a meaningful way to attract users; (2) sparse transactions are difficult to exploit in an RS, and our proposed RS aims to address this by incorporating clique effects; and (3) laptops are all highly priced, so the profit consideration is more applicable. Using the data provided by a 3C retailer, a prototype of the system was established and evaluated in this section. We first describe the given database: the laptop data set contains 915 market baskets covering 227 customers and 192 items. The number of item types in a basket ranges from two to eight for each user.
The users' information is given by user types (defined from users' demographic features by the 3C retailer), which yield five user-groups ($U_1$, $U_2$, $U_3$, $U_4$, $U_5$). The item attributes ($k$) are: (1) central processing unit (CPU), (2) random-access memory (RAM), (3) brand, (4) storage capacity, and (5) weight. Under our classification rules with $K = 5$, the item-groups consist of 32 exclusive groups. Due to incomplete data, there are only 16 non-empty item-groups.

4.1. A case study with experiments

In the experiments, the 227 customers are divided randomly into 20%/80% testing and training data in each epoch. We conduct three experiments with different goals. In the first experiment, we compare the recommendation performance of conventional CF with our proposed CECF approach in its two variants, CECF-C and CECF-NC; all three schemes use a fixed neighborhood size of 20. In the second experiment, we compare the recommendation performance as well as the profit gained under the supplier's marketing strategies: (1) $\beta = 1$ yields the maximal profit strategy, (2) $\beta \in (0, 1)$ yields the win–win strategy, and (3) $\beta = 0$ emphasizes the customer's benefit as the best service strategy. In the third experiment, we compare the sensitivity of the F1 values to the neighborhood size (3, 5, 7, 10, and 20) under three schemes: CF, CECF-NC with profit consideration ($\beta = 0.2$), and CECF-NC without profit consideration.

The three measures of recall, precision, and F1 are used for evaluation. Different parameter values were chosen to demonstrate their impact as a sensitivity analysis. We pick one of the epochs for illustration in the following section. All experimental procedures follow the procedures proposed in Section 3.3.3 (Tables 4–6).

Offline operation procedures (training data)

Table 4
Out-of-clique probability measures.

| NC | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P11 | P12 | P13 | P14 | P15 | P16 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U1 | 0.003 | 0.082 | 0.005 | 0.222 | 0.005 | 0.060 | 0.365 | 0.001 | 0.054 | 0.029 | 0.029 | 0.054 | 0.026 | 0.007 | 0.034 | 1 |
| U2 | 0.003 | 0.085 | 0.004 | 0.244 | 0.019 | 0.053 | 0.335 | 0.000 | 0.060 | 0.024 | 0.020 | 0.072 | 0.023 | 0.006 | 0.030 | 1 |
| U3 | 0.003 | 0.087 | 0.005 | 0.278 | 0.021 | 0.062 | 0.326 | 0.001 | 0.057 | 0.022 | 0.022 | 0.072 | 0.012 | 0.000 | 0.021 | 1 |
| U4 | 0.004 | 0.095 | 0.001 | 0.270 | 0.018 | 0.062 | 0.323 | 0.001 | 0.050 | 0.024 | 0.024 | 0.063 | 0.016 | 0.006 | 0.022 | 1 |
| U5 | 0.000 | 0.088 | 0.006 | 0.264 | 0.016 | 0.058 | 0.331 | 0.001 | 0.055 | 0.017 | 0.020 | 0.070 | 0.022 | 0.006 | 0.029 | 1 |

| C | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | P9 | P11 | P12 | P13 | P14 | P15 | P16 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U1 | 0.003 | 0.081 | 0.005 | 0.219 | 0.005 | 0.060 | 0.366 | 0.001 | 0.055 | 0.029 | 0.029 | 0.055 | 0.025 | 0.006 | 0.035 | 1 |
| U2 | 0.004 | 0.077 | 0.005 | 0.216 | 0.013 | 0.054 | 0.359 | 0.000 | 0.063 | 0.031 | 0.025 | 0.065 | 0.025 | 0.005 | 0.034 | 1 |
| U3 | 0.004 | 0.081 | 0.006 | 0.254 | 0.015 | 0.062 | 0.347 | 0.001 | 0.058 | 0.027 | 0.026 | 0.066 | 0.015 | 0.000 | 0.025 | 1 |
| U4 | 0.007 | 0.088 | 0.001 | 0.239 | 0.012 | 0.063 | 0.349 | 0.001 | 0.053 | 0.034 | 0.031 | 0.054 | 0.016 | 0.004 | 0.024 | 1 |
| U5 | 0.000 | 0.077 | 0.009 | 0.233 | 0.008 | 0.057 | 0.360 | 0.001 | 0.059 | 0.022 | 0.023 | 0.067 | 0.026 | 0.004 | 0.036 | 1 |

Table 5
Purchase probabilities of new users by CECF-NC ($\theta = 0$).

| | $p^{2}_{1}$ | $p^{2}_{2}$ | $p^{2}_{13}$ | $p^{4}_{2}$ | $p^{4}_{3}$ | $p^{16}_{5}$ | $p^{16}_{6}$ |
|---|---|---|---|---|---|---|---|
| $u^{1}_{1}$ | 0.0026 | 0.0026 | 0.0026 | 0.0070 | 0.0070 | 0.0008 | 0.0008 |
| $u^{2}_{1}$ | 0.0027 | 0.0027 | 0.0027 | 0.0078 | 0.0078 | 0.0007 | 0.0007 |
| $u^{3}_{1}$ | 0.0028 | 0.0028 | 0.0028 | 0.0088 | 0.0088 | 0.0004 | 0.0004 |
| $u^{4}_{1}$ | 0.0031 | 0.0031 | 0.0031 | 0.0087 | 0.0087 | 0.0007 | 0.0007 |
| $u^{5}_{1}$ | 0.0028 | 0.0028 | 0.0028 | 0.0084 | 0.0084 | 0.0006 | 0.0006 |

Table 6
Purchase probabilities of new users by CECF-C ($\theta = 0$).

| | $p^{2}_{1}$ | $p^{2}_{2}$ | $p^{2}_{13}$ | $p^{4}_{2}$ | $p^{4}_{3}$ | $p^{16}_{5}$ | $p^{16}_{6}$ |
|---|---|---|---|---|---|---|---|
| $u^{1}_{1}$ | 0.0025 | 0.0025 | 0.0025 | 0.0069 | 0.0069 | 0.0008 | 0.0008 |
| $u^{2}_{1}$ | 0.0024 | 0.0024 | 0.0024 | 0.0068 | 0.0068 | 0.0007 | 0.0007 |
| $u^{3}_{1}$ | 0.0025 | 0.0025 | 0.0025 | 0.0079 | 0.0079 | 0.0004 | 0.0004 |
| $u^{4}_{1}$ | 0.0028 | 0.0028 | 0.0028 | 0.0076 | 0.0076 | 0.0008 | 0.0008 |
| $u^{5}_{1}$ | 0.0024 | 0.0024 | 0.0024 | 0.0073 | 0.0073 | 0.0005 | 0.0005 |
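The random 20%/80% division of the 227 customers described in Section 4.1 could be set up along the following lines. This is only a sketch: the customer ids, the fixed seed, and the function name `split_customers` are assumptions made for repeatability and illustration, and the source does not state how the random division was implemented.

```python
import random


def split_customers(customer_ids, train_fraction=0.8, seed=0):
    """Shuffle the customers and cut them into training and testing lists."""
    ids = list(customer_ids)
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical ids 0..226 standing in for the 227 customers.
train, test = split_customers(range(227))
print(len(train), len(test))  # 182 45
```

The 182 training customers are the ones used to construct the user-groups in the offline procedures; changing the seed would give a different epoch.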
Step 1. (1) Construct user-groups $U_j$, $j = 1, 2, \ldots, 5$, with $|\bigcup_j U_j| = 182$ ($227 \times 0.8 \approx 182$). (2) Construct item-groups $P_i$, $i = 1, 2, \ldots, 16$, with $|\bigcup_i P_i| = 192$.
Step 2. Compute the relative purchase priority $w^j_i$.
Step 3. Compute the similarity measures between user-groups by the common item set function, i.e., Eq. (6), and the non-common item set function, i.e., Eq. (8).
Step 4. Derive the out-of-clique probability measures by Eq. (4), as shown in Table 4. Note that the probability measures in each row are normalized so that they sum to 1.

Online operation procedures (testing data)

Step 1. Set up the parameters for in-clique effects ($\theta$) and profit consideration ($\beta$), respectively. For implementation, the system can set $\theta$ and $\beta$ to arbitrary values. In the experiments, we set $\theta$ to 0, 0.1, 0.2, …, 1 and $\beta$ to 0, 0.2, 0.4, …, 1 for testing.
Step 2. The users are tested as new users or historical users by setting $\theta = 0$ or $0 < \theta \le 1$, respectively. Satisfaction levels ($b_{f_j}$) are set to 0.7, 0.8, and 0.9 for the experiments. Budget limits ($B_{f_j}$) are set to arbitrary values lower than the sum of all items' prices.
Step 3. Classify the target user into one user-group by $U_j = \{u_{f_j}(o_g) \mid f_j = 1_j, 2_j, \ldots, F_j,\ j = 1, 2, \ldots, 5\}$.
Step 3.1. Recommendation to historical users is simulated by setting $0 < \theta \le 1$.
Step 3.2. Recommendation to new users ($u^{1}_{1}, u^{1}_{2}, \ldots$) by CECF-NC or CECF-C is simulated by setting $\theta = 0$; the results are shown in Tables 5 and 6, respectively. Note that when a target user is regarded as a new user, the probability measures for him/her can be derived only from the out-of-clique measures. For instance, in Table 4, the probability of $U_1$ on $P_{16}$ is 0.025, which corresponds to that of $u^{1}_{1}$ on $p^{16}_{5}$ and $p^{16}_{6}$ in Table 5. The value there is 0.0008 instead of 0.025 due to normalization.
Step 4 and Step 5.
In these two steps, the target user's metadata is obtained and fed to the analytical model, which then yields the recommendations. We skip the list of recommendation results and directly compare the performance of the proposed operation module in the following experiments.

Experiment 1. The performance of the recommendation results of CECF-C, CECF-NC, and CF is shown in Table 7, evaluated by recall, precision, and F1 for sample values of $\theta$ with a neighborhood size of 20. Note that when $\theta = 1$, CECF-C and CECF-NC both reduce to CF, since out-of-clique effects no longer exist. The average performance in Table 7 shows that CECF-C and CECF-NC perform better than CF except at $\theta = 0$. It can also be observed that CECF-NC performs slightly better than CECF-C. Fig. 4 compares CECF-NC with CF: CECF-NC performs much better than CF in recall and F1 (p-value < 0.001, 95% confidence level), and slightly better in precision.

Experiment 2. In Experiment 1, the average performance is better and more stable when CECF-NC and $\theta = 0.6$ are used. Therefore, we set $\theta$ to 0.6 and continue experimenting on the analytical model with $\beta$ set to 0, 0.2, 0.4, …, 1 and the satisfaction level ($b_{f_j}$) set to 0.7, 0.8, and 0.9 under the users' budget limits. We compare CECF-NC with and without profit consideration in terms of recall, precision, and F1, as shown in Fig. 5; the difference in profit gained between the two cases is presented in Fig. 6. The results in Fig. 5 show that the recommendation performance is not degraded when profit consideration is introduced (p-value < 0.05, 95% confidence level). The results in Fig. 6 show that profit increases as $\beta$ increases from 0 to 1.

Experiment 3. In this experiment, we compare three recommendation schemes, CF, CECF-NC with profit consideration ($\beta = 0.2$), and CECF-NC without profit consideration, in terms of their F1 measures. Fig. 7 shows that the F1 values increase as the neighborhood size increases from 3 through 5, 7, and 10 to 20. In addition, Fig. 7 is consistent with the results of the previous two experiments: CECF-NC, with or without profit consideration, outperforms conventional CF.

Table 7
Average performance of CECF-C, CECF-NC and CF.

| $\theta$ | CECF-NC Recall | CECF-NC Precision | CECF-NC F1 | CECF-C Recall | CECF-C Precision | CECF-C F1 |
|---|---|---|---|---|---|---|
| 0 | 0.297 | 0.458 | 0.325 | 0.297 | 0.458 | 0.325 |
| 0.1 | 0.877 | 0.928 | 0.939 | 0.867 | 0.925 | 0.932 |
| 0.2 | 0.900 | 0.942 | 0.962 | 0.893 | 0.938 | 0.955 |
| 0.4 | 0.908 | 0.943 | 0.963 | 0.903 | 0.942 | 0.959 |
| 0.5 | 0.910 | 0.945 | 0.968 | 0.910 | 0.945 | 0.968 |
| 0.6 | 0.911 | 0.945 | 0.968 | 0.907 | 0.945 | 0.967 |
| 0.7 | 0.910 | 0.945 | 0.968 | 0.910 | 0.945 | 0.968 |
| 0.8 | 0.910 | 0.945 | 0.968 | 0.910 | 0.945 | 0.968 |
| 0.9 | 0.910 | 0.945 | 0.968 | 0.910 | 0.945 | 0.968 |
| 1 (CF) | 0.457 | 0.930 | 0.569 | 0.457 | 0.930 | 0.569 |

Fig. 4. Comparison of CECF-NC and CF (three panels: recall, precision, and F1).
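The three measures reported in Table 7 follow Eqs. (13)–(15). The sketch below assumes that the actual basket and the recommendation list are Python sets; it keeps the denominators exactly as printed in Eqs. (13) and (14), with recall taken over the recommended set and precision over the actual basket, which is the reverse of the more common information-retrieval convention. The function name `evaluate` and the sample item ids are hypothetical.

```python
def evaluate(actual_basket, recommended):
    """Return (recall, precision, F1) as defined in Eqs. (13)-(15)."""
    hits = len(actual_basket & recommended)
    # Eq. (13): share of the recommended items that were actually purchased.
    recall = hits / len(recommended) if recommended else 0.0
    # Eq. (14): share of the actual basket covered by the recommendation.
    precision = hits / len(actual_basket) if actual_basket else 0.0
    total = recall + precision
    # Eq. (15): harmonic mean of the two; defined as 0 when both are 0.
    f1 = 2 * recall * precision / total if total else 0.0
    return recall, precision, f1


# Hypothetical basket of four items and a three-item recommendation list.
recall, precision, f1 = evaluate({"p1", "p2", "p3", "p4"}, {"p2", "p3", "p9"})
print(recall, precision, f1)  # 2/3, 1/2 and 4/7
```

Averaging these per-user values over the testing customers would give figures of the kind tabulated above, although the source does not describe the averaging step in detail.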