Applications of recommender systems in target selection
Krishnamoorthy Srikumar; Bharat Bhasker
Journal of Targeting, Measurement and Analysis for Marketing; Oct 2004; 13, 1; ABI/INFORM Global

Received: 25th May, 2004

Krishnamoorthy Srikumar
is with the Indian Institute of Management, Lucknow. He specialises in the area of information technology and systems. He received his BE (Production Engineering) from the University of Madras. His current research interests include data mining, electronic commerce, knowledge management and supply chain management.

Bharat Bhasker
works in the area of information technology and systems. He holds a degree in Electronics & Comm. Engineering from the University of Roorkee, India, and an MS and PhD in Computer Science from Virginia Tech, USA. His key research topics include distributed heterogeneous database management systems, query optimisation in distributed and parallel DBMSs, internet applications in business and electronic commerce, agent-based electronic commerce and data warehousing.

Abstract A typical target selection problem aims at selecting prospects that are more likely to respond to a promotional campaign. A variety of target selection models that address this problem are available in the literature. This paper investigates the use of recommender systems for selecting target customers in internet business. The suggested methodology uses the concepts of collaborative filtering and data mining for effectively selecting the target customers. The methodology is experimentally evaluated on a real-life data set and its benefits demonstrated. The experimental results reveal that the suggested methodology provides better predictive capabilities compared to random target selection methods.
The methodology could be useful for e-commerce managers in devising suitable promotional strategies whenever a new product is introduced into the online store.

Krishnamoorthy Srikumar, Indian Institute of Management, Lucknow, India. E-mail: srikumar@iiml.ac.in

© Henry Stewart Publications 0967-3237 (2004) Vol. 13, 1, 61-69 Journal of Targeting, Measurement and Analysis for Marketing
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

INTRODUCTION
Database marketing involves collecting and electronically storing information about customers and products in databases. The proliferation of database marketing activities has fuelled the growth of direct marketing, which is typically targeted at a single business or individual consumer. This is in contrast to mass marketing, which is aimed at thousands or even millions of prospective customers. The data that are collected for database marketing initiatives are used to profile customers and develop effective and efficient promotional strategies.

Unlike mass marketing, direct or targeted marketing is aimed at identifying a few groups of customers who are more likely to respond to the promotional campaign. Selecting prospective customers and offering targeted promotion helps in reducing the promotional cost as well as in deriving a realistic improvement in response rates.

Suppose a mass mailing has to be sent to 100,000 customers at a promotional cost of rupees (Rs) 20 per customer. If it is assumed that 3 per cent of the customers respond to the campaign, then the total profit would be Rs. 18 lakhs (one lakh = 100,000 Rs), taking profit at Rs 600 per customer. The total cost of the promotional campaign, however, is Rs. 20 lakhs (promotional cost at Rs 20 per customer), leading to a net loss of Rs. 2 lakhs. On the other hand, if target selection is carried out and 10 per cent of the customers are targeted, the total cost of promotion, including the cost of target selection, would be Rs. 2.5 lakhs. Due to target selection, the percentage of respondents is likely to increase. So, assuming a nominal increase of 4 percentage points (ie from 3 per cent to 7 per cent), the total profits would be Rs. 4.2 lakhs. This leads to a net gain of Rs. 1.7 lakhs due to the targeted promotional campaign (refer to Table 1 for sample cost computations). As illustrated in Table 1, mass marketing may yield losses while target marketing (which is aimed at a few prospects) provides significant gains.

Table 1: Cost benefits of target marketing against mass marketing

S No.  Description                                  Mass marketing      Target marketing
1      Total number of customers                    100,000             10,000
2      Costs of promotion (at Rs 20 per customer)   20 lakhs            2 lakhs
3      Cost of target selection                     -                   0.5 lakhs
4      Total costs                                  20 lakhs            2.5 lakhs
5      Number of customers responded                3,000 (3 per cent)  700 (7 per cent)
6      Profits (at Rs 600 per customer)             18 lakhs            4.2 lakhs
7      Net gain/(loss)                              (2) lakhs           1.7 lakhs

Related work
The target selection methods in the literature can be broadly classified into two groups: segmentation methods and scoring methods. In the segmentation method, the customer database is split into different groups based on the similarities of their features. An estimate of the response percentage is then computed for each group. The customers within the group having the higher response percentage are then selected for targeted mailings. In scoring methods, a separate score is assigned to each individual based on their likelihood of responding to the promotional campaign. The customers with higher scores are then selected for targeted mailings.

The techniques used for target selection in the literature include regression, decision trees, neural networks and fuzzy logic. Bauer and Kaymak explore the use of RFM (recency, frequency and monetary) variables for efficient target selection. RFM variables capture the purchase behaviour with a relatively smaller number of features.

Problem statement and motivation
The traditional approach to the target selection problem makes use of a set of explanatory features, which is built on customers' past history (such as purchase details) or trial campaign results, for building the prediction model. The model is then used to identify the likelihood of customers responding to the promotional campaign. In this paper, a specific class of this target selection problem is addressed, viz identifying the prospects when a new product is introduced into a particular category.
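The cost comparison in Table 1 can be reproduced with a few lines of arithmetic. The sketch below is only an illustration of that calculation: the function name `campaign_net_gain` and its parameter names are hypothetical, while the figures (Rs 20 per mailing, Rs 600 profit per respondent, Rs 0.5 lakhs for target selection, 3 and 7 per cent response) are the ones quoted in the text.

```python
LAKH = 100_000  # one lakh = 100,000 Rs

def campaign_net_gain(n_targeted, response_rate, cost_per_customer=20,
                      profit_per_response=600, selection_cost=0):
    """Net gain in Rs: profit from respondents minus promotion and selection costs."""
    responders = round(n_targeted * response_rate)
    profit = responders * profit_per_response
    total_cost = n_targeted * cost_per_customer + selection_cost
    return profit - total_cost

# Mass marketing: all 100,000 customers mailed, 3 per cent respond.
mass = campaign_net_gain(100_000, 0.03)
# Target marketing: 10 per cent of customers mailed, 7 per cent respond,
# plus Rs 0.5 lakhs spent on target selection.
target = campaign_net_gain(10_000, 0.07, selection_cost=50_000)

print(mass / LAKH, target / LAKH)  # prints -2.0 1.7
```

The two results correspond to the last row of Table 1: a loss of Rs. 2 lakhs for the mass mailing against a gain of Rs. 1.7 lakhs for the targeted one.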
One approach to handling this problem is to explicitly ask customers about their interest in a set of product categories and to send them targeted promotions as and when a new product is introduced in the categories of interest to them. The likely problem in such a simple approach is that not all potential buyers may solicit promotional mails for new product introductions in a category. Here, a structured methodology is presented to identify the prospects that are more likely to respond to a campaign, especially for new product introductions within a category of an online retail store.

A typical recommender system, which is aimed at generating recommendations at product category level, profiles customers and identifies a set of likely products (categories of product) that are of interest to them. These systems also generate Top-N products (categories of product) as recommendations in a ranked order. Apart from providing recommendations, recommender systems can also provide rich insights in identifying prospects that are likely to purchase a new product. Here, the use of such a novel methodology is investigated for this specific class of target selection problem in e-commerce.

The primary contributions of this paper are as follows: first, a novel methodology for target selection in internet business using recommender systems is suggested. The methodology uses basic concepts of collaborative filtering and data mining for effectively selecting the target customers. Secondly, the methodology is experimentally evaluated on a real-life data set from an online retailer in India, and its benefits demonstrated. The suggested system can serve as a useful tool for e-commerce managers in devising effective promotional strategies.

The organisation of the rest of this paper is as follows: the following section describes the definitions, notations and assumptions used in this paper. The methodology for target selection is deliberated on in the third section. The section following that is devoted to experimental evaluation of the system and a discussion of its implications. The paper concludes with a summary of the work.

PRELIMINARIES
This section describes the definitions, notations and assumptions used in this paper. The notations used are made distinct by setting them in bold and italics throughout this paper.

P. The set of products (categories of products) in the database is denoted as P = {P1, P2, ..., Pn}, where n is the total number of product categories in the database. The categories are chosen in such a way that there are only Stock Keeping Units (SKUs) or brand names of products below this level. For example, consider the taxonomy of products available in the database shown in Figure 1. In Figure 1, at level 1 (root), there is the personal care and grooming category. At level 2, there are the dental care and hair care product categories. At level 3, each of the product categories in level 2 has a set of other product categories. Below this level there are varieties of products (SKUs/brand names of products). So, the total number of products in level 3 (in this example) is taken as the total number of products in the database.

TgtP. The target product for which customers need to be selected is denoted as TgtP (TgtP ∈ P).

CustomerDB. The customer database, denoted as CustomerDB, consists of customers' purchase details on the products in P. More specifically, the database has data on the products Pi for each customer Ck in C.
In the customer database, each Pi takes on the value of the count of purchases. A better understanding of this notation can be derived with the help of the sample customer database presented in Table 2.

Figure 1 Product taxonomy for personal care and grooming category
[Level 1: Personal care and grooming (PCG). Level 2: Dental care (DC); Hair care (HC). Level 3: Toothbrushes (TB), Toothpastes, Dental care others (DCO); Shampoos and conditioners.]

S. The campaign size used for target selection is denoted as S.

TrainDB, TestDB. The customer database (CustomerDB) is split into training and test databases, referred to as TrainDB and TestDB respectively.

SimU. The total number of similar users identified in the collaborative filtering process is denoted as SimU.

CollabUserDB. The set of similar collaborative users identified in the collaborative filtering step is denoted as CollabUserDB and consists of a set of collaborative users for the target customer. That is, for a target customer Ci it consists of users Cj, where j = 1 to SimU and Cj ≠ Ci.

SimMetric. The similarity metric used by the system is denoted as SimMetric. In this paper, the cosine similarity metric has been utilised for experimentation.

The notations specific to association rule mining are now discussed. The parameters used in association rule mining, viz minimum support and minimum confidence, are denoted as minsupport and minconf respectively. In this paper, a default minimum support value based on the frequent 1-itemset of the target product, TgtP, has been used. The default minimum confidence value is chosen as 50 per cent, although the choice can be made flexibly.

MaxRules. The maximum number of rules that is generated is denoted as MaxRules. In this paper, the default value of MaxRules is set as 100,000. This is done to reduce the performance bottleneck of the system. The system, however, can be experimented with various other values.

The rules are scored in this approach using the product of the support and confidence of the rule, that is score = support × confidence. Lin has utilised this method of scoring in the literature.

N. The total number of products that needs to be generated as recommendations is denoted as N. In this paper, the default value of N is taken as ten, ie Top-10 products are generated as recommendations.

Table 2: A sample customer database
[Customers in rows, products P1 to P7 in columns; entries are purchase counts.]

TARGET SELECTION
The complete pseudo-code for target selection is provided in Figure 2. The methodology for target selection in this approach is described in five steps as follows:

1 Initial processing: From the given customer database, separate out the details of the target product, TgtP. The target product values are converted to binary values (ie 0s and 1s) based on whether the product has been purchased by the customer or not. The customer database is then split into training and test databases. Now, the objective in target selection is to identify the prospects in the test database that are likely to respond to the promotional campaign.

2 Collaborative user identification: For each user Ci in TestDB, identify its collaborative users. Collaborative users are identified using the collaborative filtering process in two steps, viz (1) compute similarities between Ci and every user in the training database, TrainDB (similarities are computed using cosine similarity in this paper); (2) select the SimU users who have the higher similarity values. For every user in the test database, TestDB, the prediction for purchase of the target product, TgtP, is then computed. It is computed as the most common value of the collaborative users' target product, TgtP, value in the training database, TrainDB.

3 Rule mining: For a customer Ci in the collaborative user database, CollabUserDB, extract his/her collaborative users. Then, for the selected users, cull their purchase details from the training database, TrainDB. The resulting data are mined to generate association rules with the following constraints: (1) the rule consequent has a single item, which is TgtP; (2) the maximum number of rules generated is less than or equal to MaxRules. The generated rules are then scored and sorted in descending order of their scores. The score for a rule is computed as the product of its support and confidence values.

4 Response prediction: From the rules generated, extract the Top-N products based on their scores. The cumulative score of these products is the response prediction score for that customer. The response prediction score is computed for every customer in the test database, TestDB, by executing steps 2 and 3 above (ie collaborative user identification and rule mining).

5 Target selection: All customers in the test database, TestDB, are sorted in non-increasing order of their response prediction scores.
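The steps above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' C++ implementation: all function and variable names are hypothetical, rules are restricted to single-item antecedents of the form {Pi} -> TgtP to keep the mining step short, no minimum support threshold is applied, and the majority-vote prediction of step 2 is omitted. Each row holds a customer's purchase counts for the non-target products, and a separate 0/1 list records whether each training customer bought TgtP.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two purchase-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def collaborative_users(row, train_rows, sim_u):
    """Step 2: indices of the sim_u training users most similar to `row`."""
    order = sorted(range(len(train_rows)),
                   key=lambda k: cosine(row, train_rows[k]), reverse=True)
    return order[:sim_u]

def response_score(neigh_rows, neigh_tgt, top_n, minconf=0.5, max_rules=100_000):
    """Steps 3-4: mine rules {P_i} -> TgtP from the neighbours, score each
    rule as support * confidence, and sum the Top-N scores."""
    if not neigh_rows:
        return 0.0
    n = len(neigh_rows)
    scores = []
    for i in range(len(neigh_rows[0])):
        bought_i = [r[i] > 0 for r in neigh_rows]
        n_i = sum(bought_i)
        n_both = sum(1 for b, t in zip(bought_i, neigh_tgt) if b and t)
        if n_both == 0:
            continue  # no neighbour bought both P_i and TgtP
        support, confidence = n_both / n, n_both / n_i
        if confidence >= minconf:
            scores.append(support * confidence)
    scores = sorted(scores, reverse=True)[:max_rules]
    return sum(scores[:top_n])

def select_targets(test_rows, train_rows, train_tgt, sim_u, top_n, campaign_size):
    """Step 5: rank test customers by response score and keep the top share."""
    scored = []
    for c, row in enumerate(test_rows):
        neigh = collaborative_users(row, train_rows, sim_u)
        score = response_score([train_rows[k] for k in neigh],
                               [train_tgt[k] for k in neigh], top_n)
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    n_targets = round(campaign_size * len(test_rows))
    return [c for _, c in scored[:n_targets]]

# Toy data (hypothetical): three non-target products, four training customers.
train_rows = [[2, 1, 0], [1, 2, 0], [0, 0, 3], [0, 1, 2]]
train_tgt = [1, 1, 0, 0]          # 1 = bought TgtP
test_rows = [[0, 0, 1], [3, 1, 0]]
targets = select_targets(test_rows, train_rows, train_tgt,
                         sim_u=2, top_n=2, campaign_size=0.5)
print(targets)  # prints [1]
```

In the toy data the second test customer resembles the two training customers who bought the target product, so that customer receives the higher score and is the one selected at a 50 per cent campaign size. A full implementation would replace the single-antecedent loop with a general association rule miner.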
Inputs: TgtP, CustomerDB, SimU, minconf, minsupport, N, S
1. Split the CustomerDB into TrainDB and TestDB
2. For every customer in TestDB
   a. Compute similarity values with every other user in TrainDB
   b. Sort the customers on non-increasing order of similarity values
   c. Identify SimU users
   d. Store the similar users in CollabUserDB
   e. Store the prediction for the user as the most common value of TgtP in TrainDB for the users identified in step d above
   f. Extract the purchase details from TrainDB for the customers identified
   g. Mine association rules with the following constraints: rule consequent is TgtP, rule consequent has single item and maximum number of rules ≤ MaxRules
   h. Extract the antecedents of the rules and score them using rule scores
   i. Sort the above products in descending order of their scores
   j. Select Top-N products and compute the cumulative scores
   k. Response prediction for the customer = cumulative score in step j
3. Sort the customers in TestDB on decreasing order of their response prediction
4. Select the target customers based on campaign size, S
5. Return

Figure 2 Pseudo-code for target selection

Using the campaign size (S) set by the marketer, the targets are then selected from the sorted test customers (say, the Top-10 per cent or Top-20 per cent of the customers are selected as targets).

EXPERIMENTAL RESULTS
The complete methodology for target selection discussed in the section above is built using C++ on a Pentium-III PC running Red Hat Linux 7.2.

Experimental design and metrics
For the experimentation, real-life data were gathered from one of the leading online retailers in India. The collected data have customer purchase details of 359 customers on 105 product categories with an average transaction size of 7.51. The ratio of non-zero entries to the total number of entries in the customer-product matrix is 7.13 per cent (ie the density of the data set is 7.13 per cent). The customer database was split into train and test databases in the ratio of 50:50 and the methodology evaluated on two commonly used metrics, viz hit probability charts and gain charts. Hit probability charts show the percentage of targeted customers who will respond positively to the campaign given the percentage of customers targeted. Gain charts show the gains to be expected when the target selection model is applied, over the gains usually obtained when the targets are selected at random.

Figures 3 and 4 show the hit probability and gain charts for the experimentation carried out on the real-life data set. A cursory look at Figure 3 reveals that, as the campaign size is increased, the percentage of response falls and drops to the level obtained when all the targets are selected at random. For the real-life experiments performed, a response rate of around 28 per cent was obtained even when the targets are selected at random.
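The two evaluation metrics described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the gain is assumed here to be the share of all respondents captured by the model minus the share expected from random targeting, which is one common reading of a gain chart and may differ in detail from the authors' computation. The input is a 0/1 response list already sorted by decreasing response prediction score.

```python
def n_targeted(responded_sorted, campaign_size):
    """Number of customers mailed at a given campaign size (at least one)."""
    return max(1, round(campaign_size * len(responded_sorted)))

def hit_probability(responded_sorted, campaign_size):
    """Share of the targeted customers who actually responded."""
    k = n_targeted(responded_sorted, campaign_size)
    return sum(responded_sorted[:k]) / k

def gain(responded_sorted, campaign_size):
    """Share of all respondents captured by the model, minus the share
    that random targeting of the same size would be expected to capture."""
    total = sum(responded_sorted)
    if total == 0:
        return 0.0
    k = n_targeted(responded_sorted, campaign_size)
    model_share = sum(responded_sorted[:k]) / total
    random_share = k / len(responded_sorted)
    return model_share - random_share

# Hypothetical test set of ten customers, best-scored first; three responded.
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for size in (0.2, 0.5, 1.0):
    print(size, hit_probability(responded, size), round(gain(responded, size), 3))
```

At a campaign size of 1.0 the hit probability equals the overall response rate and the gain is zero, which mirrors the behaviour described for Figure 3: the curve falls to the random-selection level once every customer is targeted.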
Figure 3 Hit probability chart [x-axis: campaign size]

In many real-life campaigns, however, response rates are as low as 3-5 per cent. The very high response rates observed here can be explained by the fact that the data set considered is small.

The gain chart in Figure 4 shows the gains that can be expected by applying the model. As can be seen, the total gain increases until the campaign size is increased to 38 per cent. Beyond this point, the gain starts decreasing. So, the marketer can choose the right campaign size to maximise the gains.

To compute the net profits that can be derived from a promotional campaign, a promotional campaign cost per customer is assumed, together with a total target selection cost of Rs. 2,000 and a profit per customer of Rs. 200. The net profits that can be derived at varying campaign sizes are plotted in Figure 5. As can be seen, the profits increase up to a campaign size of 39-40 per cent, beyond which the net profits start to decline. So, a marketer interested in maximising the profits can use such an analysis to devise effective promotional programmes.

Discussion
From the foregoing experimental analysis it is clearly evident that the methodology shows better results compared to random target selection methods. So, the methodology discussed in this paper can be useful for e-commerce managers in selecting an appropriate campaign size that maximises the total profits that can be achieved through promotional campaigns.

There are a number of issues that need to be addressed in order to make the suggested methodology more robust. First, the count of customer purchase data was used for generating recommendations and selecting prospects. Alternatively, the use of RFM variables in the approach can be explored for better selection of prospects.
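The net-profit analysis behind Figure 5 can be sketched as a search over campaign sizes. This is a hedged illustration: the function names are hypothetical, the target selection cost (Rs. 2,000) and profit per customer (Rs. 200) are the values stated in the text, and the promotional cost per customer is left as a required parameter, with the value used in the toy run below being purely an assumption.

```python
def net_profit(responded_sorted, campaign_size, cost_per_customer,
               selection_cost=2_000, profit_per_response=200):
    """Profit from respondents in the targeted share, minus mailing and
    target selection costs (all in Rs)."""
    k = round(campaign_size * len(responded_sorted))
    responders = sum(responded_sorted[:k])
    return responders * profit_per_response - k * cost_per_customer - selection_cost

def best_campaign_size(responded_sorted, sizes, cost_per_customer,
                       selection_cost=2_000, profit_per_response=200):
    """Campaign size with the highest net profit among the candidates."""
    return max(sizes, key=lambda s: net_profit(
        responded_sorted, s, cost_per_customer, selection_cost, profit_per_response))

# Toy run: ten customers sorted by response score; costs scaled down to suit
# the tiny sample (Rs 50 per mailing and Rs 100 selection cost are assumptions).
responded = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
sizes = [k / 10 for k in range(1, 11)]
best = best_campaign_size(responded, sizes, cost_per_customer=50, selection_cost=100)
print(best, net_profit(responded, best, 50, selection_cost=100))  # prints 0.5 450
```

As in Figure 5, the profit rises while additional mailings still reach respondents and declines once the extra mailing cost outweighs the extra profit.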
Figure 4 Gain chart [x-axis: campaign size; series: random, model, gain]

Figure 5 Profit on varying campaign size [x-axis: campaign size]

Secondly, the data set considered for the experimental evaluation of the system is quite small. It may be useful to study how the methodology performs for a large-scale sample.

Thirdly, the use of customer demographic and psychographic details can aid in customer profiling.

Fourthly, customer click histories provide a rich source of data for identifying customer purchase behaviours. Customer click histories can be used in conjunction with purchase histories for better selection of prospects. The customer click and purchase histories can be combined using the methodology described by Cho et al.

CONCLUSIONS
In this paper, a methodology for target selection using recommender systems was introduced. The methodology was implemented and experimental results provided on a real-life data set. The experimental results reveal that the approach provides better predictive capabilities compared to random target selection methods. In future, the proposed target selection methodology can be compared against other model-based methods to study its performance. The system could be useful for e-commerce managers to identify target customers effectively and to devise suitable promotional strategies.

References
1 Czinkota, M. R., Dickson, P. R. and Dunne, P. [...] Publications.
2 Kaymak, U. (2001) 'Fuzzy target selection using RFM variables', in 'Proceedings of 9th IFSA World Congress [...]', pp. 1038-1043.
3 Bult, J. and Wansbeek, T. (1995) 'Optimal selection for direct mail', Marketing Science, Vol. 14, pp. 378-394.
4 Haughton, D. and Oulabi, S. (1993) 'Direct marketing modeling with CART and CHAID', Journal of Direct Marketing, Vol. 7, pp. 16-26.
5 Zahavi, J. and Levin, N. (1997) 'Applying neural [...]', Journal of Direct Marketing, Vol. [...].
6 Setnes, M. and Kaymak, U. (2001) 'Fuzzy modeling of client preference from large data sets: An application to target selection in direct marketing', IEEE Transactions on Fuzzy Systems, Vol. 9, No. 1, pp. 153-163.
7 Sousa, J. M., Kaymak, U. and [...] (2002) 'A comparative study of fuzzy [...] in direct marketing', in 'Proceedings of the 11th IEEE International Conference on Fuzzy Systems' [...].
9 Kaymak (2001) op. cit.
Prassas, [...], Pramataris, K. [...] (2001) '[...] recommendations in internet retailing', in 'Proceedings of European Conference [...]'.
12 Srikumar, K. and Bhasker, B. (2004) 'Personal[...]', in 'Proceedings of the 5th World Congress on Management of [...]'.
13 Prassas et al. (2001) op. cit.
14 Han, J. and Kamber, M. (2001) 'Data mining: [...]'.
15 Srikumar and Bhasker (2004) op. cit.
16 Agrawal, R. and Srikant, R. (1994) 'Fast algorithms for mining association rules', in [...] Conference, Santiago, Chile, pp. 487-499.
17 Lin, W., Alvarez, S. A. and Ruiz, C. (2002) 'Efficient adaptive-support association rule mining for recommender systems', Data Mining and Knowledge Discovery, Vol. [...].
18 Sousa et al. (2002) op. cit.
Cho [...] Expert Systems with Applications, Vol. 23, pp. 329-342.