Coupled Group Lasso for Web-Scale CTR Prediction in Display Advertising

Ling Yan YLING0718@SJTU.EDU.CN
Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
Wu-Jun Li LIWUJUN@NJU.EDU.CN
National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, China
Gui-Rong Xue GRXUE@ALIBABA-INC.COM
Alibaba Group, China
Dingyi Han DINGYI.HAN@ALIBABA-INC.COM
Alibaba Group, China

Abstract
In display advertising, click-through rate (CTR) prediction is the problem of estimating the probability that an advertisement (ad) is clicked when displayed to a user in a specific context. Due to its easy implementation and promising performance, the logistic regression (LR) model has been widely used for CTR prediction, especially in industrial systems. However, it is not easy for LR to capture nonlinear information, such as the conjunction information, from user features and ad features. In this paper, we propose a novel model, called coupled group lasso (CGL), for CTR prediction in display advertising. CGL can seamlessly integrate the conjunction information from user features and ad features for modeling. Furthermore, CGL can automatically eliminate useless features for both users and ads, which may facilitate fast online prediction. Scalability of CGL is ensured through feature hashing and distributed implementation. Experimental results on real-world data sets show that our CGL model can achieve state-of-the-art performance on web-scale CTR prediction tasks.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

1. Introduction
Recently, online advertising has become the most popular and effective approach to brand promotion and product marketing.
It is a multi-billion-dollar business on the web and accounts for the majority of the income of major internet companies such as Google, Yahoo and Alibaba. Display advertising is a big part of online advertising, in which advertisers pay publishers for placing graphical advertisements (ads) on the publishers’ web pages (Chapelle et al., 2013). The publishers allocate some positions on their web pages and sell them to different advertisers. Users visit the web pages and can view the published ads. Other roles, such as ad agencies and publisher networks, also take part in this complex advertising system (Muthukrishnan, 2009), but they are not the focus of this paper. We therefore focus on scenarios with a user-advertiser-publisher tripartite business, in which the three parties have separate goals that can ultimately be reduced to a unified task. Advertisers pay more attention to desired user actions, such as clicks on the ads, subscriptions to the mailing list, or purchases of products. Different advertisers target different kinds of users. For example, a basketball company will be interested in users who bought a lot of sports equipment recently, and a hotel would prefer to display its ads to people who travel frequently. There are different payment options for advertisers, such as cost-per-click (CPC), cost-per-mille (CPM), and cost-per-conversion (CPA) (Mahdian & Tomak, 2007). Publishers, in turn, aim to maximize the revenue from the advertisers and to attract more users to their web pages. They therefore need to display suitable ads precisely to each specific user, and avoid affecting user