Coupled Group Lasso for Web-Scale CTR Prediction in Display Advertising



experience of the web pages. From the user side, users want to find useful information on the web pages and find ads that they are really interested in.

To satisfy the desires of all three parties, an accurate ad targeting system is of great importance, in which predicting the click-through rate (CTR) of a user on a specific ad plays the key role (Chapelle et al., 2013). CTR prediction is the problem of estimating the probability that the display of an ad to a specific user will lead to a click.
This challenging problem is at the heart of display advertising and has to deal with several hard issues, such as very large-scale data sets, frequently updated users and ads, and the inherently obscure connection between user profiles (features) and ad features.

Recently, many models have been proposed for CTR prediction in display advertising. Some models train standard classifiers, such as logistic regression (LR) (Neter et al., 1996) or generalized linear models, on a simple concatenation of user and ad features (Richardson et al., 2007; Graepel et al., 2010). Some other models use prior knowledge, such as the inherent hierarchical information, for statistical smoothing in log-linear models (Agarwal et al., 2010) or LR models (Kuang-chih et al., 2012). In (Menon et al., 2011), a matrix factorization method is proposed, but it does not make use of user features. In (Stern et al., 2009), a probabilistic model is proposed to use user and item metadata together with collaborative filtering information, in which user and item feature vectors are mapped into a lower-dimensional space and the inner product is used to measure similarity. However, it does not perform automatic feature selection from user and item meta features. In addition, inference in the model is too complicated to be used in a large-scale scenario. In (Chapelle et al., 2013), a highly scalable framework based on LR is proposed, and terabytes of data from real applications are used for evaluation. Due to its easy implementation and state-of-the-art performance, the LR model has become the most popular one for CTR prediction, especially in industrial systems (Chapelle et al., 2013). However, LR is a linear model, in which the features contribute to the final prediction independently. Hence, LR cannot capture nonlinear information, such as the conjunction (cartesian product) information, between user features and ad features.
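To make the difference between simple concatenation and conjunction (cartesian product) features concrete, here is a minimal Python sketch. The feature names and values are hypothetical, chosen only for illustration, and the helpers `concatenate` and `conjunction` are not from the paper. A linear model over the concatenated vector weights each feature independently, whereas the conjunction creates one feature per (user feature, ad feature) pair.

```python
from itertools import product

# Hypothetical binary features, chosen only for illustration.
user_features = {"high_buying_power": 1, "college_student": 0, "high_school_student": 0}
ad_features = {"luxury_product": 1, "ml_book": 0}

def concatenate(user, ad):
    """Simple concatenation: each feature enters a linear model on its own."""
    combined = {f"u:{name}": value for name, value in user.items()}
    combined.update({f"a:{name}": value for name, value in ad.items()})
    return combined

def conjunction(user, ad):
    """Cartesian product: one feature per (user feature, ad feature) pair."""
    return {
        f"{u_name}&{a_name}": u_value * a_value
        for (u_name, u_value), (a_name, a_value) in product(user.items(), ad.items())
    }

concat = concatenate(user_features, ad_features)
conj = conjunction(user_features, ad_features)

# Concatenation grows additively, conjunction multiplicatively.
print(len(concat), len(conj))
# Only the pair whose two components are both active is switched on.
print([name for name, value in conj.items() if value == 1])
```

The multiplicative growth of the conjunction space is one reason a model that uses it needs some form of feature selection or compression at web scale.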
In real applications, the conjunction information is very important for CTR prediction. For example, people who have high buying power may have more interest in luxury products than those with low buying power, and college students may be more likely to buy machine learning books than high-school students. Better performance can be expected by exploiting the user-ad two-part hybrid features through feature conjunction.

In this paper, we propose a novel model, called coupled group lasso (CGL), for CTR prediction in display advertising. The main contributions are outlined as follows:

• CGL can seamlessly integrate the conjunction information from user features and ad features for modeling, which makes it better at capturing the underlying connection between users and ads than LR.
• CGL can automatically eliminate useless features for both users and ads, which may facilitate fast online prediction.
• CGL is scalable by exploiting feature hashing and distributed implementation.

2. Background

In this section, we introduce the background of our model, including the description of the CTR prediction task, the LR model, and group lasso (Yuan & Lin, 2006; Meier et al., 2008).

2.1. Notation and Task

We use boldface lowercase letters, such as $\mathbf{v}$, to denote column vectors and $v_i$ to denote the $i$th element of $\mathbf{v}$. Boldface uppercase letters, such as $\mathbf{M}$, are used to denote matrices, with the $i$th row and the $j$th column of $\mathbf{M}$ denoted by $\mathbf{M}_{i*}$ and $\mathbf{M}_{*j}$, respectively. $M_{ij}$ is the element at the $i$th row and $j$th column of $\mathbf{M}$. $\mathbf{M}^T$ is the transpose of $\mathbf{M}$ and $\mathbf{v}^T$ is the transpose of $\mathbf{v}$.

While some display advertising systems have access to only some id information for users or ads, in this paper we focus on the scenarios where we can collect both user and ad features. Actually, the publishers can often collect user actions on the web pages, such as clicking on an ad, buying a product, or typing in some query keywords. They can analyze this behavior history and then construct user profiles (features).
On the other hand, when advertisers submit some ads to the publishers, they often choose some description words, the groups of people to display the ads to, or some other useful features.

We refer to a display of an ad to a particular user in a particular page view as an ad impression. Each impression is a case in which a user meets an ad in a specific context, such as daytime, weekdays, and publishing position. Hence, each impression contains information of three aspects: the user, the ad, and the context. We use $\mathbf{x}_u$ of length $l$ to denote the feature vector of user $u$, and $\mathbf{x}_a$ of length $s$ to denote the feature vector of ad $a$. The context information, together with some advertiser id or ad id information, is composed into a feature vector $\mathbf{x}_o$ of length $d$. $\mathbf{x}$ is used to denote the feature vector of an impression, with $\mathbf{x}^T = (\mathbf{x}_u^T, \mathbf{x}_a^T, \mathbf{x}_o^T)$. Hence, if we use $z$ to denote the length of vector $\mathbf{x}$, we have $z = l + s + d$. The result of an impression is click or
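As a minimal sketch of the notation in this subsection, the following Python snippet builds the feature vector of one impression by concatenating the user, ad, and context parts, and checks that its length satisfies z = l + s + d. The feature values and the helper name `build_impression_vector` are hypothetical, chosen only for illustration; the text does not prescribe an implementation.

```python
# Hypothetical feature values, chosen only for illustration.
x_u = [1.0, 0.0, 1.0]        # user features, length l = 3
x_a = [0.0, 1.0]             # ad features, length s = 2
x_o = [1.0, 0.0, 0.0, 1.0]   # context plus advertiser/ad id features, length d = 4

def build_impression_vector(x_u, x_a, x_o):
    """Concatenate the three parts in the order (user, ad, other)."""
    return list(x_u) + list(x_a) + list(x_o)

x = build_impression_vector(x_u, x_a, x_o)
l, s, d = len(x_u), len(x_a), len(x_o)
z = len(x)

# The length of the impression vector is the sum of the three part lengths.
print(z == l + s + d)
```

Keeping the three parts in a fixed order means the user, ad, and context blocks can later be recovered from the impression vector by slicing at offsets l and l + s.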

