T.-K. Fan, C.-H. Chang / Expert Systems with Applications 38 (2011) 1777–1788
in any IR models for placing blogger-centric ads. In addition to accuracy, we also evaluated the page-ad matching with precision-recall curves, shown in Fig. 6. Each data point corresponds to the precision measured at a given recall level. The results clearly indicate that the language model performs better than Okapi BM25 and tf×idf. We also adopted two further quality measures, Precision@K and mean average precision (MAP), to assess the matching results:

• We calculated the average retrieval precision at cutoff K as follows:

$$\text{Precision@}(K) = \frac{\sum_{i=1}^{N_q} P_i@(K)}{N_q}$$

where Precision@(K) is the average precision at cutoff K, $N_q$ is the number of queries used, and $P_i@(K)$ is the precision over the top K retrieved results for the ith query.

• To compare the precision-recall curves of the three page-ad matching functions, we computed MAP. For a single query, Average Precision is the average of the precision values obtained for the set of top k documents after each relevant document is retrieved; this value is then averaged over all queries. That is, if the set of relevant documents for a query $q_j \in Q$ is $\{d_1, d_2, \ldots, d_{m_j}\}$ and $R_{jk}$ is the set of ranked retrieval results from the top result until the retrieval system returns document $d_k$, then

$$\text{MAP}(Q) = \frac{1}{|Q|} \sum_{j=1}^{|Q|} \frac{1}{m_j} \sum_{k=1}^{m_j} \text{Precision}(R_{jk})$$

where $Q = \{q_1, q_2, \ldots, q_m\}$ is the set of queries. Since we compare three page-ad matching models, we computed a MAP score for each of the three query sets.
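The two measures above can be computed as in the following minimal Python sketch. It assumes each query's output is a ranked list of ad IDs and its ground truth is a set of relevant ad IDs (hypothetical inputs, not the paper's data). Relevant ads that are never retrieved contribute a precision of 0 to Average Precision, a common convention that the text does not state. The `interpolated_precision` helper uses the usual 11-point interpolation for precision-recall curves, which is likewise an assumption about how Fig. 6 was produced.

```python
from typing import Hashable, Sequence, Set

Ranking = Sequence[Hashable]  # ad IDs, best match first
Relevant = Set[Hashable]      # ad IDs judged relevant for the query


def precision_at_k(ranking: Ranking, relevant: Relevant, k: int) -> float:
    """P_i@(K): fraction of the top-k ranked ads that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for ad in ranking[:k] if ad in relevant)
    return hits / k


def mean_precision_at_k(rankings: Sequence[Ranking],
                        relevants: Sequence[Relevant], k: int) -> float:
    """Precision@(K): P_i@(K) averaged over the N_q queries."""
    if not rankings:
        raise ValueError("need at least one query")
    total = sum(precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


def average_precision(ranking: Ranking, relevant: Relevant) -> float:
    """(1/m_j) * sum_k Precision(R_jk) for a single query."""
    m_j = len(relevant)
    if m_j == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, ad in enumerate(ranking, start=1):
        if ad in relevant:
            hits += 1
            total += hits / rank  # precision of R_jk: the list cut at the k-th relevant ad
    return total / m_j  # relevant ads never retrieved add 0 to the sum


def mean_average_precision(rankings: Sequence[Ranking],
                           relevants: Sequence[Relevant]) -> float:
    """MAP(Q): Average Precision averaged over all |Q| queries."""
    if not rankings:
        raise ValueError("need at least one query")
    total = sum(average_precision(r, rel) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


def interpolated_precision(ranking: Ranking, relevant: Relevant,
                           levels=tuple(i / 10 for i in range(11))) -> list:
    """Interpolated precision at each recall level (assumed 11-point scheme):
    the maximum precision observed at any rank whose recall is >= the level."""
    m = len(relevant)
    if m == 0:
        return [0.0 for _ in levels]
    points, hits = [], 0
    for rank, ad in enumerate(ranking, start=1):
        if ad in relevant:
            hits += 1
        points.append((hits / m, hits / rank))  # (recall, precision) after this rank
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]
```

For instance, with two hypothetical queries ranked `[a1, a2, a3, a4]` and `[b1, b2, b3]` whose relevant sets are `{a1, a3}` and `{b2, b9}`, Average Precision is (1/1 + 2/3)/2 = 5/6 for the first query and (1/2)/2 = 1/4 for the second, so MAP = 13/24 ≈ 0.54.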
The results are displayed in Fig. 7. Okapi BM25 and tf×idf achieve MAP scores of around 41% and 32%, respectively, while the language model performs better, at around 43%. As these figures show, the language model is consistently superior to Okapi BM25 and tf×idf.

5. Related work

Several prior research studies are relevant to our work, including efforts in online advertising and sentiment classification.

Many personalized advertising methods have been proposed that make use of explicit user profiles, which are gathered, maintained, and analyzed by the ad placing system. Such methods often rely on data-mining techniques (Lai & Yang, 2000; Perner & Fiss, 2002). Many web portals create user profiles from information gathered during the registration process. However, because of privacy concerns, users tend to provide incorrect data. In addition to user profiles, an alternative solution is to exploit information stored in web server logs (Bae, Park, & Ha, 2003). Several studies in advertising research have stressed the importance of relevant associations for consumers (Langheinrich, Nakamura, Abe, Kamba, & Koseki, 1999; Wang et al., 2002) and shown that irrelevant ads can turn off users, whereas relevant ads are more likely to be clicked (Chatterjee et al., 2003; Parsons, Gallagher, & Foster, 2000). These studies show that presenting advertisements to uninterested users can cause customer annoyance. The authors therefore conclude that, to be effective, advertisements should be relevant to a consumer's interests at the time of exposure. Novak and Hoffman (1997) reinforce this conclusion by pointing out that the more targeted the advertising, the more effective it is. As a result, certain studies have tried to determine how to take advantage of the available evidence to enhance the relevance of selected ads.
For example, studies on keyword matching show that the nature and number of keywords affect the likelihood of an ad being clicked (OneUpWeb, 2005). As for contextual advertising, Ribeiro-Neto et al. (2005) proposed a number of strategies for matching pages to ads based on extracted keywords. The first five strategies proposed in this work match pages and ads based on the cosine of the angle between their respective vectors. To identify the important parts of the ad, the authors explored the use of different ad sections (e.g., bid phrase, title, and body) as a basis for the ad vector. The winning strategy required the bid phrase to appear on the page and then ranked all such ads by the cosine between the page vector and the vector formed from the union of all the ad sections. Although both pages and ads are mapped to the same space, there is a discrepancy (called "impedance mismatch") between the vocabulary used in the ads and on the pages. Hence, the authors achieved improved matching

Table 8
Accuracy of page-ad matching.

| IR method | Dataset | Accuracy (%) |
|---|---|---|
| Language model | Positive dataset | 57 |
| | Negative dataset | 79 |
| | Intention dataset | 57 |
| | All triggering pages | 64 |
| Okapi BM25 | Positive dataset | 52 |
| | Negative dataset | 80 |
| | Intention dataset | 50 |
| | All triggering pages | 62 |
| TF×IDF | Positive dataset | 50 |
| | Negative dataset | 82 |
| | Intention dataset | 52 |
| | All triggering pages | 60 |

Fig. 6. Precision-recall curve. [Plot: precision (0 to 0.9) versus recall (0 to 1) for LM, Okapi, and TF×IDF.]

Fig. 7. The performance of the three IR models. [Bar chart: score (0 to 1) of Language Model, Okapi BM25, and TF×IDF on MAP, Precision@1, Precision@2, and Precision@3.]
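The winning strategy of Ribeiro-Neto et al. (2005), as summarized above, can be sketched in Python as a filter followed by a cosine ranking. This is an illustrative sketch, not their implementation: it assumes raw term-frequency vectors, a simple lowercase tokenizer, and that "the bid phrase appears on the page" means every bid-phrase term occurs on the page. The `Ad` record and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ad:
    """Hypothetical ad record with the three sections named in the text."""
    ad_id: str
    bid_phrase: str
    title: str
    body: str


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(w * v[t] for t, w in u.items())  # missing terms count as 0
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_ads(page_text: str, ads: list[Ad]) -> list[tuple[str, float]]:
    """Keep ads whose bid-phrase terms all occur on the page, then rank them
    by cosine(page vector, vector of bid phrase + title + body)."""
    page_vec = Counter(tokenize(page_text))
    scored = []
    for ad in ads:
        bid_terms = tokenize(ad.bid_phrase)
        if not bid_terms or any(t not in page_vec for t in bid_terms):
            continue  # bid phrase not on the page: the ad is filtered out
        # Union of all ad sections as one vector (assumed: plain concatenation).
        ad_vec = Counter(tokenize(" ".join((ad.bid_phrase, ad.title, ad.body))))
        scored.append((ad.ad_id, cosine(page_vec, ad_vec)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

Replacing the raw counts with tf×idf weights would only change how the two `Counter` vectors are built; the filter-then-rank structure would stay the same.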