WEIGEND ASSOCIATES LLC
© 2004 Weigend Associates LLC | Information at www.weigend.com | Handout

People and Data: Understanding Customer Behavior
Andreas S. Weigend, Ph.D.
Former chief scientist, amazon.com
weigend.com

Outline
• Making decisions based on experiments (A-B tests)
  - Three ingredients for innovation
  - Revealed vs stated preferences
• The iterative process of modeling
  - Define → Measure → Describe → Predict → Act
• Some insights into online customer behavior
  - Levels of analysis and actionability
  - Personalization vs Occasionalization
  - Behavioral economics
[Screenshots: two amazon.com pages for the album "Kind of Blue", each with a shopping cart panel ("Add to cart", "Proceed to Checkout") and the recommendation modules "Customers who bought Kind of Blue also bought", "Customers who bought items in your Shopping Cart also bought", and "Customers who shopped for Kind of Blue also shopped for".]
Result: Right vs Left
• Metrics
  - Conversion: Percentage of visits placing an order
  - Order size: Additional items (from the second page) put in cart
• Some details
  - Relative increase: Blue band on right compared with blue band on left

  Existing customers
    DVD ($):                   +1.1%
    DVD cart-adds:             +1.4%
    Cart-adds from 2nd page:   +0.6%
  All customers
    DVD ($):                   +1.0%
    Wishlist-adds:             +0.8%
    Cart-adds from 2nd page:   +0.8%

People and Data
People: Human decision making
• Finance
• E-Business
• Dating
Data: Research and science
• Statistics
• Machine learning
• Behavioral economics
• Computational marketing
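The two metrics on the "Result: Right vs Left" slide, conversion (percentage of visits placing an order) and the relative increase of one page version over the other, can be sketched in a few lines of Python. The class name `Variant`, the function `relative_increase`, and all visit and order counts below are hypothetical and chosen only for illustration; they are not data from the experiment.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One arm of an A-B test (hypothetical structure)."""
    visits: int  # visits that were shown this page version
    orders: int  # visits that placed an order

    @property
    def conversion(self) -> float:
        # Conversion: fraction of visits placing an order
        return self.orders / self.visits


def relative_increase(treatment: float, control: float) -> float:
    """Relative increase of the treatment metric over the control metric."""
    return (treatment - control) / control


# Made-up counts, NOT taken from the slides.
left = Variant(visits=200_000, orders=10_000)
right = Variant(visits=200_000, orders=10_200)

lift = relative_increase(right.conversion, left.conversion)
print(f"conversion left:  {left.conversion:.2%}")
print(f"conversion right: {right.conversion:.2%}")
print(f"relative increase (right vs left): {lift:+.1%}")
```

A real analysis would also report the uncertainty of the estimate (for example a confidence interval), since differences of around one percent need many visits to be distinguished from noise.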
Why Now?
• Data collected implicitly: Dramatic growth over time
• Data collected explicitly / tacitly: Constant over time
[Chart: Cost of storage and of communication over time, 1990 to 2010]
[Chart: Data collected per day over time, 1990 to 2010, explicit (surveys etc.) vs implicit (clicks etc.)]

Three Ingredients For Innovation
• Data
  - Computer Science
  - Database Research
• Methodology
  - Statistics
  - Data Analysis
  - Machine Learning
• Domain expertise
  - Behavioral Sciences
  - Marketing
  - Finance
Research Questions
• Characterize paths through website
  - "Modeling Online Browsing and Path Analysis Using Clickstream Data" by Alan L. Montgomery, Shibo Li, Kannan Srinivasan, and John C. Liechty. Marketing Science (2004).
• Understand and influence conversion
  - "A Model of Web Site Browsing Behavior Estimated on Clickstream Data" by Randolph E. Bucklin and Catarina Sismeiro. J. of Marketing Research 40 (2003).
  - "Dynamic Conversion Behavior at E-Commerce Sites" by Wendy W. Moe and Peter S. Fader. Management Science (2004).
• Predict intention and modality of the visit
  - "Seize the Occasion" by Horacio D. Rozanski, Gerry Bollman, and Martin Lipman. Strategy and Business (2001).
• Compute and apply customer network value
  - "Mining the Network Value of Customers" by Pedro Domingos and Matt Richardson. KDD-2001, ACM Press.

Example: Invest $10M to Improve Customer Satisfaction
• Base decision on analysis of behavioral data
  - Quantify
  - Model
  - Act
• Consider
  - Increase selection?
  - Increase availability?
  - Reduce clutter on web site?
  - Improve product search algorithms?
[Cartoon caption: "So, as you can see, customer satisfaction is up considerably since phasing out the complaint forms."]
[Screenshot: amazon.com search results for "blue pants", with the personalized greeting "Hello, Andreas Weigend. We have recommendations for you.", a list of matching categories (e.g. Books (41,731), Apparel (2,017)), apparel listings with sale prices, and sponsored links.]
Outline
• Make decisions based on experiments (A-B tests)
  - The need for a scientific framework
  - Three ingredients for innovation
  - Revealed vs stated preferences
• The iterative process of modeling
• Some insights into online customer behavior
  - Levels of analysis and actionability
  - Personalization vs Occasionalization
  - Behavioral economics
[Diagram: the three ingredients, labeled Data, Methods, Domain]
The Iterative Process of Modeling
[Cycle diagram: Define (redefine) → Measure → Describe → Predict and Evaluate → Decide and Act]

1. Define Objectives
• Stock price
• Profit
• Number of items sold
• Number of visits
• Rate of conversion
• Customer acquisition
• Customer retention
• Customer satisfaction
2. Measure
Customer Behavior
• Orders
• Overall use of the site
  - Buying vs selling
  - Searching vs browsing
  - Writing reviews, lists, etc.
• Customer service contacts
  - E-mail, phone
Site Behavior
• Customer service response
  - Resolution (free replacement, refund)
• Delivery date vs promised date
• Page generation time
• Search response
  - Number of search results
Customer-Site Interactions
• Surveys
  - Intentions / Goals / Modalities
  - Satisfaction
• E-mail campaigns and responses

Amount of Data Created Per Day
  Level                  New data per day    Comparison
  Customer               1 MB                1 mm
  Orders                 10 … 100 MB         10 cm
  Session aggregates     1 … 10 GB           10 m
  Clicks                 100 GB … 1 TB       1 km
  Presentation level*    10+ TB              10+ km
*What was displayed, whether or not it was clicked on
Distribution of visit length: How many clicks per visit?
[Plot: distribution of visits by number of clicks. Curves: unrecognized non-purchase, recognized non-purchase, recognized purchase, internal non-purchase, internal purchase. Annotations: Gold Box, Webcrawlers.]

4. Building and Evaluating Predictive Models
• Tasks: Predict, e.g.,
  - Probability (buy in this visit without discount) vs Probability (buy in this visit with discount)
  - Probability (current page is last page requested in this visit)
• Use models from different model classes (different statistical assumptions)
  - Baseline, e.g., Poisson (independent, unconditional)
  - First order Markov
  - Beginning-of-visit information
    - HTTP-referrer
    - Search vs Browse
  - Aggregate visit so far (but time ordering ignored)
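As an illustration of the "First order Markov" model class applied to the task "Probability (current page is last page requested in this visit)", here is a minimal Python sketch. It assumes each visit is available as an ordered list of page-type labels; the labels, the `EXIT` marker, the function names, and the toy visits are all hypothetical. The unconditional baseline shown is a simple constant exit rate per click, not the Poisson baseline named on the slide.

```python
from collections import Counter, defaultdict

EXIT = "<exit>"  # hypothetical pseudo-page marking the end of a visit


def fit_first_order_markov(visits):
    """Estimate P(next page | current page) from ordered visits."""
    counts = defaultdict(Counter)
    for pages in visits:
        pages = list(pages)
        # Pair every page with its successor; the last page leads to EXIT.
        for current, nxt in zip(pages, pages[1:] + [EXIT]):
            counts[current][nxt] += 1
    return {
        page: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for page, c in counts.items()
    }


def prob_last_page(model, page):
    """P(current page is the last page requested in this visit)."""
    return model.get(page, {}).get(EXIT, 0.0)


def baseline_exit_rate(visits):
    """Unconditional baseline: every click ends the visit with the same probability."""
    total_clicks = sum(len(v) for v in visits)
    return len(visits) / total_clicks


# Toy clickstream, made up for illustration only.
visits = [
    ["home", "search", "detail", "cart"],
    ["home", "detail"],
    ["search", "detail", "detail", "cart"],
    ["home", "search"],
]

model = fit_first_order_markov(visits)
for page in ("home", "search", "detail", "cart"):
    print(page, round(prob_last_page(model, page), 3))
print("baseline", round(baseline_exit_rate(visits), 3))
```

The Markov model conditions only on the current page. The other model classes listed above would add features such as the HTTP referrer or aggregates over the visit so far.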
Building More Complex Probabilistic Models
Joint work with Bruce D'Ambrosio, CleverSet Inc.
• Add synthetic variables
  - Combine observed variables (automatically generated)
• Add hidden variables
  - Unobserved / hidden states
• Add relational structure
  - E.g., use information from the products table, rather than only product ID
• Evaluate out-of-sample accuracy
  - Standard: Area under ROC curve

Extended Relational Structure
  Model                  Performance (Area under ROC curve)
  First order Markov     .334
  Hidden Markov          .513
  Basic relational       .728
  Extended relational    .777
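The evaluation standard named above, area under the ROC curve, equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. A value of 0.5 corresponds to random ranking. The following Python sketch computes it directly from that definition; the function name `roc_auc` and the labels and scores are hypothetical and unrelated to the results in the table.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: 1 for a positive example (e.g. visit ended in a purchase), 0 otherwise.
    scores: model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up out-of-sample labels and scores, for illustration only.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch. For large held-out sets a rank-based computation gives the same value in O(n log n).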