当前位置:高等教育资讯网  >  中国高校课件下载中心  >  大学文库  >  浏览文档

电子科技大学:《大数据分析与挖掘 Big Data Analysis and Mining》课程教学资源(课件讲稿)Lecture 4 Sampling for Big Data

资源类别:文库,文档格式:PDF,文档页数:35,文件大小:3.43MB,团购合买
点击下载完整版文档(PDF)

Lecture 4 Sampling for Big Data

Lecture 4 Sampling for Big Data

What's the value of TT? MC Approximation of Pi 3.14616 8 -0.5 0.0 0.5 0 Areacircle πr2 π #Pointcircle π=4* Areasquare (2r)(2r) =4 #Pointsquare

What’s the value of π?

Outline >Motivation /Benefits >Basics of sampling Inverse Transform sampling ·Rejection sampling ·Importance sampling Markov chain Monte Carlo (MCMC) MH Sampling Gibbs Sampling >Stream sampling Sample >Conclusion

¾Motivation / Benefits ¾Basics of sampling • Inverse Transform sampling • Rejection sampling • Importance sampling • Markov chain Monte Carlo (MCMC) MH Sampling Gibbs Sampling ¾Stream sampling ¾Conclusion Outline

Why Sampling? 10 12 Big data issue 。 Store complexity 2 10 Sample Calculate complexity 35 ● Posterior estimation Expectation estimation Population ● Mean age of people in China

Big data issue • Store complexity • Calculate complexity • … Why Sampling? Posterior estimation • Expectation estimation • …… Mean age of people in China

Bad Sampling Perform your research with bad samples,or just ones that are inaccurately designed,and you will almost certainly get misleading results. Examples:only sample teenagers when querying the mean age of people in China YOUR SAMPLING IS BAD AND YOU SHOULD FEEL BAD

Bad Sampling • Perform your research with bad samples, or just ones that are inaccurately designed, and you will almost certainly get misleading results. • Examples: only sample teenagers when querying the mean age of people in China

Inverse Transform Sampling 00 9 Qo 6 0 P 9 9.0 000 o。0 08 88o 00 00 0 oo 00 Gaussian distribution 0.9 Q.8 yi ® 03 02 0.2 0.1 sample xi sample xi 01 sample xi 90 5 0 10 5 (a).Gaussian CDF (b).Gaussian CDF with o2o (c).Gaussian CDF with o20

௜ ௜ ௜ ௜ ௜ ௜ Gaussian distribution Inverse Transform Sampling

Inverse Transform Sampling >Sampling based on the inverse of Cumulative Distribution Function(CDF) >Method: CDF Sampling: Yi~Uniform(0,1) Xi=CDF-1(Yi) >Drawbacks: Usually,it's hard to get the inverse function

¾Sampling based on the inverse of Cumulative Distribution Function (CDF) ¾Method: ¾Drawbacks: • Usually, it’s hard to get the inverse function Inverse Transform Sampling

Example 1 8x ,if0≤x1 8 0.0 0.2 0.4 0.6 0.8 1.0 ,if0≤u<0.25 V31- ,if0.25≤u≤1 2

Example 1

Example 2 2m2 h(x)= (1-m2)z8, x∈m, '0 ,ifx1 my_invcdf samplepdf m2 H1(w=√1-(1-m2u 0.4 0.50.6 0.7 0.8 0.91.0 1.1 x,N=1000

Example 2

Rejection Sampling Ideas: Accept the samples in the region under the graph of its density function and reject others Rejection Sampling: Proposal distribution Q(x) Xi~Q(X) reject a~Uniform(0,Q(Xi)) Mq(x) if(a≤P(X) accept Accept p(x) else Reject x(i) Acceptance ratio =p(x)/Mg(x)

Ideas: • Accept the samples in the region under the graph of its density function and reject others Proposal distribution Q(x) Rejection Sampling Acceptance ratio = p(x)/Mq(x)

点击下载完整版文档(PDF)VIP每日下载上限内不扣除下载券和下载次数;
按次数下载不扣除下载券;
24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
共35页,可试读12页,点击继续阅读 ↓↓
相关文档

关于我们|帮助中心|下载说明|相关软件|意见反馈|联系我们

Copyright © 2008-现在 cucdc.com 高等教育资讯网 版权所有