Big Data Analytics and Mining
Junming Shao
Data Mining Lab, Big Data Research Center, University of Electronic Science and Technology of China (UESTC)
Email: junming.shao@gmail.com
http://dm.uestc.edu.cn
Content
• Introduction to Big Data Mining
• Foundation of Data Mining
• Hashing
• Sampling
• Data Stream Mining
• Graph Mining
• Hadoop-Spark
Prerequisites
• Basic Algorithms
• Probability and Statistics (Probability, Bayes, etc.)
• Linear Algebra (Matrix Theory)
• Programming (Java/C++/Python, etc.)
• Database Systems (SQL, Relational Databases)
Classroom Expectations
• Follow classroom rules
• Try your best on every in-class activity, assignment, and test
Evaluation
• Closed-book final-term examination
Lecture ONE: Introduction to Big Data Mining
What era do we live in NOW?
[Figure: sources of BIG DATA. Panels: Media/Entertainment; Healthcare (DNA, gene sequence, fMRI/DTI); Industry (sensor, manufacture); E-commerce (Wal-Mart: 2.5 PB/hour); Stock Data; Messenger; Watch. Caption text: 90% of the world's data has been created in the past two years.]
*Note: some pictures derived from Internet
"Era of Big Data": Examples
• Flickr (3 billion photos)
• YouTube (83M videos)
• Web (10B videos watched / month)
• Digital photos (500 billion / year)
• All broadcast (70,000 TB / year)
• Yahoo! Webmap (3 trillion links, 300 TB compressed, 5 PB disk)
• Human genome (2-30 TB uncompressed)