Lecture 1 Overview
Data Analysis and Data Mining
Dr. 李晓瑜 (Xiaoyu Li)
Email: xiaoyuuestc@uestc.edu.cn
http://blog.sciencenet.cn/u/uestc2014xiaoyu
2019-Spring
SunData Group, http://www.sundatagroup.org/
School of Information and Software Engineering, UESTC
Copyright © 2019 by Xiaoyu Li.
Content (3H)
1.1 What's big data?
1.2 Overview of data analysis
1.3 Overview of data mining
1.4 Gathering requirements for different professional applications
Reference
Text Book
Data Mining: Concepts and Techniques, Jiawei Han, Micheline Kamber and Jian Pei, China Machine Press (2012)
Reference Book
1) Tamhane, Ajit C., and Dorothy D. Dunlop. Statistics and Data Analysis: From Elementary to Intermediate. Prentice Hall, 1999.
2) Statistical Learning Methods (统计学习方法), Hang Li (李航)
Coursera
1) Machine Learning (Andrew Ng)
2) Data Mining (Stanford)
3) Statistical Thinking and Data Analysis (MIT)
Target
1 Know the characteristics of big data;
2 Understand how to gather data analysis requirements;
3 Know the differences and connections between data analysis and data mining.
Big Data
The big data era is coming.
1.1 What's big data?
(1) Background
[Figure: Global Information Storage Capacity, in optimally compressed bytes]
1986: analog 2.6 exabytes, digital 0.02 exabytes (1% digital)
1993: 3% digital
2000: 25% digital
2002: "beginning of the digital age" (50% digital)
2007: analog 19 exabytes, digital 280 exabytes (94% digital)
Analog storage, 2007: paper, film, audiotape and vinyl 6%; analog videotapes (VHS, etc.) 94%
Digital storage, 2007: PC hard disks 44.5%; DVD/Blu-ray 22.8%; digital tape 11.8%; computer servers and mainframes 8.9%; CDs and minidisks 6.8%; portable hard disks 2.4%; portable media, flash drives 2%; others 1% (incl. chip cards, memory cards, floppy disks, mobile phones, PDAs, cameras/camcorders, video games)
Source: Hilbert, M., López, P. (2011). The World's Technological Capacity to Store, Communicate, and Compute Information. Science, 332(6025), 60-65. http://www.martinhilbert.net/WorldinfoCapacity.html
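The "% digital" values on this slide can be checked from the exabyte totals it quotes for 1986 and 2007. The following is a minimal sketch of that arithmetic; the dictionary layout and the function name `digital_share` are illustrative choices, not something given in the lecture.

```python
# Storage capacities in exabytes, as quoted on the slide
# (figure after Hilbert & Lopez, 2011). Only 1986 and 2007 have
# both an analog and a digital total on the slide.
capacity_eb = {
    1986: {"analog": 2.6, "digital": 0.02},
    2007: {"analog": 19.0, "digital": 280.0},
}


def digital_share(analog_eb: float, digital_eb: float) -> float:
    """Return digital storage as a percentage of total capacity."""
    return 100.0 * digital_eb / (analog_eb + digital_eb)


# Hypothetical helper structure: year -> percentage of capacity that is digital.
shares = {
    year: digital_share(v["analog"], v["digital"])
    for year, v in capacity_eb.items()
}

for year, share in sorted(shares.items()):
    print(f"{year}: {share:.1f}% digital")
```

Rounded to whole percentages, the two results agree with the 1% and 94% shown for 1986 and 2007.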
(2) Development
[Figure: sources of big data] Media/Entertainment, Healthcare, DNA, gene sequence, fMRI/DTI, Industry, sensor, manufacture, E-commerce, Walmart: 2.5 PB/hour, stock data, Messenger, Watch
*Note: some pictures are taken from the internet
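To get a feel for the 2.5 PB/hour rate quoted for Walmart on this slide, it helps to convert it to per-second and per-day figures. This is a rough sketch that assumes decimal (SI) units, 1 PB = 10^15 bytes and 1 GB = 10^9 bytes; the slide does not say which convention is meant, and the variable names are illustrative.

```python
# Assumption: SI prefixes (decimal), not binary (PiB/GiB).
PB = 10**15  # bytes in one petabyte
GB = 10**9   # bytes in one gigabyte

rate_pb_per_hour = 2.5  # figure quoted on the slide

# Convert the hourly rate to bytes per second, then to GB per second.
bytes_per_second = rate_pb_per_hour * PB / 3600
gb_per_second = bytes_per_second / GB

# Scale the hourly rate to a full day.
pb_per_day = rate_pb_per_hour * 24

print(f"{gb_per_second:.0f} GB/s, {pb_per_day:.0f} PB/day")
```

Under these assumptions the rate is roughly 694 GB per second, or 60 PB per day.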
(3) Data Stream
[Figure: sources of data streams] Internet, Industry, Surveillance, Sensor, Network Intrusion, Smart Phone, Mobile, Spam Filtering
*Note: some pictures are taken from the internet
(4) Useful Applications
[Figure: company logos] China Southern Power Grid (中国南方电网), State Grid (国家电网), Sinopec, China National Petroleum (中国石油), CNOOC (中海石油)