Computing/Modeling with Big Data
Xindong Wu (吴信东)
School of Computer and Information, Hefei University of Technology, China; Department of Computer Science, University of Vermont, USA
Outline
1 The Era of Big Data
2 Big Data Characteristics
3 A Big Data Processing Framework
4 Streaming Data and Streaming Features
5 Concluding Remarks
ICDM ’13 Panel: Data Mining with Big Data
Panel Chair: Xindong Wu
Panelists:
• Chris Clifton (NSF & Purdue)
• Vipin Kumar (Minnesota, FIEEE, FACM, FAAAS)
• Jian Pei (TKDE EiC, Canada, FIEEE)
• Bhavani Thuraisingham (UT Dallas, Security, FIEEE, FAAAS)
• Geoff Webb (DMKD EiC, Australia)
• Zhi-Hua Zhou (Nanjing, China, FIEEE)
Questions:
• Big Data is a hot topic, but what is its useful content?
• What aspects are new, or is it just data mining?
• How does data mining change with Big Data?
• What should data miners do to cope with these changes?
Big Data, from the 70s to Now, and 2046
• The 1st International Conference on Very Large Data Bases (September 22-24, 1975, Framingham, MA, USA)
  – Very large = big?
  – The first ER model paper, QBE, …
• XLDB – Extremely Large Databases and Data Management, started on October 25, 2007
• ?LDB in 2046?
  – ULDB – Upmost Large Databases ☺
• Cent 01: being big is relative; going big is a deterministic trend
• Data mining: keep evolving
J. Pei: Big Data Analytics 101
Some comments on big data
David Hand, Imperial College, London
David Hand: Some comments on big data, December 2013
The power law theorem of data set size:
• The number of data sets of size n is inversely proportional to n
• There are vastly more small data sets than very large ones
• So small data sets are likely to have a much larger impact on the world than big data sets
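The power law claim can be made concrete with a little arithmetic. This is a minimal sketch, not taken from the talk. It assumes the number of data sets of size n is exactly N(n) = c/n for a hypothetical constant c. The slide states only proportionality, and the value of c chosen here is arbitrary. The function names dataset_count and count_ratio are also hypothetical.

```python
# Illustrative sketch of the "power law theorem of data set size".
# Assumption: the number of data sets of size n is exactly c / n,
# where c is a hypothetical constant (only proportionality is claimed).

def dataset_count(n: int, c: float = 1_000_000.0) -> float:
    """Expected number of data sets of size n under an assumed N(n) = c / n law."""
    if n <= 0:
        raise ValueError("data set size n must be positive")
    return c / n


def count_ratio(small: int, large: int) -> float:
    """How many times more data sets of size `small` than of size `large`.

    Under N(n) = c / n the constant c cancels, so the ratio is large / small.
    """
    return dataset_count(small) / dataset_count(large)


if __name__ == "__main__":
    # Print the assumed counts for data sets of increasing size.
    for size in (100, 10_000, 1_000_000, 1_000_000_000):
        print(f"size {size:>13,}: about {dataset_count(size):,.3f} data sets")
    # Compare 100-record data sets with billion-record data sets.
    print(f"ratio 100 vs 10^9: {count_ratio(100, 1_000_000_000):,.0f}")
```

Because c cancels in the ratio, the conclusion does not depend on the assumed constant. Under a 1/n law there are large/small times as many data sets of the smaller size. For example, there would be ten million times as many 100-record data sets as billion-record ones.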
No one actually wants data
• What people want are answers, which may be extracted from data
• So data are only half the answer
• The other half is statistics, data mining, machine learning, and other data analytic disciplines
The manure heap theorem of data discoveries:
The probability of finding a gold coin in a heap of manure tends towards 1 as the size of the heap tends to infinity. (This theorem is false.)
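The slide gives no model of the heap, so the following is only a hedged illustration of why size alone cannot guarantee a discovery. It uses an assumed toy model, not one taken from the talk. Suppose a heap of n units in which each unit independently contains a gold coin with probability p. Then:

$$
P(\text{at least one coin in } n \text{ units}) = 1 - (1-p)^{n},
\qquad
\lim_{n\to\infty}\bigl[1 - (1-p)^{n}\bigr] =
\begin{cases}
1, & 0 < p \le 1,\\
0, & p = 0.
\end{cases}
$$

If instead the rate shrinks as the heap grows, say $p = \lambda/n$ for a fixed $\lambda > 0$, the limit is $1 - e^{-\lambda} < 1$. Under this toy model the limit reaches 1 only when coins occur at a fixed positive rate, and nothing about a large heap guarantees that.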
Data Science: Not Just for Big Data
Gregory Piatetsky, @kdnuggets
Analytics, Big Data, Data Mining, and Data Science Resources
© KDnuggets 2013
What do we call it?
• Statistics, 1830-
• Data Mining, 1980-
• Knowledge Discovery in Data (KDD), 1989-
• Business Analytics, 1997-
• Predictive Analytics, 2002-
• Data Analytics, 2011-
• Data Science, 2011-
• Big Data, 2012-
Same core idea (finding useful patterns in data), different emphasis.