Big Data Machine Learning 大数据机器学习 李武军 LAMDA Group 南京大学计算机科学与技术系 软件新技术国家重点实验室 liwujun@nju.edu.cn 0ct31,2014 日卡*24元至)风0 Li (http://cs.nju.edu.cn/lwj) Big Learning CS.NJU 1/79
Big Data Machine Learning åÍ‚ÅÏÆS o… LAMDA Group HÆåÆOéÅâÆÜE‚X ^á#E‚I[:¢ø liwujun@nju.edu.cn Oct 31, 2014 Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 1 / 79
Outline Introduction ②Learning to Hash o Isotropic Hashing o Supervised Hashing with Latent Factor Models o Supervised Multimodal Hashing with SCM Multiple-Bit Quantization Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing ④Stochastic Learning o Distributed Stochastic ADMM for Matrix Factorization Conclusion 日卡40三4元,重只0 Li (http://cs.nju.edu.cn/lvj) Big Leaming CS.NJU 2/79
Outline 1 Introduction 2 Learning to Hash Isotropic Hashing Supervised Hashing with Latent Factor Models Supervised Multimodal Hashing with SCM Multiple-Bit Quantization 3 Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing 4 Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization 5 Conclusion Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 2 / 79
Introduction Outline 0 Introduction ②Learning to Hash Isotropic Hashing o Supervised Hashing with Latent Factor Models Supervised Multimodal Hashing with SCM o Multiple-Bit Quantization Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction o Distributed Power-Law Graph Computing Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization ⑤Conclusion 4日卡0,·三·4元至)及0 Li (http://cs.nju.edu.cn/lvj) Big Leaming CS.NJU 3/79
Introduction Outline 1 Introduction 2 Learning to Hash Isotropic Hashing Supervised Hashing with Latent Factor Models Supervised Multimodal Hashing with SCM Multiple-Bit Quantization 3 Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing 4 Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization 5 Conclusion Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 3 / 79
Introduction Big Data Big data has attracted much attention from both academic and industry. Facebook:750 Million users Flickr:6 Billion photos Wal-Mart:267 Million items/day;4PB data warehouse oSloan Digital Sky Survey:New Mexico telescope captures 200 GB image data/day Science FOURTH PARADIGM data Li (http://cs.nju.edu.cn/lwj) Big Leaming CS.NJU 4/79
Introduction Big Data Big data has attracted much attention from both academic and industry. Facebook: 750 Million users Flickr: 6 Billion photos Wal-Mart: 267 Million items/day; 4PB data warehouse Sloan Digital Sky Survey: New Mexico telescope captures 200 GB image data/day Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 4 / 79
Introduction Definition of Big Data o Gartner(2012):"Big data is high volume,high velocity,and/or high variety information assets that require new forms of processing to enable enhanced decision making,insight discovery and process optimization."("3Vs") International Data Corporation (IDC)(2011):"Big data technologies describe a new generation of technologies and architectures,designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture,discovery,and/or analysis."("4Vs") McKinsey Global Institute(MGI)(2011):"Big data refers to datasets whose size is beyond the ability of typical database software tools to capture,store, manage,and analyze.' Why not hot until recent years? 。Big data:金矿 。Cloud computing:采矿技术 。Big data machine learning:治金技术 日卡*24元至)风0 Li (http://cs.nju.edu.cn/lwj) Big Leaming CS.NJU 5/79
Introduction Definition of Big Data Gartner (2012): “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” (“3Vs”) International Data Corporation (IDC) (2011): “Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.” (“4Vs”) McKinsey Global Institute (MGI) (2011): “Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” Why not hot until recent years? Big data: 7¶ Cloud computingµÊ¶E‚ Big data machine learningµé7E‚ Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 5 / 79
Introduction Big Data Machine Learning o Definition:perform machine learning from big data. o Role:key for big data Ultimate goal of big data processing is to mine value from data. Machine learning provides fundamental theory and computational techniques for big data mining and analysis. 日卡三4元,互Q0 Li (http://cs.nju.edu.cn/lwj) Big Leaming CS.NJU 6/79
Introduction Big Data Machine Learning Definition: perform machine learning from big data. Role: key for big data Ultimate goal of big data processing is to mine value from data. Machine learning provides fundamental theory and computational techniques for big data mining and analysis. Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 6 / 79
Introduction Challenge oStorage:memory and disk Computation:CPU o Communication:network 日卡三4元,互Q0 Li (http://cs.nju.edu.cn/lvj) Big Leamning CS.NJU 7/79
Introduction Challenge Storage: memory and disk Computation: CPU Communication: network Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 7 / 79
Introduction Our Contribution 。Learning to hash(哈希学习):memory/disk/cpu/communication 。Distributed learning(分布式学习)memory/disk/cpu; but increase communication cost ●Stochastic learning(随机学习):memory/disk/cpu 日卡三4元,互Q0 Li (http://cs.nju.edu.cn/lvj) Big Learning CS.NJU 8/79
Introduction Our Contribution Learning to hash (MFÆS): memory/disk/cpu/communication Distributed learning (©Ÿ™ÆS): memory/disk/cpu; but increase communication cost Stochastic learning (ëÅÆS): memory/disk/cpu Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 8 / 79
Learning to Hash Outline Introduction ② Learning to Hash o Isotropic Hashing Supervised Hashing with Latent Factor Models o Supervised Multimodal Hashing with SCM Multiple-Bit Quantization Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization ⑤Conclusion 4日卡0,·工·4元,至)及0 Li (http://cs.nju.edu.cn/lvj) Big Leaming CS.NJU 9/79
Learning to Hash Outline 1 Introduction 2 Learning to Hash Isotropic Hashing Supervised Hashing with Latent Factor Models Supervised Multimodal Hashing with SCM Multiple-Bit Quantization 3 Distributed Learning Coupled Group Lasso for Web-Scale CTR Prediction Distributed Power-Law Graph Computing 4 Stochastic Learning Distributed Stochastic ADMM for Matrix Factorization 5 Conclusion Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 9 / 79
Learning to Hash Nearest Neighbor Search (Retrieval) oGiven a query point g,return the points closest(similar)to g in the database(e.g.images). o Underlying many machine learning,data mining,information retrieval problems Challenge in Big Data Applications: o Curse of dimensionality Storage cost ●Query speed 日卡*2元至)风0 Li (http://cs.nju.edu.cn/lwj) Big Leaming CS.NJU 10/79
Learning to Hash Nearest Neighbor Search (Retrieval) Given a query point q, return the points closest (similar) to q in the database(e.g. images). Underlying many machine learning, data mining, information retrieval problems Challenge in Big Data Applications: Curse of dimensionality Storage cost Query speed Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 10 / 79