Big Data Machine Learning

Wu-Jun Li
LAMDA Group
Department of Computer Science and Technology, Nanjing University
National Key Laboratory for Novel Software Technology
liwujun@nju.edu.cn

Nov 26, 2015

Li (http://cs.nju.edu.cn/lwj) Big Learning CS, NJU 1 / 115
Outline

1 Introduction
2 Learning to Hash
    Isotropic Hashing
    Scalable Graph Hashing with Feature Transformation
    Supervised Hashing with Latent Factor Models
    Column Sampling based Discrete Supervised Hashing
    Deep Supervised Hashing with Pairwise Labels
    Supervised Multimodal Hashing with SCM
    Multiple-Bit Quantization
3 Distributed Learning
    Coupled Group Lasso for Web-Scale CTR Prediction
    Distributed Power-Law Graph Computing
4 Stochastic Learning
    Fast Asynchronous Parallel Stochastic Gradient Descent
    Distributed Stochastic ADMM for Matrix Factorization
5 Conclusion
Introduction: Big Data

Big data has attracted much attention from both academia and industry.

Facebook: 750 million users
Flickr: 6 billion photos
Wal-Mart: 267 million items/day; 4 PB data warehouse
Sloan Digital Sky Survey: the New Mexico telescope captures 200 GB of image data per day

[Figure: "The Fourth Paradigm" — data-intensive science]
Introduction: Definition of Big Data

Gartner (2012): "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization." ("3Vs")

International Data Corporation (IDC) (2011): "Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis." ("4Vs")

McKinsey Global Institute (MGI) (2011): "Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze."

Why did big data not become hot until recent years?
Big data: the gold mine
Cloud computing: the mining technology
Big data machine learning: the refining (metallurgy) technology
Introduction: Big Data Machine Learning

Definition: perform machine learning from big data.
Role: the key to big data.
The ultimate goal of big data processing is to mine value from data.
Machine learning provides the fundamental theory and computational techniques for big data mining and analysis.
Introduction: Challenges

Storage: memory and disk
Computation: CPU
Communication: network
Introduction: Our Contributions

Learning to hash: reduces memory/disk/CPU/communication cost
Distributed learning: reduces per-machine memory/disk/CPU cost, but increases communication cost
Stochastic learning: reduces memory/disk/CPU cost
Learning to Hash: Nearest Neighbor Search (Retrieval)

Given a query point q, return the points in the database closest (most similar) to q (e.g., image retrieval).

Nearest neighbor search underlies many machine learning, data mining, and information retrieval problems.

Challenges in big data applications:
Curse of dimensionality
Storage cost
Query speed
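The storage-cost and query-speed challenges above are what motivate hashing: if each point is mapped to a short binary code, a database of millions of items fits in memory and candidates can be ranked by fast Hamming distance instead of exact distance in the original high-dimensional space. A minimal sketch in Python (the random sign-projection scheme and all data here are illustrative, LSH-style assumptions, not one of the supervised hashing methods on these slides):

```python
# Illustrative sketch: approximate nearest neighbor search with
# random-projection binary hashing (LSH-style baseline, hypothetical data).
import numpy as np

rng = np.random.default_rng(0)
n, d, b = 1000, 64, 32                 # database size, dimension, code length

X = rng.standard_normal((n, d))        # database points (synthetic)
W = rng.standard_normal((d, b))        # random projection matrix (assumption)

def encode(v):
    """Map real vectors to b-bit binary codes via the sign of projections."""
    return (v @ W > 0).astype(np.uint8)

codes = encode(X)                      # n x b binary code matrix

def hamming_search(q, k=5):
    """Return indices of the k database points nearest to q in Hamming space."""
    qc = encode(q[None, :])[0]
    dist = np.count_nonzero(codes != qc, axis=1)   # Hamming distances
    return np.argsort(dist)[:k]

# Query slightly perturbed from database point 42; its code stays (almost)
# identical, so it should rank near the top of the Hamming ranking.
q = X[42] + 0.01 * rng.standard_normal(d)
print(hamming_search(q))
```

Note the storage saving: each 64-dimensional float vector takes 256 bytes, while its 32-bit code takes 4 bytes, and the Hamming ranking is a single vectorized pass over the code matrix.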