Learning to Hash for Big Data

Wu-Jun Li (李武军)
LAMDA Group
Department of Computer Science and Technology, Nanjing University
National Key Laboratory for Novel Software Technology
Joint work with 孔维吴, 张东擎, 张培超, 张巍, 过数意
Nov 30, 2014

Li (http://cs.nju.edu.cn/lwj) Learning to Hash CS, NJU
Outline

1 Introduction
    Problem Definition
    Existing Methods
2 Isotropic Hashing
3 Supervised Hashing with Latent Factor Model
4 Supervised Multimodal Hashing with SCM
5 Multiple-Bit Quantization
    Double-Bit Quantization
    Manhattan Quantization
6 Conclusion
7 Reference
1 Introduction
Problem Definition

Nearest Neighbor Search (Retrieval)
- Given a query point q, return the points closest (most similar) to q in the database (e.g. images).
- This problem underlies many machine learning, data mining, and information retrieval problems.

Challenges in Big Data Applications:
- Curse of dimensionality
- Storage cost
- Query speed
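The retrieval task above can be sketched as an exhaustive scan over the database. This is only a minimal baseline for illustration: the function name `nearest_neighbors`, the use of Euclidean distance, and the toy 2-D points are assumptions, not something the slides specify.

```python
import math

def nearest_neighbors(database, query, k=1):
    """Return the indices of the k database points closest to `query`.

    Euclidean distance is an assumption here; the slides only say
    "closest (similar)". The scan costs O(n * d) per query, which is
    the cost that motivates hashing for large, high-dimensional data.
    """
    order = sorted(range(len(database)),
                   key=lambda i: math.dist(database[i], query))
    return order[:k]

# Toy example with hypothetical 2-D points.
database = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(nearest_neighbors(database, (0.9, 1.2), k=2))  # [1, 0]
```

Each query touches every point and every dimension, so both storage and query time grow with the database size and the dimensionality.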
Similarity Preserving Hashing

[Figure: three images and their binary codes]
- h(Statue of Liberty) = 10001010
- h(Napoleon) = 01100001
- h(Napoleon) = 01100101 (one flipped bit)

Statue of Liberty vs. Napoleon: the codes should be very different.
Napoleon vs. Napoleon: the codes should be similar.
Reduce Dimensionality and Storage Cost

[Figure: Gist vector -> binary reduction. 10 million images: 20 GB -> 160 MB]
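The storage figures on this slide can be reproduced with a little arithmetic. The slide gives only the totals, so the per-image sizes below (a 512-dimensional Gist vector stored as 4-byte floats, and a 128-bit binary code) are assumptions chosen because they match 20 GB and 160 MB for 10 million images.

```python
NUM_IMAGES = 10_000_000   # from the slide
GIST_DIMS = 512           # assumption: Gist descriptor dimensionality
BYTES_PER_FLOAT = 4       # assumption: single-precision floats
CODE_BITS = 128           # assumption: binary code length per image

def gist_storage_bytes(num_images=NUM_IMAGES):
    """Total bytes needed to store the real-valued Gist vectors."""
    return num_images * GIST_DIMS * BYTES_PER_FLOAT

def binary_storage_bytes(num_images=NUM_IMAGES):
    """Total bytes needed to store the binary hash codes."""
    return num_images * CODE_BITS // 8

print(gist_storage_bytes() / 1e9)    # 20.48 (GB, decimal units)
print(binary_storage_bytes() / 1e6)  # 160.0 (MB, decimal units)
```

Under these assumptions the binary codes take 128 times less space than the original vectors.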
Querying

Hamming distance:
- ||01101110, 00101101||_H = 3
- ||11011, 01011||_H = 1

[Figure: query and image dataset]
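The two Hamming distances on this slide can be checked with a short sketch. The helper name `hamming_distance` and the choice to represent codes as bit strings are illustrative assumptions; real systems usually pack codes into machine words and use a hardware popcount.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    # XOR sets a 1 exactly where the two codes disagree; count those 1s.
    return bin(int(a, 2) ^ int(b, 2)).count("1")

print(hamming_distance("01101110", "00101101"))  # 3
print(hamming_distance("11011", "01011"))        # 1
```

Because the whole comparison is one XOR and one bit count, it is far cheaper than a floating-point distance over hundreds of dimensions.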
Fast Query Speed
- By using a hashing scheme, we can achieve constant or sub-linear search time complexity.
- Exhaustive search is also acceptable, because computing the distance between binary codes is now cheap.
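Both query strategies mentioned above can be sketched in a few lines. This is a minimal illustration, not the method of any particular system: the function names, the integer representation of codes, and the three example codes are assumptions.

```python
from collections import defaultdict

def build_table(codes):
    """Map each binary code (stored as an int) to the indices that share it."""
    table = defaultdict(list)
    for idx, code in enumerate(codes):
        table[code].append(idx)
    return table

def bucket_lookup(table, query_code):
    """Expected constant-time lookup: items whose code equals the query code."""
    return table.get(query_code, [])

def hamming_scan(codes, query_code, k=1):
    """Exhaustive search: rank every item by Hamming distance to the query."""
    ranked = sorted(range(len(codes)),
                    key=lambda i: bin(codes[i] ^ query_code).count("1"))
    return ranked[:k]

# Hypothetical 8-bit codes for three database items.
codes = [0b10001010, 0b01100001, 0b01100101]
table = build_table(codes)
print(bucket_lookup(table, 0b01100001))    # [1]
print(hamming_scan(codes, 0b01100011, 2))  # [1, 2]
```

The bucket lookup only finds exact code matches, so in practice it is often extended to probe codes within a small Hamming radius. The exhaustive scan is linear in the database size, but each comparison is only an XOR and a bit count.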