Learning to Hash with its Application to Big Data Retrieval and Mining 李武军 Department of Computer Science and Engineering Shanghai Jiao Tong University Shanghai,China Joint work with孔维吴,张东擎,过敏意 Dec21,2013 日卡三4元,互Q0 i (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 1/49
Learning to Hash with its Application to Big Data Retrieval and Mining o… Department of Computer Science and Engineering Shanghai Jiao Tong University Shanghai, China Joint work with öëh, ‹¿ô, LØø Dec 21, 2013 Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 1 / 49
Outline Introduction Problem Definition ●Existing Methods ② Isotropic Hashing Model ●Learning o Experiment Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization ④ Conclusion Reference +日卡+得三元互)Q0¥ Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 2 /49
Outline 1 Introduction Problem Definition Existing Methods 2 Isotropic Hashing Model Learning Experiment 3 Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization 4 Conclusion 5 Reference Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 2 / 49
Introduction Outline Introduction Problem Definition ●Existing Methods Isotropic Hashing o Model e Learning o Experiment Multiple-Bit Quantization Double-Bit Quantization o Manhattan Quantization Conclusion Reference +日卡+得三元互)Q0¥ Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 3 /49
Introduction Outline 1 Introduction Problem Definition Existing Methods 2 Isotropic Hashing Model Learning Experiment 3 Multiple-Bit Quantization Double-Bit Quantization Manhattan Quantization 4 Conclusion 5 Reference Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 3 / 49
Introduction Problem Definition Nearest Neighbor Search(Retrieval) oGiven a query point g,return the points closest(similar)to g in the database(e.g.images). o Underlying many machine learning,data mining,information retrieval problems Challenge in Big Data Applications: o Curse of dimensionality Storage cost ●Query speed 日卡三4元,互Q0 Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 4 /49
Introduction Problem Definition Nearest Neighbor Search (Retrieval) Given a query point q, return the points closest (similar) to q in the database(e.g. images). Underlying many machine learning, data mining, information retrieval problems Challenge in Big Data Applications: Curse of dimensionality Storage cost Query speed Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 4 / 49
Introduction Problem Definition Similarity Preserving Hashing h(Statue of Liberty)= h(Napoleon)= h (Napoleon)= 10001010 01100001 011001Q1 flipped bit Should be very different Should be similar 0Q0 f(http:/aw.cs:sjtu.edu.cn/-1iu时jun】 Learning to Hash CSE,SJTU 5/49
Introduction Problem Definition Similarity Preserving Hashing Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 5 / 49
Introduction Problem Definition Reduce Dimensionality and Storage Cost Gist vector Binary reduction 1 million images 2 GB 16 MB 口卡+得二4元互)Q0 Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE.SJTU 6/49
Introduction Problem Definition Reduce Dimensionality and Storage Cost Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 6 / 49
Introduction Problem Definition Querying Hamming distance: 。101101110,00101101la=3 。l11011,01011lg=1 Query Image Dataset ,30Q0 Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CsE,S」TU7/49
Introduction Problem Definition Querying Hamming distance: ||01101110, 00101101||H = 3 ||11011, 01011||H = 1 Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 7 / 49
Introduction Problem Definition Querying 是 口卡得三4元互Q0 Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 8/49
Introduction Problem Definition Querying Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 8 / 49
Introduction Problem Definition Querying 口卡得三4元互Q0 Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE,SJTU 9/49
Introduction Problem Definition Querying Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 9 / 49
Introduction Problem Definition Fast Query Speed o By using hashing scheme,we can achieve constant or sub-linear search time complexity. Exhaustive search is also acceptable because the distance calculation cost is cheap now. 日卡三4元,互Q0 Li (http://www.cs.sjtu.edu.cn/-livujun Learning to Hash CSE.SJTU 10/49
Introduction Problem Definition Fast Query Speed By using hashing scheme, we can achieve constant or sub-linear search time complexity. Exhaustive search is also acceptable because the distance calculation cost is cheap now. Li (http://www.cs.sjtu.edu.cn/~liwujun) Learning to Hash CSE, SJTU 10 / 49