IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XX, NO. X, XXX 2019

Discrete Latent Factor Model for Cross-Modal Hashing

Qing-Yuan Jiang, Wu-Jun Li, Member, IEEE

Manuscript received July 20, 2018; revised December 30, 2018; accepted January 22, 2019. This work was supported in part by the NSFC-NRF Joint Research Project under Grant 61861146001, and in part by the NSFC under Grant 61472182. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Xiaochun Cao. (Corresponding author: Wu-Jun Li.) All authors are with the National Key Laboratory for Novel Software Technology, Collaborative Innovation Center of Novel Software Technology and Industrialization, Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China (E-mail: jiangqy@lamda.nju.edu.cn; liwujun@nju.edu.cn).

Abstract—Due to its storage and retrieval efficiency, cross-modal hashing (CMH) has been widely used for cross-modal similarity search in many multimedia applications. According to the training strategy, existing CMH methods can be mainly divided into two categories: relaxation-based continuous methods and discrete methods. In general, the training of relaxation-based continuous methods is faster than that of discrete methods, but their accuracy is not satisfactory. In contrast, the accuracy of discrete methods is typically better than that of relaxation-based continuous methods, but their training is very time-consuming. In this paper, we propose a novel CMH method, called discrete latent factor model based cross-modal hashing (DLFH), for cross-modal similarity search. DLFH is a discrete method which can directly learn the binary hash codes for CMH, and at the same time its training is efficient. Experiments show that DLFH achieves significantly better accuracy than existing methods, and the training time of DLFH is comparable to that of relaxation-based continuous methods, which are much faster than existing discrete methods.

Index Terms—Approximate nearest neighbor, cross-modal retrieval, hashing, multimedia.

I. INTRODUCTION

Nearest neighbor (NN) search plays a fundamental role in many areas, including machine learning, information retrieval, and computer vision. In many real applications, there is no need to return exact nearest neighbors for every given query: approximate nearest neighbor (ANN) search is enough to achieve satisfactory performance [1]–[3]. Because ANN search can be much faster than exact NN search, it has become an active research topic with a wide range of applications, especially for large-scale problems [1]–[3].

Among existing ANN search methods, hashing methods have attracted much attention due to their storage and retrieval efficiency in real applications [4]–[23]. The goal of hashing is to embed data points from the original space into a Hamming space where the similarity is preserved. More specifically, in the Hamming space each data point is represented as a binary code. With this binary code representation, the storage cost can be dramatically reduced, and furthermore we can achieve constant or sub-linear search speed, which is much faster than search in the original space [12], [13], [24], [25].
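To make the storage and speed argument concrete, the following minimal sketch (ours, not from the paper) stores each point as a single 64-bit binary code and computes Hamming distances with XOR plus a popcount; the database, query, and code length are synthetic placeholders.

```python
# Minimal illustration of binary-code search: each point is stored as one
# 64-bit code (8 bytes, versus kilobytes for a float feature vector), and
# Hamming distance is just popcount(XOR), which is extremely cheap.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000                                  # synthetic database size

database = rng.integers(0, 2**63, size=n, dtype=np.uint64)  # packed codes
query = rng.integers(0, 2**63, dtype=np.uint64)

# Hamming distance = number of differing bits = popcount of the XOR.
# np.unpackbits on a uint8 view gives a portable popcount.
xor = np.bitwise_xor(database, query)
dists = np.unpackbits(xor.view(np.uint8)).reshape(n, -1).sum(axis=1)

top10 = np.argsort(dists)[:10]                 # 10 nearest codes in Hamming space
```

Even this linear scan is memory-bandwidth bound rather than compute bound; constant-time lookup can further be obtained by using the code itself as an index into a hash table, which is how the sub-linear search speed mentioned above is realized in practice.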
Early hashing methods were mainly proposed for uni-modal data to perform uni-modal similarity search. In recent years, with the explosive growth of multimedia data in real applications, multi-modal similarity search has attracted a lot of attention. For example, given a text query, a multi-modal similarity search system can return the nearest images or videos in the database. To achieve efficient search for large-scale problems, multi-modal hashing (MMH) has been proposed for multi-modal search [26]–[28].

Existing MMH methods can be divided into two major categories: multi-source hashing (MSH) [15], [29]–[31] and cross-modal hashing (CMH) [26], [27], [32]–[34]. MSH methods aim to learn binary hash codes by utilizing information from multiple modalities for each point. In other words, under the MSH setting all modalities must be observed for all data points, including both the query points and those in the database. Because it is usually difficult to observe all modalities in many real applications, the application scenarios for MSH methods are limited. Unlike MSH methods, CMH methods usually require only one modality for a query point to perform search in a database with other modalities. The application scenarios for CMH are therefore more flexible than those for MSH. For example, CMH can perform text-to-image or image-to-text retrieval tasks in real applications. Hence, CMH has gained more attention than MSH [26], [27].
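The query flow just described can be sketched in a few lines: only the query's modality is hashed at query time, and the resulting code is matched against codes built offline from another modality. The sign-of-linear-projection hash functions below, with projections W_txt and W_img and all dimensions, are illustrative placeholders only, not the DLFH model proposed in this paper.

```python
# Hedged sketch of text-to-image cross-modal retrieval with binary codes.
import numpy as np

rng = np.random.default_rng(1)
c, d_txt, d_img, n = 32, 300, 512, 10_000      # code length, feature dims, DB size

# Hypothetical learned projections; stand-ins for real learned hash functions.
W_txt = rng.standard_normal((d_txt, c))
W_img = rng.standard_normal((d_img, c))

def hash_codes(x, W):
    """Map real-valued features to {0,1}^c by the sign of a linear projection."""
    return (x @ W > 0).astype(np.uint8)

# Offline: hash the image database once with the image-modality function.
image_codes = hash_codes(rng.standard_normal((n, d_img)), W_img)

# Online: a text-to-image query needs only the text modality.
text_query = rng.standard_normal(d_txt)
query_code = hash_codes(text_query[None, :], W_txt)

dists = np.count_nonzero(image_codes != query_code, axis=1)  # Hamming distance
nearest_images = np.argsort(dists)[:5]         # indices of retrieved images
```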
A lot of CMH methods have been proposed in recent years. Based on whether supervised information is used during the training procedure, existing CMH methods can be further divided into two categories: unsupervised CMH and supervised CMH. Unsupervised CMH methods directly explore data features, without supervised information, to learn binary codes (or hash functions). Representative unsupervised CMH methods include canonical correlation analysis iterative quantization (CCA-ITQ) [6], collective matrix factorization hashing (CMFH) [27], alternating co-quantization (ACQ) [35] and unsupervised generative adversarial cross-modal hashing (UGACH) [36]. Supervised CMH tries to learn the hash function by utilizing supervised information. As supervised CMH methods can incorporate semantic labels to mitigate the semantic gap, they can achieve better accuracy than unsupervised CMH methods. Representative supervised CMH methods include multi-modal latent binary embedding (MLBE) [37], semantic correlation maximization (SCM) [26], semantics preserving