Deep Cross-Modal Hashing

Qing-Yuan Jiang and Wu-Jun Li
National Key Laboratory for Novel Software Technology
Collaborative Innovation Center of Novel Software Technology and Industrialization
Department of Computer Science and Technology, Nanjing University, P. R. China
jiangqy@lamda.nju.edu.cn, liwujun@nju.edu.cn

Abstract

Due to its low storage cost and fast query speed, cross-modal hashing (CMH) has been widely used for similarity search in multimedia retrieval applications. However, most existing CMH methods are based on hand-crafted features, which might not be optimally compatible with the hash-code learning procedure. As a result, existing CMH methods with hand-crafted features may not achieve satisfactory performance. In this paper, we propose a novel CMH method, called deep cross-modal hashing (DCMH), by integrating feature learning and hash-code learning into the same framework. DCMH is an end-to-end learning framework with deep neural networks, one for each modality, to perform feature learning from scratch. Experiments on three real datasets with image-text modalities show that DCMH can outperform other baselines and achieve state-of-the-art performance in cross-modal retrieval applications.

1. Introduction

Approximate nearest neighbor (ANN) search [1] plays a fundamental role in machine learning and related applications like information retrieval. Due to its low storage cost and fast retrieval speed, hashing has recently attracted much attention from the ANN research community [17, 34, 9, 15, 26, 10, 29, 21, 28, 4].
The goal of hashing is to map the data points from the original space into a Hamming space of binary codes where the similarity in the original space is preserved in the Hamming space. By using binary hash codes to represent the original data, the storage cost can be dramatically reduced. Furthermore, we can achieve a constant or sub-linear time complexity for search by using hash codes to construct an index [15]. Hence, hashing has become more and more popular for ANN search in large-scale datasets.

In many applications, the data can have multiple modalities. For example, besides the image content, there also exists text information like tags for the images in Flickr and many other social websites. This kind of data is usually called multi-modal data. With the rapid growth of multi-modal data in real applications, especially multimedia applications, multi-modal hashing (MMH) has recently been widely used for ANN search (retrieval) on multi-modal datasets.

Existing MMH methods can be divided into two main categories: multi-source hashing (MSH) [30, 36, 32, 14] and cross-modal hashing (CMH) [18, 35, 7, 22, 3]. The goal of MSH is to learn hash codes by utilizing all the information from multiple modalities. Hence, MSH requires that all the modalities be observed for all data points, including query points and those in the database. In practice, the application of MSH is limited because in many cases it is difficult to acquire all modalities of all data points. In contrast, the application scenarios of CMH are more flexible than those of MSH. In CMH, the modality of a query point is different from the modality of the points in the database. Furthermore, typically the query point has only one modality and the points in the database can have one or more modalities. For example, we can use text queries to retrieve images in the database, and we can also use image queries to retrieve texts in the database.
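As a minimal sketch of the retrieval setting described above, the following Python snippet ranks database items by their Hamming distance to a query code. The 8-bit codes, the item names, and the helper names `hamming_distance` and `rank_by_hamming` are hypothetical and chosen only for illustration. The sketch assumes that a text hash function and an image hash function have already mapped both modalities into the same Hamming space; how such functions are learned is not shown here.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database):
    """Return (item, distance) pairs sorted by increasing Hamming distance."""
    scored = [(name, hamming_distance(query_code, code))
              for name, code in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 8-bit codes of database images (illustrative values only).
image_codes = {
    "img_beach": 0b10110010,
    "img_city": 0b01001101,
    "img_forest": 0b10110110,
}

# Hypothetical code of a text query, assumed to lie in the same Hamming space.
text_query_code = 0b10110011

ranking = rank_by_hamming(text_query_code, image_codes)
for name, distance in ranking:
    print(name, distance)
```

Each comparison is an XOR followed by a bit count, which illustrates why compact binary codes keep both storage and per-item comparison cost low. The linear scan here is only for clarity; it does not implement the index-based constant or sub-linear search mentioned above.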
Due to its wide applicability, CMH has gained more attention than MSH.

Many CMH methods have recently been proposed. Existing representative methods include cross modality similarity sensitive hashing (CMSSH) [2], cross view hashing (CVH) [18], multi-modal latent binary embedding (MLBE) [39], co-regularized hashing (CRH) [38], semantic correlation maximization (SCM) [35], collective matrix factorization hashing (CMFH) [7], semantic topic multi-modal hashing (STMH) [33] and semantics preserving hashing (SePH) [22]. Almost all of these existing CMH methods are based on hand-crafted features. One shortcoming of these methods is that the feature extraction procedure is independent of the hash-code learning procedure, which means that the hand-crafted features might not be optimally compatible