IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 27, NO. 12, DECEMBER 2018

Deep Discrete Supervised Hashing

Qing-Yuan Jiang, Xue Cui, and Wu-Jun Li, Member, IEEE

Abstract— Hashing has been widely used for large-scale search due to its low storage cost and fast query speed. By using supervised information, supervised hashing can significantly outperform unsupervised hashing. Recently, discrete supervised hashing and feature learning based deep hashing are two representative directions of progress in supervised hashing. On one hand, hashing is essentially a discrete optimization problem. Hence, utilizing supervised information to directly guide the discrete (binary) coding procedure can avoid sub-optimal solutions and improve the accuracy. On the other hand, feature learning based deep hashing, which integrates deep feature learning and hash-code learning into an end-to-end architecture, can enhance the feedback between feature learning and hash-code learning. The key in discrete supervised hashing is to adopt supervised information to directly guide the discrete coding procedure in hashing. The key in deep hashing is to adopt the supervised information to directly guide the deep feature learning procedure. However, most deep supervised hashing methods cannot use the supervised information to directly guide both the discrete (binary) coding procedure and the deep feature learning procedure in the same framework. In this paper, we propose a novel deep hashing method, called deep discrete supervised hashing (DDSH). DDSH is the first deep hashing method which can utilize pairwise supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure, and thus enhance the feedback between these two important procedures. Experiments on four real datasets show that DDSH can outperform other state-of-the-art baselines, including both discrete hashing and deep hashing baselines, for image retrieval.

Index Terms— Image retrieval, deep learning, deep supervised hashing.

I. INTRODUCTION

DUE to the explosive increase of data in real applications, nearest neighbor (NN) [1] search plays a fundamental role in many areas including image retrieval, computer vision, machine learning and data mining. In many real applications, there is no need to return the exact nearest neighbors for every given query, and approximate nearest neighbor (ANN) search is enough to achieve satisfactory search (retrieval) performance. Hence, ANN search has attracted much attention in recent years [2]–[9].

Manuscript received July 31, 2017; revised April 5, 2018 and May 30, 2018; accepted July 17, 2018. Date of publication August 10, 2018; date of current version September 4, 2018. This work was supported in part by the NSFC under Grant 61472182, in part by the Key Research and Development Program of Jiangsu under Grant BE2016121, and in part by a fund from Tencent. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Gang Hua. (Corresponding author: Wu-Jun Li.)

The authors are with the National Key Laboratory for Novel Software Technology, Collaborative Innovation Center of Novel Software Technology and Industrialization, Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China (e-mail: jiangqy@lamda.nju.edu.cn; cuix@lamda.nju.edu.cn; liwujun@nju.edu.cn).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2018.2864894

Over the last decades, hashing has become an active sub-area of ANN search [5], [10], [11]. The goal of hashing is to map the data points to binary codes with hash functions which try to preserve the similarity among the data points in the original space. With the binary hash code representation, the storage cost for the data points can be dramatically reduced. Furthermore, hashing based ANN search is able to achieve a constant or sub-linear time complexity [12]. Hence, hashing has become a promising choice for efficient ANN search in large-scale datasets because of its low storage cost and fast query speed [2]–[4], [6], [12]–[18].

Existing hashing methods can be divided into two main categories: data-independent methods and data-dependent methods. Data-independent hashing methods usually adopt random projections as hash functions to map the data points from the original space into a Hamming space of binary codes. That is to say, these methods do not use any training data to learn hash functions and binary codes. Representative data-independent hashing methods include locality-sensitive hashing (LSH) [2], [19] and kernelized locality-sensitive hashing (KLSH) [14]. Typically, data-independent hashing methods need long binary codes to achieve satisfactory retrieval performance. Data-dependent hashing methods, which are also called learning to hash methods, try to learn the hash functions from data. Recent works [12], [16], [20]–[25] have shown that data-dependent methods can achieve comparable or even better accuracy with shorter binary hash codes, compared with data-independent methods. Therefore, data-dependent methods have received more and more attention.

Existing data-dependent hashing methods can be further divided into unsupervised hashing methods and supervised hashing methods, based on whether supervised information is used for learning or not. Unsupervised hashing methods aim to preserve the metric (Euclidean neighbor) structure among the training data. Representative unsupervised hashing methods include spectral hashing (SH) [26], iterative quantization (ITQ) [20], isotropic hashing (IsoHash) [10], spherical hashing (SPH) [16], inductive manifold hashing (IMH) [21], anchor graph hashing (AGH) [27], discrete graph hashing (DGH) [28], latent semantic minimal hashing (LSMH) [29] and global hashing system (GHS) [30]. Due to the semantic gap [31], unsupervised hashing methods usually cannot achieve satisfactory retrieval performance in real applications.

Unlike unsupervised hashing methods, supervised hashing methods aim to embed the data points from the original space into the Hamming space by utilizing supervised information to facilitate hash function learning or hash-code learning. Representative supervised hashing methods include semantic hashing [32], self-taught hashing (STH) [3], supervised
hashing with kernels (KSH) [12], latent factor hashing (LFH) [33], fast supervised hashing (FastH) [23], supervised discrete hashing (SDH) [25], column sampling based discrete supervised hashing (COSDISH) [34] and nonlinear discrete hashing (NDH) [17]. By using supervised information for learning, supervised hashing can significantly outperform unsupervised hashing in real applications such as image retrieval. Detailed surveys on learning to hash can be found in [7]–[9].

Hashing is essentially a discrete optimization problem. Because it is difficult to directly solve the discrete optimization problem, early hashing methods [15], [20], [22] adopt relaxation strategies to solve an approximate continuous problem, which might result in a sub-optimal solution. Specifically, relaxation based hashing methods utilize supervised information to guide continuous hash code learning at the first stage. Then they convert the continuous hash codes into binary codes by using rounding technology at the second stage. Recently, several discrete hashing methods, e.g., DGH [28], FastH [23], SDH [25] and COSDISH [34], which try to directly learn the discrete binary hash codes, have been proposed with promising performance. In particular, several discrete supervised hashing methods, including FastH [23], SDH [25], and COSDISH [34], have shown better performance than traditional relaxation-based continuous hashing methods. The key in discrete supervised hashing is to adopt supervised information to directly guide the discrete coding procedure, i.e., the discrete binary code learning procedure.

Most existing supervised hashing methods, including those introduced above and some deep hashing methods [17], [32], [35], are based on hand-crafted features. One major shortcoming of these methods is that they cannot perform feature learning. That is, these hand-crafted features might not be optimally compatible with the hash code learning procedure. Hence, besides the progress on discrete hashing, another line of progress has appeared in supervised hashing, which is called feature learning based deep hashing [11], [36]–[43]. Representative feature learning based deep hashing methods include convolutional neural network hashing (CNNH) [36], network in network hashing (NINH) [37], deep semantic ranking hashing (DSRH) [38], deep similarity comparison hashing (DSCH) [11], deep pairwise-supervised hashing (DPSH) [42], deep hashing network (DHN) [41], deep supervised hashing (DSH) [40], deep quantization network (DQN) [41] and deep supervised discrete hashing (DSDH) [44]. Deep hashing adopts deep learning [45], [46] for supervised hashing. In particular, most deep hashing methods adopt deep feature learning to learn a feature representation for hashing. Many deep hashing methods integrate deep feature representation learning and hash-code learning into an end-to-end architecture. Under this architecture, the feature learning procedure and the hash-code learning procedure can give feedback to each other during learning. Many works [40], [42] have shown that using the supervised information to directly guide the deep feature learning procedure can achieve better performance than other strategies [36] which do not use supervised information to directly guide the deep feature learning procedure. Hence, the key in deep hashing is to adopt the supervised information to directly guide the deep feature learning procedure.
Both discrete supervised hashing and feature learning based deep hashing have achieved promising performance in many real applications. However, most deep supervised hashing methods cannot use the supervised information to directly guide both the discrete (binary) coding procedure and the deep feature learning procedure in the same framework. In this paper, we propose a novel deep hashing method, called deep discrete supervised hashing (DDSH). The main contributions of DDSH are outlined as follows:

• DDSH is the first deep hashing method which can utilize pairwise supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure.
• By integrating the discrete coding procedure and the deep feature learning procedure into the same end-to-end framework, these two important procedures can give feedback to each other to make the hash codes and the feature representation more compatible.
• Experiments on four real datasets show that our proposed DDSH can outperform other state-of-the-art baselines, including both discrete hashing and deep hashing baselines.

The rest of this paper is organized as follows. In Section II, we briefly introduce the notation and problem definition used in this paper. Then we describe DDSH in Section III. We discuss the difference between DDSH and existing deep hashing methods with pairwise labels in Section IV. In Section V, we evaluate DDSH on four datasets by carrying out the Hamming ranking task and the hash lookup task. Finally, we conclude the paper in Section VI.

II. NOTATION AND PROBLEM DEFINITION

A. Notation

Some representative notations we use in this paper are outlined in Table I. More specifically, we use boldface uppercase letters like B to denote matrices. We use boldface lowercase letters like b to denote vectors. The (i, j)th element in matrix B is denoted as B_ij. B^T is the transpose of B and ||b||_2 is the Euclidean norm of vector b. We use capital Greek letters like Ω to denote sets of indices. We use the symbol • to denote the Hadamard product (i.e., element-wise product). The square of a vector (or a matrix) like b^2 indicates element-wise square, i.e., b^2 = b • b.

TABLE I
NOTATION

  B         boldface uppercase, matrix
  b         boldface lowercase, vector
  B_ij      the (i, j)th element in matrix B
  B^T       transpose of matrix B
  ||b||_2   Euclidean norm of vector b
  Ω         capital Greek letter, set of indices
  •         Hadamard product (i.e., element-wise product)
  b^2       element-wise square of a vector, i.e., b^2 = b • b
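As a small illustration of the operators in Table I, the following numpy snippet (purely illustrative, not from the paper) shows the Hadamard product, the element-wise square and the Euclidean norm:

    import numpy as np

    b = np.array([1.0, -2.0, 3.0])
    B = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    hadamard = b * b                    # Hadamard product b • b (element-wise product)
    b_squared = b ** 2                  # b^2 in the paper's notation; identical to the line above
    norm_b = np.linalg.norm(b)          # ||b||_2, the Euclidean norm of b
    B_transpose = B.T                   # B^T, the transpose of matrix B
    print(hadamard, b_squared, norm_b)  # [1. 4. 9.] [1. 4. 9.] 3.7416...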
B. Problem Definition

Although supervised information can also be triplet labels [11], [37]–[39] or pointwise labels [25], in this paper we only focus on the setting with pairwise labels [36], [40]–[43], which is a popular setting in supervised hashing. The technique in this paper can also be adapted to settings with triplet labels, which will be pursued in our future work.

We use X = {x_i}_{i=1}^n to denote a set of training points. In supervised hashing with pairwise labels, the supervised information S ∈ {−1, 1}^{n×n} between data points is also available for the training procedure, where S_ij is defined as follows:

  S_{ij} = \begin{cases} 1, & \text{if } x_i \text{ and } x_j \text{ are similar,} \\ -1, & \text{otherwise.} \end{cases}

Supervised hashing aims at learning a hash function to map the data points from the original space into the binary code space (also called Hamming space), with the semantic (supervised) similarity in S preserved in the binary code space. We define the hash function as h(x) ∈ {−1, +1}^c, ∀x ∈ X, where c is the binary code length. The Hamming distance between binary codes b_i = h(x_i) and b_j = h(x_j) is defined as follows:

  \mathrm{dist}_H(b_i, b_j) = \frac{1}{2}\big(c - b_i^T b_j\big).

To preserve the similarity between data points, the Hamming distance between the binary codes b_i = h(x_i) and b_j = h(x_j) should be relatively small if the data points x_i and x_j are similar, i.e., S_ij = 1. On the contrary, the Hamming distance between the binary codes b_i = h(x_i) and b_j = h(x_j) should be relatively large if the data points x_i and x_j are dissimilar, i.e., S_ij = −1. In other words, the goal of supervised hashing is to solve the following problem:

  \min_h \mathcal{L}(h) = \sum_{i,j=1}^{n} L(h(x_i), h(x_j); S_{ij}) = \sum_{i,j=1}^{n} L(b_i, b_j; S_{ij}),    (1)

where L(·) is a loss function.

There have appeared various loss functions in supervised hashing. For example, KSH [12] uses the L2 loss function, which is defined as follows:

  L(b_i, b_j; S_{ij}) = \Big(S_{ij} - \frac{1}{c}\sum_{m=1}^{c} b_i^m b_j^m\Big)^2,

where b_i^m is the mth element in vector b_i. Please note that our DDSH is able to adopt many different kinds of loss functions. In this paper, we only use the L2 loss function as an example, and leave other loss functions for further study in future work.
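As a concrete illustration of the quantities defined above, the following numpy sketch (illustrative only; the function names are ours, not the paper's) computes the Hamming distance and the KSH-style L2 loss used as the running example in this paper:

    import numpy as np

    def hamming_distance(b_i, b_j):
        """dist_H(b_i, b_j) = (c - b_i^T b_j) / 2 for codes in {-1, +1}^c."""
        c = b_i.shape[0]
        return 0.5 * (c - b_i @ b_j)

    def l2_pairwise_loss(b_i, b_j, s_ij):
        """L(b_i, b_j; S_ij) = (S_ij - (1/c) b_i^T b_j)^2."""
        c = b_i.shape[0]
        return (s_ij - (b_i @ b_j) / c) ** 2

    b_i = np.array([1, -1, 1, 1])
    b_j = np.array([1, -1, -1, 1])
    print(hamming_distance(b_i, b_j))     # 1.0: the codes differ in exactly one bit
    print(l2_pairwise_loss(b_i, b_j, 1))  # (1 - 2/4)^2 = 0.25 for a similar pair (S_ij = 1)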
III. DEEP DISCRETE SUPERVISED HASHING

In this section, we describe the details of DDSH, including the model architecture and the learning algorithm.

A. Model Architecture

DDSH is an end-to-end deep hashing method which is able to simultaneously perform feature learning and hash code learning in the same framework. The model architecture of DDSH is shown in Figure 1, which contains two important components: the loss part and the feature learning part. The loss part contains the discrete coding procedure, which aims to learn optimal binary codes to preserve the semantic pairwise similarity. The feature learning part contains the deep feature learning procedure, which tries to learn a compatible deep neural network to extract a deep representation from scratch. For DDSH, the discrete coding procedure and deep feature learning are integrated into an end-to-end framework. More importantly, both procedures are directly guided by supervised information.

Fig. 1. The model architecture of DDSH. DDSH is an end-to-end deep learning framework which consists of two main components: the loss part and the feature learning part. The loss part contains the discrete coding procedure (to learn the binary codes B^Ω), and the feature learning part contains the deep feature learning procedure (to learn F(x; Θ) for the points x indexed by Γ). During each iteration, we adopt an alternating strategy to learn the binary codes and the feature representation alternately, both of which are directly guided by supervised information. Hence, DDSH can enhance the feedback between the discrete coding procedure and the deep feature learning procedure.

1) Loss Part: Inspired by COSDISH [34], we use a column-sampling method to split the whole training set into two parts. More specifically, we randomly sample a subset Ω of Φ = {1, 2, . . . , n} and generate Γ = Φ \ Ω (|Γ| ≫ |Ω|). Then we split the whole training set X into two subsets X^Ω and X^Γ, where X^Ω and X^Γ denote the training data points indexed by Ω and Γ respectively.

Similarly, we sample |Ω| columns of S, with the sampled columns indexed by Ω. Then, we approximate the original problem in (1) by only using the sampled columns of S:

  \min_h \mathcal{L}(h) = \sum_{i \in \Omega} \sum_{j=1}^{n} L(h(x_i), h(x_j); S_{ij})
                        = \sum_{x_i \in X^{\Omega}} \sum_{x_j \in X^{\Gamma}} L(h(x_i), h(x_j); S_{ij}) + \sum_{x_i, x_j \in X^{\Omega}} L(h(x_i), h(x_j); S_{ij}).    (2)
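The following numpy sketch (illustrative only; the variable names are ours) shows the column-sampling split and the two blocks of S that appear in Eq. (2):

    import numpy as np

    rng = np.random.default_rng(0)
    n, omega_size = 1000, 100

    # Pairwise supervised information S in {-1, +1}^{n x n}; random here, purely for
    # illustration (a real S would be symmetric and derived from labels).
    S = np.where(rng.random((n, n)) < 0.5, -1, 1)

    perm = rng.permutation(n)
    omega = np.sort(perm[:omega_size])   # sampled index set Omega
    gamma = np.sort(perm[omega_size:])   # Gamma = Phi \ Omega, with |Gamma| >> |Omega|

    # Only the |Omega| sampled columns of S are touched by the objective in Eq. (2):
    S_omega_omega = S[np.ix_(omega, omega)]   # supervision among the points in X^Omega
    S_gamma_omega = S[np.ix_(gamma, omega)]   # supervision between X^Gamma and X^Omega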
Then we introduce auxiliary variables to solve problem (2). More specifically, we utilize auxiliary variables B^Ω = {b_i | i ∈ Ω} with b_i ∈ {−1, +1}^c to replace the part of the binary codes generated by the hash function, i.e., h(X^Ω). Here, h(X^Ω) = {h(x_i) | x_i ∈ X^Ω}. Then we rewrite problem (2) as follows:

  \min_{h, B^{\Omega}} \mathcal{L}(h, B^{\Omega}) = \sum_{i \in \Omega} \sum_{x_j \in X^{\Gamma}} L(b_i, h(x_j); S_{ij}) + \sum_{i, j \in \Omega} L(b_i, b_j; S_{ij})
  \text{s.t. } b_i \in \{-1, +1\}^c, \ \forall i \in \Omega.    (3)

The problem in (3) is the final loss function (objective) to be learned by DDSH. We can find that the whole training set is divided into two subsets X^Ω and X^Γ. The binary codes of X^Ω, i.e., B^Ω, are directly learned from the objective function in (3), but the binary codes of X^Γ are generated by the hash function h(·). h(·) is defined based on the output of the deep feature learning part, which will be introduced in the following subsection.

The learning of B^Ω contains the discrete coding procedure, which is directly guided by the supervised information. The learning of h(·) contains the deep feature learning procedure, which is also directly guided by the supervised information. Hence, our DDSH can utilize supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure in the same end-to-end deep framework. This is different from existing deep hashing methods which either use a relaxation strategy without discrete coding or do not use the supervised information to directly guide the discrete coding procedure.

Please note that "directly guided" in this paper means that the supervised information is directly included in the corresponding terms in the loss function. For example, the supervised information S_ij is directly included in all terms about the discrete codes B^Ω in (3), which means that the discrete coding procedure is directly guided by the supervised information. Furthermore, the supervised information S_ij is also directly included in the term about the deep feature learning function h(x_j) in (3), which means that the deep feature learning procedure is also directly guided by the supervised information. To the best of our knowledge, DDSH is the first deep hashing method which can utilize pairwise supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure, and thus enhance the feedback between these two important procedures.
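To make the structure of Eq. (3) explicit, the following numpy sketch (illustrative only; the function names are ours, not the paper's) evaluates the objective with the L2 loss, given the auxiliary codes B^Ω and the network-generated codes for X^Γ:

    import numpy as np

    def l2_loss(u, v, s, c):
        """L(u, v; s) = (s - (1/c) u^T v)^2, the running-example loss."""
        return (s - (u @ v) / c) ** 2

    def ddsh_objective(B_omega, H_gamma, S_oo, S_go, c):
        """Evaluate Eq. (3).

        B_omega : (|Omega|, c) auxiliary binary codes for X^Omega.
        H_gamma : (|Gamma|, c) codes h(x_j) produced by the network for X^Gamma.
        S_oo    : (|Omega|, |Omega|) block of S inside Omega.
        S_go    : (|Gamma|, |Omega|) block of S between Gamma and Omega.
        """
        total = 0.0
        # First term: pairs across X^Omega (auxiliary codes) and X^Gamma (network codes).
        for i in range(B_omega.shape[0]):
            for j in range(H_gamma.shape[0]):
                total += l2_loss(B_omega[i], H_gamma[j], S_go[j, i], c)
        # Second term: pairs inside X^Omega, written purely on the auxiliary codes.
        for i in range(B_omega.shape[0]):
            for j in range(B_omega.shape[0]):
                total += l2_loss(B_omega[i], B_omega[j], S_oo[i, j], c)
        return total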
2) Feature Learning Part: The binary codes of X^Γ are generated by the hash function h(·), which is defined based on the output of the deep feature learning part. More specifically, we define our hash function as h(x) = sign(F(x; Θ)), where sign(·) is the element-wise sign function, F(x; Θ) denotes the output of the feature learning part and Θ denotes all parameters of the deep neural network.

We adopt a convolutional neural network (CNN) from [47], i.e., CNN-F, as our deep feature learning part. We replace the last layer of CNN-F with one fully-connected layer to project the output of the second last layer into the R^c space. More specifically, the feature learning part contains 5 convolutional layers ("conv1"-"conv5") and 3 fully-connected layers ("full6"-"full8"). The detailed configuration of the 5 convolutional layers is shown in Table II. In Table II, "filter size" denotes the number of convolutional filters and their receptive field size, "stride" specifies the convolutional stride, "pad" indicates the number of pixels added to each side of the input, "LRN" denotes whether Local Response Normalization (LRN) [45] is applied or not, and "pool" denotes the down-sampling factor. The detailed configuration of the 3 fully-connected layers is shown in Table III, where the "configuration" shows the number of nodes in each layer. We adopt the Rectified Linear Unit (ReLU) [45] as the activation function for all of the first seven layers. For the last layer, we utilize the identity function as the activation function.

TABLE II
CONFIGURATION OF THE CONVOLUTIONAL LAYERS IN DDSH

  Layer   filter size     stride   pad   LRN   pool
  conv1   64 × 11 × 11    4 × 4    0     yes   2 × 2
  conv2   256 × 5 × 5     1 × 1    2     yes   2 × 2
  conv3   256 × 3 × 3     1 × 1    1     no    -
  conv4   256 × 3 × 3     1 × 1    1     no    -
  conv5   256 × 3 × 3     1 × 1    1     no    2 × 2

TABLE III
CONFIGURATION OF THE FULLY-CONNECTED LAYERS IN DDSH

  Layer   Configuration
  full6   4096
  full7   4096
  full8   hash code length c
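A rough PyTorch sketch of a CNN-F-style feature learning part with the layer configuration described above is given below. It is only a sketch: the paper's implementation is in MatConvNet, the class name DDSHNet is ours, and the flattened size 256 * 6 * 6 assumes 224 × 224 inputs.

    import torch
    import torch.nn as nn

    class DDSHNet(nn.Module):
        def __init__(self, code_length):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=0), nn.ReLU(inplace=True),
                nn.LocalResponseNorm(size=5), nn.MaxPool2d(kernel_size=2, stride=2),             # conv1
                nn.Conv2d(64, 256, kernel_size=5, stride=1, padding=2), nn.ReLU(inplace=True),
                nn.LocalResponseNorm(size=5), nn.MaxPool2d(kernel_size=2, stride=2),             # conv2
                nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True),  # conv3
                nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True),  # conv4
                nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(inplace=True),  # conv5
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            self.fc = nn.Sequential(
                nn.Flatten(),
                nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),   # full6
                nn.Linear(4096, 4096), nn.ReLU(inplace=True),          # full7
                nn.Linear(4096, code_length),                          # full8: identity activation
            )

        def forward(self, x):
            return self.fc(self.features(x))   # F(x; Theta), a real-valued c-dimensional output

    def hash_codes(net, x):
        """h(x) = sign(F(x; Theta)); during training, tanh replaces sign (see Section III-B)."""
        return torch.sign(net(x))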
B. Learning

After randomly sampling Ω at each iteration, we utilize an alternating learning strategy to solve problem (3). More specifically, each time we learn one of the variables B^Ω and h(F(x; Θ)) with the other fixed. When h(F(x; Θ)) is fixed, we directly learn the discrete hash codes B^Ω over binary variables. When B^Ω is fixed, we update the parameter Θ of the deep neural network.

1) Learn B^Ω With h(F(x; Θ)) Fixed: When h(F(x; Θ)) is fixed, it is easy to transform problem (3) into a binary quadratic programming (BQP) problem as that in TSH [22]. Each time we optimize one bit for all points. Then, the optimization of the kth bit of all points in B^Ω is given by:

  \min_{b_k^{\Omega}} \ [b_k^{\Omega}]^T Q^k b_k^{\Omega} + [b_k^{\Omega}]^T p^k
  \text{s.t. } b_k^{\Omega} \in \{-1, +1\}^{|\Omega|},    (4)

where b_k^Ω denotes the kth column of B^Ω, and

  Q_{ij}^k \big|_{i \neq j} = -2\Big(c S_{ij} - \sum_{m=1}^{k-1} b_i^m b_j^m\Big), \quad Q_{ii}^k = 0,
  p_i^k = -2 \sum_{l=1}^{|\Gamma|} B_{lk}^{\Gamma} \Big(c S_{li} - \sum_{m=1}^{k-1} B_{lm}^{\Gamma} b_i^m\Big).

Here, b_i^m denotes the mth bit of b_i, p_i^k denotes the ith element of p^k, and B^Γ denotes the (fixed) binary codes of X^Γ generated by the hash function. Following COSDISH, we can easily solve problem (4) by transforming the BQP problem into an equivalent clustering problem [48].

2) Learn h(F(x; Θ)) With B^Ω Fixed: Because the derivative of the hash function h(x) = sign(F(x; Θ)) is 0 everywhere except at 0, we cannot use back-propagation (BP) methods to update the neural network parameters. So we relax sign(·) as h(x) = tanh(F(x; Θ)), inspired by Song et al. [24]. Then we optimize the following problem:

  \min_h \mathcal{L}(h) = \sum_{i \in \Omega} \sum_{x_j \in X^{\Gamma}} L(b_i, h(x_j); S_{ij})
  \text{s.t. } h(x_j) = \tanh(F(x_j; \Theta)).    (5)

To learn the CNN parameter Θ, we utilize a back-propagation algorithm. That is, each time we sample a mini-batch of data points, and then use the BP algorithm based on the sampled data.

We define the output of the CNN as z_j = F(x_j; Θ) and a_j = tanh(z_j). Then we can compute the gradients with respect to a_j and z_j as follows:

  \frac{\partial \mathcal{L}}{\partial a_j} = \sum_{i \in \Omega} \frac{\partial L(b_i, a_j; S_{ij})}{\partial a_j} = \sum_{i \in \Omega} 2\big(a_j^T b_i - S_{ij}\big) b_i,    (6)

and

  \frac{\partial \mathcal{L}}{\partial z_j} = \frac{\partial \mathcal{L}}{\partial a_j} \bullet \big(1 - a_j^2\big) = \sum_{i \in \Omega} 2\big(a_j^T b_i - S_{ij}\big) b_i \bullet \big(1 - a_j^2\big).    (7)

Then, we can use the chain rule to compute ∂L/∂Θ based on ∂L/∂a_j and ∂L/∂z_j.

We summarize the whole learning algorithm for DDSH in Algorithm 1.

Algorithm 1 The Learning Algorithm for DDSH
Input: training set X; code length c; supervised information S.
Output: parameter Θ of the deep neural network.
Initialization: initialize the neural network parameter Θ, the mini-batch size M and the iteration numbers T_out, T_in; initialize B = {b_i | i = 1, 2, . . . , n}.
for iter = 1, 2, . . . , T_out do
    Randomly sample Ω and set Γ = Φ \ Ω.
    Split the training set X into X^Ω and X^Γ.
    Split B into B^Ω and B^Γ.
    for epoch = 1, 2, . . . , T_in do
        for k = 1, 2, . . . , c do
            Construct the BQP problem for the kth bit using (4).
            Construct the clustering problem to solve the BQP problem for the kth bit.
        end for
        for t = 1, 2, . . . , |Γ|/M do
            Randomly sample M data points from X^Γ to construct a mini-batch.
            Calculate h(x_j) for each data point x_j in the mini-batch by forward propagation.
            Calculate the gradient according to (7).
            Update the parameter Θ by using back propagation.
            Update b_j = sign(h(x_j)) for each data point x_j in the mini-batch.
        end for
    end for
end for

C. Out-of-Sample Extension for Unseen Data Points

After training our DDSH model, we can adopt the learned deep hashing framework to predict the binary code for any data point unseen during training. More specifically, given any point x_q ∉ X, we use the following formula to predict its binary code:

  b_q = h(x_q) = \mathrm{sign}(F(x_q; \Theta)),

where Θ is the deep neural network parameter learned by the DDSH model.
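To complement Algorithm 1, the following numpy sketch (illustrative only; the paper's implementation uses MatConvNet) computes the gradients of Eqs. (6) and (7) used in the network-update step for a single mini-batch point x_j:

    import numpy as np

    def ddsh_gradient(z_j, B_omega, s_j):
        """Gradient of the relaxed loss w.r.t. z_j = F(x_j; Theta) for one point x_j in X^Gamma.

        z_j     : (c,) network output for x_j.
        B_omega : (|Omega|, c) fixed binary codes {b_i : i in Omega}.
        s_j     : (|Omega|,) supervised information S_ij for all i in Omega.
        """
        a_j = np.tanh(z_j)                       # a_j = tanh(F(x_j; Theta))
        residual = B_omega @ a_j - s_j           # a_j^T b_i - S_ij for every i in Omega
        grad_a = 2.0 * (B_omega.T @ residual)    # Eq. (6): sum_i 2 (a_j^T b_i - S_ij) b_i
        grad_z = grad_a * (1.0 - a_j ** 2)       # Eq. (7): chain rule through tanh
        return grad_z                            # back-propagated through the CNN to obtain dL/dTheta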
IV. COMPARISON TO RELATED WORK

Existing feature learning based deep hashing methods with pairwise labels either use a relaxation strategy without discrete coding or do not use the supervised information to directly guide the discrete coding procedure. For example, CNNH [36] is a two-step method which adopts a relaxation strategy to learn continuous codes in the first stage and performs feature learning in the second stage. The feature learning procedure in CNNH is not directly guided by supervised information. NINH [37], DHN [41] and DSH [40] adopt a relaxation strategy to learn continuous codes. DPSH [42] and DQN [41] can learn binary codes in the training procedure. However, DPSH and DQN do not utilize the supervised information to directly guide the discrete coding procedure. The objective function of DPSH (in DPSH, the supervised information S_ij is defined on {0, 1}) can be written as:

  \mathcal{L}_{DPSH} = -\sum_{S_{ij} \in S} \big(S_{ij} \Theta_{ij} - \log(1 + e^{\Theta_{ij}})\big) + \eta \sum_{i=1}^{n} \|b_i - u_i\|_F^2,

where Θ_ij = (1/2) u_i^T u_j and u_i denotes the output of the deep neural network. We can find that in DPSH the discrete coding procedure is not directly guided by supervised information, i.e., the supervised information is not directly included in the terms of {b_i} in the objective function. The objective function of DQN can be written as:

  \mathcal{L}_{DQN} = \sum_{S_{ij} \in S} \Big(S_{ij} - \frac{z_i^T z_j}{\|z_i\| \|z_j\|}\Big)^2 + \lambda \sum_{i=1}^{n} \|z_i - C h_i\|_F^2,

where z_i denotes the output of the deep neural network and \sum_{i=1}^{n} \|z_i - C h_i\|_F^2 denotes the product quantization loss. The discrete coding procedure is only contained in the term \sum_{i=1}^{n} \|z_i - C h_i\|_F^2. We can find that in DQN the discrete coding procedure is not directly guided by supervised information either.

Recently, there has appeared a deep hashing method called DSDH [44]. Unlike DDSH, DSDH utilizes pointwise supervised information to guide the discrete coding procedure and utilizes pairwise similarity to guide the feature learning procedure. Then DSDH bridges the discrete coding procedure and the feature learning procedure by using the method of auxiliary coordinates (MAC) technique in AFFHash [49]. Due to the requirements of pointwise labels and pairwise labels, the application scenarios of DSDH might be limited.

To the best of our knowledge, our DDSH is the first deep hashing method which can utilize pairwise supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure in the same framework.

V. EXPERIMENT

We evaluate DDSH and other baselines on datasets from image retrieval applications. The open source deep learning library MatConvNet [50] is used to implement our model. All experiments are performed on an NVIDIA K40 GPU server.

A. Experimental Setting

1) Datasets: We adopt four widely used image datasets to evaluate our proposed method. They are CIFAR-10 [45] (https://www.cs.toronto.edu/~kriz/cifar.html), SVHN [51] (http://ufldl.stanford.edu/housenumbers/), NUS-WIDE [52] (http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm) and Clothing1M [53] (https://github.com/Cysu/noisy_label).

The CIFAR-10 dataset contains 60,000 images which are manually labeled into 10 classes including "airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship" and "truck". It is a single-label dataset. The size of each image is 32 × 32 pixels. Two images are treated as similar if they share the same label, i.e., they belong to the same class. Otherwise, they are considered to be dissimilar.

The SVHN dataset consists of 73,257 digits for training, 26,032 digits for testing and 531,131 additional samples. It is a real-world image dataset for recognizing digits in natural scene images. The images are categorized into 10 classes, each corresponding to a digit. SVHN is also a single-label dataset. Two images are treated as similar if they share the same label. Otherwise, they are considered to be dissimilar.

The NUS-WIDE dataset is a large-scale image dataset which includes 269,648 images and the associated tags from the Flickr website. It is a multi-label dataset where each image might be annotated with multiple labels. We select 186,577 data points that belong to the 10 most frequent concepts from the original dataset. Two images are treated as similar if they share at least one label.
V. EXPERIMENT

We evaluate DDSH and other baselines on datasets from image retrieval applications. The open source deep learning library MatConvNet [50] is used to implement our model. All experiments are performed on an NVIDIA K40 GPU server.

A. Experimental Setting

1) Datasets: We adopt four widely used image datasets to evaluate our proposed method. They are CIFAR-10² [45], SVHN³ [51], NUS-WIDE⁴ [52] and Clothing1M⁵ [53].

²https://www.cs.toronto.edu/~kriz/cifar.html
³http://ufldl.stanford.edu/housenumbers/
⁴http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
⁵https://github.com/Cysu/noisy_label

The CIFAR-10 dataset contains 60,000 images which are manually labeled into 10 classes: “airplane”, “automobile”, “bird”, “cat”, “deer”, “dog”, “frog”, “horse”, “ship” and “truck”. It is a single-label dataset. The size of each image is 32×32 pixels. Two images are treated as similar if they share the same label, i.e., they belong to the same class. Otherwise, they are considered to be dissimilar.

The SVHN dataset consists of 73,257 digits for training, 26,032 digits for testing and 531,131 additional samples. It is a real-world image dataset for recognizing digits in natural scene images. The images are categorized into 10 classes, each corresponding to a digit. SVHN is also a single-label dataset. Two images are treated as similar if they share the same label. Otherwise, they are considered to be dissimilar.

The NUS-WIDE dataset is a large-scale image dataset which includes 269,648 images and their associated tags from the Flickr website. It is a multi-label dataset where each image might be annotated with multiple labels. We select the 186,577 data points that belong to the 10 most frequent concepts from the original dataset. Two images are treated as similar if they share at least one label. Otherwise, they are considered to be dissimilar.

The Clothing1M dataset is a relatively large-scale dataset which contains 1,037,497 images belonging to 14 classes: “T-shirt”, “shirt”, “knitwear”, “chiffon”, “sweater”, “hoodie”, “windbreaker”, “jacket”, “downcoat”, “suit”, “shawl”, “dress”, “vest” and “underwear”. Clothing1M is a single-label dataset. Two images are treated as similar if they share the same label, i.e., they belong to the same class. Otherwise, they are considered to be dissimilar.

Table IV illustrates some example points from the above four datasets. For the CIFAR-10 dataset, we randomly take 1,000 images (100 images per class) as the query set and the remaining images as the retrieval set. Furthermore, we randomly select 5,000 images (500 images per class) from the retrieval set as the training set. For the SVHN dataset, we randomly select 1,000 images (100 images per class) from the testing set as the query set and utilize the whole training set as the retrieval set. We randomly select 5,000 images (500 images per class) from the retrieval set as the training set. For the NUS-WIDE dataset, we randomly select 1,867 data points as the query set and the remaining data points as the retrieval set. We randomly select 5,000 data points from the retrieval set as the training set. For the Clothing1M dataset, after removing the images whose links are invalid, we randomly select 7,000 images (500 images per class) as the query set and 1,028,083 images as the retrieval set. Furthermore, we randomly sample 14,000 images (1,000 images per class) from the retrieval set to construct the training set.

2) Baselines and Evaluation Protocol: We compare DDSH with eleven state-of-the-art baselines, including LSH [19], ITQ [20], LFH [33], FastH [23], SDH [25], COSDISH [34], NDH [17], DHN [43], DSH [40], DPSH [42] and DSDH [44]. These baselines are briefly introduced as follows:

• Locality-sensitive hashing (LSH) [19]: LSH is a representative data-independent hashing method. LSH utilizes random projections to generate hash functions.
• Iterative quantization (ITQ) [20]: ITQ is a representative unsupervised hashing method. ITQ first projects data points into a low-dimensional space by principal component analysis (PCA). Then ITQ minimizes the quantization error to learn binary codes.
TABLE IV
EXAMPLE POINTS OF THE DATASETS

• Latent factor hashing (LFH) [33]: LFH is a supervised hashing method which tries to learn binary codes based on latent factor models.
• Fast supervised hashing (FastH) [23]: FastH is a supervised hashing method. FastH directly adopts a graph-cut method to learn discrete binary codes.
• Supervised discrete hashing (SDH) [25]: SDH is a pointwise supervised hashing method which utilizes the discrete cyclic coordinate descent (DCC) algorithm to learn discrete hash codes.
• Column sampling based discrete supervised hashing (COSDISH) [34]: COSDISH is a supervised hashing method. COSDISH can directly learn discrete hash codes.
• Nonlinear deep hashing (NDH) [17]: NDH is a deep supervised hashing method. NDH utilizes both pointwise labels and pairwise similarity to guide binary coding and hash-function learning. However, NDH is a hand-crafted feature based method.
• Deep hashing network (DHN) [43]: DHN is a deep supervised hashing method. DHN minimizes both a pairwise cross-entropy loss and a pairwise quantization loss.
• Deep supervised hashing (DSH) [40]: DSH is a deep supervised hashing method. DSH takes pairs of points as input and learns binary codes by maximizing the discriminability of the corresponding binary codes.
• Deep pairwise-supervised hashing (DPSH) [42]: DPSH is a deep supervised hashing method. DPSH performs deep feature learning and hash-code learning simultaneously with pairwise labels by minimizing the negative log-likelihood of the observed pairwise labels.
• Deep supervised discrete hashing (DSDH) [44]: DSDH is a deep supervised hashing method. Similar to NDH, DSDH utilizes both pointwise labels and pairwise similarity to learn binary codes and the deep neural network. Furthermore, DSDH adopts the method of auxiliary coordinates (MAC) technique in AFFHash [49] to bridge the binary coding procedure and the feature learning procedure.

Among all these baselines, LSH is a data-independent hashing method. ITQ is an unsupervised hashing method. LFH, FastH, COSDISH and SDH are non-deep methods, which cannot perform deep feature learning. LFH is a relaxation-based method. FastH, COSDISH and SDH are discrete supervised hashing methods. NDH is a hand-crafted feature based deep supervised hashing method. DHN, DSH and DPSH are deep hashing methods with pairwise labels which can perform feature learning and hash-code learning simultaneously. DSDH is a deep supervised hashing method which utilizes both pointwise labels and pairwise similarity.

For fair comparison, all feature learning based deep hashing methods, including the deep baselines and our DDSH, adopt the same CNN-F model pre-trained on ImageNet⁶ for feature learning. Because the CNN-F model is pre-trained with images of size 224 × 224 pixels, we first resize all images in the four datasets to 224 × 224 pixels. Then the raw image pixels are directly utilized as input for the deep hashing methods.

We carefully implement DHN and DSH on MatConvNet. We fix the mini-batch size to 128 and tune the learning rate from 10^-6 to 10^-2 by using a cross-validation strategy. Furthermore, we set the weight decay to 5 × 10^-4 to avoid overfitting. For DDSH, we set the size of the randomly sampled subset to 100, T_out = 3 and T_in = 50 for the CIFAR-10, SVHN and NUS-WIDE datasets. Because NUS-WIDE is a multi-label dataset, we reduce the similarity weight for those training points with multiple labels when we train DDSH. For the Clothing1M dataset, we set the sampled subset size to 500, T_out = 10 and T_in = 15.
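For concreteness, a minimal sketch of the training configuration described above is given below. The field names (including sampled_subset_size) are our own illustrative choices rather than code from the paper; only the values restate the settings reported in this section.

    # Illustrative training configuration, assuming a MatConvNet/CNN-F style setup.
    COMMON = {
        "backbone": "CNN-F pre-trained on ImageNet",
        "input_size": (224, 224),              # images are resized before being fed in
        "batch_size": 128,
        "learning_rate_range": (1e-6, 1e-2),   # selected by cross-validation
        "weight_decay": 5e-4,
    }

    DDSH_HYPERPARAMS = {
        # CIFAR-10, SVHN and NUS-WIDE share one setting; Clothing1M uses another.
        "cifar10":    {"sampled_subset_size": 100, "T_out": 3,  "T_in": 50},
        "svhn":       {"sampled_subset_size": 100, "T_out": 3,  "T_in": 50},
        "nus_wide":   {"sampled_subset_size": 100, "T_out": 3,  "T_in": 50},
        "clothing1m": {"sampled_subset_size": 500, "T_out": 10, "T_in": 15},
    }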
For the other hashing methods, i.e., LSH, ITQ, LFH, FastH, SDH, COSDISH and NDH, we use the 4,096-dimensional deep features extracted by the CNN-F model pre-trained on ImageNet as input for fair comparison. Because SDH is a kernel-based method, we randomly sample 1,000 data points as anchors to construct the kernel, following the suggestion of Shen et al. [25], the authors of SDH. For LFH, FastH and COSDISH, we utilize boosted decision trees for out-of-sample extension by following the setting of FastH. Because the source code of NDH is not available, we carefully re-implement its algorithm ourselves. Following the authors' suggestion, we use 200-dimensional feature vectors derived by applying PCA to the deep features as the input of NDH.

⁶We download the CNN-F model pre-trained on ImageNet from http://www.vlfeat.org/matconvnet/pretrained/.
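The feature pipeline for these baselines can be sketched as follows. This is our own illustration under the assumption that deep_feats holds the 4,096-dimensional CNN-F activations; the RBF kernel form follows the usual SDH setup, but the kernel width heuristic and the function name are our assumptions, not details taken from this paper.

    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.decomposition import PCA

    def prepare_baseline_inputs(deep_feats, rng=np.random.default_rng(0)):
        """deep_feats: (n, 4096) CNN-F features used as input for the baselines."""
        # SDH: kernel features built from 1,000 randomly sampled anchor points.
        anchors = deep_feats[rng.choice(len(deep_feats), size=1000, replace=False)]
        sq_dist = cdist(deep_feats, anchors, metric="sqeuclidean")
        sigma = np.sqrt(sq_dist.mean())             # kernel width: our assumption
        kernel_feats = np.exp(-sq_dist / (2 * sigma ** 2))
        # NDH: 200-dimensional PCA projection of the deep features.
        pca_feats = PCA(n_components=200).fit_transform(deep_feats)
        return kernel_feats, pca_feats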
TABLE V
MAP OF THE HAMMING RANKING TASK ON CIFAR-10 DATASET. THE BEST ACCURACY IS SHOWN IN BOLDFACE

Method     12 bits   24 bits   32 bits   48 bits
DDSH       0.7695    0.8289    0.8352    0.8194
DSDH       0.7442    0.7868    0.7991    0.8142
DPSH       0.6844    0.7225    0.7396    0.7460
DSH        0.6457    0.7492    0.7857    0.8113
DHN        0.6725    0.7107    0.7045    0.7135
NDH        0.5620    0.6404    0.655     0.6772
COSDISH    0.6085    0.6827    0.6959    0.7158
SDH        0.5200    0.6461    0.6577    0.6688
FastH      0.6202    0.6731    0.6870    0.7163
LFH        0.4009    0.647     0.6571    0.6995
ITQ        0.2580    0.2725    0.2834    0.2936
LSH        0.1468    0.1725    0.1798    0.1929

TABLE VI
MAP OF THE HAMMING RANKING TASK ON SVHN DATASET. THE BEST ACCURACY IS SHOWN IN BOLDFACE

Method     12 bits   24 bits   32 bits   48 bits
DDSH       0.5735    0.6744    0.7031    0.7184
DSDH       0.5121    0.5670    0.5866    0.5839
DPSH       0.3790    0.4216    0.4337    0.4557
DSH        0.3702    0.4802    0.5232    0.5828
DHN        0.3800    0.4096    0.4158    0.4302
NDH        0.2177    0.2710    0.2563    0.2803
COSDISH    0.2381    0.2951    0.396     0.3408
SDH        0.1509    0.2996    0.3202    0.3335
FastH      0.2516    0.2961    0.3177    0.3436
LFH        0.1933    0.2558    0.2839    0.3253
ITQ        0.1108    0.1138    0.1149    0.1159
LSH        0.1074    0.1082    0.1093    0.1109

TABLE VII
MAP OF THE HAMMING RANKING TASK ON NUS-WIDE DATASET. THE BEST ACCURACY IS SHOWN IN BOLDFACE

Method     12 bits   24 bits   32 bits   48 bits
DDSH       0.7911    0.8165    0.8217    0.8259
DSDH       0.7916    0.8059    0.8063    0.8180
DPSH       0.7882    0.8085    0.8167    0.8234
DSH        0.7622    0.7940    0.7968    0.8081
DHN        0.7900    0.8101    0.8092    0.8180
NDH        0.7015    0.7351    0.7447    0.7449
COSDISH    0.733     0.7643    0.7868    0.7993
SDH        0.7385    0.7616    0.7697    0.7720
FastH      0.7412    0.7830    0.7948    0.8085
LFH        0.7049    0.7594    0.7778    0.7936
ITQ        0.5053    0.5037    0.503     0.5054
LSH        0.3407    0.3506    0.3509    0.3706

TABLE VIII
MAP OF THE HAMMING RANKING TASK ON CLOTHING1M DATASET. THE BEST ACCURACY IS SHOWN IN BOLDFACE

Method     12 bits   24 bits   32 bits   48 bits
DDSH       0.2763    0.3667    0.3878    0.4008
DSDH       0.2903    0.3285    0.3413    0.3475
DPSH       0.1950    0.2087    0.2162    0.2181
DSH        0.1730    0.1870    0.1912    0.2021
DHN        0.1909    0.2243    0.2120    0.2488
NDH        0.1857    0.2276    0.2338    0.2354
COSDISH    0.1871    0.2358    0.2567    0.2756
SDH        0.1518    0.1865    0.1941    0.1973
FastH      0.1736    0.2066    0.2167    0.2440
LFH        0.1548    0.1591    0.2128    0.2579
ITQ        0.50      0.1214    0.1228    0.1259
LSH        0.0834    0.0894    0.0914    0.0920
In our experiment, ground-truth neighbors are defined based on whether two data points share at least one class label. We carry out a Hamming ranking task and a hash lookup task to evaluate DDSH and the baselines. We report the Mean Average Precision (MAP), top-k precision, precision-recall curves and a case study for the Hamming ranking task. Specifically, given a query $x_q$, we can calculate its average precision (AP) through the following equation:

$$\mathrm{AP}(x_q) = \frac{1}{R_q}\sum_{k=1}^{N} P(k)\, I_1(k),$$

where $N$ is the number of returned samples, $R_q$ is the number of samples relevant to $x_q$, $P(k)$ is the precision at cut-off $k$ in the returned sample list, and $I_1(k)$ is an indicator function which equals 1 if the kth returned sample is a ground-truth neighbor of $x_q$ and 0 otherwise. Given $Q$ queries, we can compute the MAP as follows:

$$\mathrm{MAP} = \frac{1}{Q}\sum_{q=1}^{Q}\mathrm{AP}(x_q).$$

Because NUS-WIDE is relatively large, the MAP value on NUS-WIDE is calculated based on the top 5,000 returned neighbors. The MAP values for the other datasets are calculated based on the whole retrieval set.

For the hash lookup task, we report the mean hash lookup success rate (SR) within Hamming radius 0, 1 and 2 [28]. When at least one ground-truth neighbor is retrieved within a specific Hamming radius, we call it a lookup success. The hash lookup success rate is calculated as follows:

$$\mathrm{SR} = \frac{1}{Q}\sum_{q=1}^{Q} I\big(\#\{\text{retrieved ground-truth neighbors of } x_q\} > 0\big),$$

where $I(\cdot)$ is an indicator function, i.e., $I(\text{true}) = 1$ and $I(\text{false}) = 0$, and $Q$ is the total number of query images.

B. Experimental Result

1) Hamming Ranking Task: Table V, Table VI, Table VII and Table VIII report the MAP results on the CIFAR-10, SVHN, NUS-WIDE and Clothing1M datasets, respectively. We can easily find that our DDSH achieves state-of-the-art retrieval accuracy in most cases compared with all baselines, including deep hashing methods, non-deep supervised hashing methods, non-deep unsupervised hashing methods and data-independent methods.

By comparing ITQ to LSH, we can find that data-dependent hashing methods can significantly outperform data-independent hashing methods. By comparing NDH, COSDISH, SDH, FastH and LFH to ITQ, we can find that supervised methods can outperform unsupervised methods due to the use of supervised information.
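To make the evaluation protocol above concrete, the sketch below computes AP/MAP from a Hamming ranking and the hash lookup success rate within a given radius. It is our own illustration (function and variable names are not from the paper); ground-truth neighbors are derived from shared class labels as described above, and the top-5,000 truncation used for NUS-WIDE is omitted for brevity.

    import numpy as np

    def average_precision(ranked_relevance):
        """AP for one query given a 0/1 relevance vector in ranked order."""
        rel = np.asarray(ranked_relevance, dtype=float)
        if rel.sum() == 0:
            return 0.0
        cum_hits = np.cumsum(rel)
        precision_at_k = cum_hits / np.arange(1, len(rel) + 1)
        return float((precision_at_k * rel).sum() / rel.sum())

    def evaluate(query_codes, db_codes, query_labels, db_labels, radius=2):
        """Returns (MAP, hash lookup success rate) for binary codes in {-1, +1}."""
        aps, successes = [], 0
        n_bits = query_codes.shape[1]
        for q_code, q_label in zip(query_codes, query_labels):
            # Hamming distance via inner product for {-1, +1} codes.
            dist = (n_bits - db_codes @ q_code) / 2
            order = np.argsort(dist)
            # Ground truth: two points are relevant if they share at least one label.
            rel = (db_labels[order] @ q_label) > 0
            aps.append(average_precision(rel))
            # Lookup success: at least one ground-truth neighbor within the radius.
            successes += int(np.any(((db_labels @ q_label) > 0) & (dist <= radius)))
        return float(np.mean(aps)), successes / len(query_codes)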
Fig. 2. Performance of precision-recall curve on four datasets. The four sub-figures in each row are the precision-recall curves for 12 bits, 24 bits, 32 bits and 48 bits, respectively. (a) 12 bits @CIFAR-10. (b) 12 bits @SVHN. (c) 12 bits @NUS-WIDE. (d) 12 bits @Clothing1M. (e) 24 bits @CIFAR-10. (f) 24 bits @SVHN. (g) 24 bits @NUS-WIDE. (h) 24 bits @Clothing1M. (i) 32 bits @CIFAR-10. (j) 32 bits @SVHN. (k) 32 bits @NUS-WIDE. (l) 32 bits @Clothing1M. (m) 48 bits @CIFAR-10. (n) 48 bits @SVHN. (o) 48 bits @NUS-WIDE. (p) 48 bits @Clothing1M.

By comparing NDH, COSDISH, SDH and FastH to LFH, we can find that discrete supervised hashing can outperform relaxation-based continuous hashing, which means that the discrete coding procedure is able to learn better binary codes. By comparing the feature learning based deep hashing methods, i.e., DDSH, DSDH, DPSH, DHN and DSH, to the non-deep hashing methods, we can find that feature learning based deep hashing can outperform non-deep hashing, because deep supervised hashing can perform deep feature learning while non-deep hashing methods cannot. This experimental result demonstrates that deep supervised hashing is a more suitable architecture for hash learning.

The main difference between our proposed DDSH and other discrete supervised hashing methods like COSDISH, SDH and FastH is that DDSH adopts supervised information to directly guide the deep feature learning procedure, whereas the other discrete supervised hashing methods have no deep feature learning ability. The main difference between our DDSH and other deep hashing methods is that DDSH adopts supervised information to directly guide the discrete coding procedure, whereas the other deep hashing methods do not have this property. Hence, the experimental results demonstrate the motivation of DDSH, i.e., utilizing supervised information to directly guide both the deep feature learning procedure and the discrete coding procedure can further improve retrieval performance in real applications.

Furthermore, we select the four best baselines, i.e., DSDH, DPSH, DSH and DHN, to compare the precision-recall and top-k precision results. We report the precision-recall curves on all four datasets in Figure 2. We can see that the proposed DDSH still achieves the best performance in terms of the precision-recall curve in most cases.
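The curves in Figures 2 and 3 are built from per-query precision and recall values such as those computed in the short sketch below. The function names are our own, and the interpolation details of the published figures are not specified here, so this is only an illustrative approximation.

    import numpy as np

    def precision_recall_points(ranked_relevance):
        """Precision and recall after each returned sample, for one query."""
        rel = np.asarray(ranked_relevance, dtype=float)
        hits = np.cumsum(rel)
        k = np.arange(1, len(rel) + 1)
        precision = hits / k
        recall = hits / max(rel.sum(), 1.0)
        return precision, recall

    def precision_at_k(ranked_relevance, k):
        """Top-k precision, as plotted in the top-k precision curves."""
        rel = np.asarray(ranked_relevance[:k], dtype=float)
        return float(rel.mean())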
Fig. 3. Performance of top-k precision on four datasets. The four sub-figures in each row are the top-k precision curves for 12 bits, 24 bits, 32 bits and 48 bits, respectively. (a) 12 bits @CIFAR-10. (b) 12 bits @SVHN. (c) 12 bits @NUS-WIDE. (d) 12 bits @Clothing1M. (e) 24 bits @CIFAR-10. (f) 24 bits @SVHN. (g) 24 bits @NUS-WIDE. (h) 24 bits @Clothing1M. (i) 32 bits @CIFAR-10. (j) 32 bits @SVHN. (k) 32 bits @NUS-WIDE. (l) 32 bits @Clothing1M. (m) 48 bits @CIFAR-10. (n) 48 bits @SVHN. (o) 48 bits @NUS-WIDE. (p) 48 bits @Clothing1M.

In real applications, we might care more about the top-k retrieval results than the whole database. Hence, we report the top-k precision on the four datasets based on the returned top-k samples. In Figure 3, we show the top-k precision for different k on the CIFAR-10, SVHN, NUS-WIDE and Clothing1M datasets, respectively, where k is the number of returned samples. Again, we can find that DDSH outperforms the other deep hashing methods in most cases.

2) Hash Lookup Task: In practice, retrieval with hash lookup can usually achieve constant or sub-linear search speed in real applications. Recent works like DGH [28] show that discrete hashing can significantly improve the hash lookup success rate. In Figure 4, we present the mean hash lookup success rate within Hamming radius 0, 1 and 2 on all four datasets for all deep hashing methods. We can find that DDSH achieves the best mean hash lookup success rate on the four datasets, especially for long codes.

3) Further Analysis: To further demonstrate the effectiveness of utilizing supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure in the same end-to-end framework, we evaluate several variants of DDSH. These variants include “DDSH0”, “COSDISH-Linear”, “COSDISH-CNN” and “DDSH-MAC”. DDSH0 denotes the variant in which we fix the parameters of the first seven layers of CNN-F in DDSH during the training procedure. In other words, DDSH0 cannot perform the deep feature learning procedure, and all the other parts are