original problem in (1) by only using the sampled columns of S:

$$\min_{h}\; \mathcal{L}(h) = \sum_{i\in\Omega}\sum_{j=1}^{n} L\big(h(x_i), h(x_j); S_{ij}\big) = \sum_{x_i\in X^{\Omega}}\sum_{x_j\in X^{\Gamma}} L\big(h(x_i), h(x_j); S_{ij}\big) + \sum_{x_i,\,x_j\in X^{\Omega}} L\big(h(x_i), h(x_j); S_{ij}\big). \tag{2}$$

Then we introduce auxiliary variables to solve problem (2). More specifically, we utilize auxiliary variables $B^{\Omega} = \{b_i \mid i \in \Omega\}$ with $b_i \in \{-1,+1\}^{c}$ to replace part of the binary codes generated by the hash function, i.e., $h(X^{\Omega})$. Here, $h(X^{\Omega}) = \{h(x_i) \mid x_i \in X^{\Omega}\}$. Then we rewrite problem (2) as follows:

$$\begin{aligned} \min_{h,\,B^{\Omega}}\; \mathcal{L}(h, B^{\Omega}) &= \sum_{i\in\Omega}\sum_{x_j\in X^{\Gamma}} L\big(b_i, h(x_j); S_{ij}\big) + \sum_{i,\,j\in\Omega} L\big(b_i, b_j; S_{ij}\big) \\ \text{s.t.}\;\; & b_i \in \{-1,+1\}^{c},\ \forall i \in \Omega. \end{aligned} \tag{3}$$

The problem in (3) is the final loss function (objective) optimized by DDSH. We can see that the whole training set is divided into two subsets, $X^{\Omega}$ and $X^{\Gamma}$. The binary codes of $X^{\Omega}$, i.e., $B^{\Omega}$, are directly learned from the objective function in (3), but the binary codes of $X^{\Gamma}$ are generated by the hash function $h(\cdot)$, which is defined based on the output of the deep feature learning part and will be introduced in the following subsection.
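To make the structure of objective (3) concrete, the following PyTorch sketch evaluates its two terms given the auxiliary codes $B^{\Omega}$ and the network outputs for $X^{\Gamma}$. The function names are ours, and the squared-error form of $L$ used here (matching the code inner product against $c \cdot S_{ij}$) is only an illustrative assumption; the actual pairwise loss $L$ is the one defined for problem (1).

```python
import torch

def pairwise_loss(U, V, S, c):
    # Illustrative squared-error surrogate for L: (u_i^T v_j - c * S_ij)^2.
    return (U @ V.t() - c * S) ** 2

def objective(B_omega, F_gamma, S_og, S_oo, c):
    # Sampled-column objective of Eq. (3).
    #   B_omega: (m, c) auxiliary codes for X^Omega, entries in {-1, +1}.
    #   F_gamma: (n_gamma, c) network outputs F(x; Theta) for X^Gamma.
    #   S_og, S_oo: supervised similarity blocks S[Omega, Gamma] and S[Omega, Omega].
    h_gamma = torch.sign(F_gamma)                                # h(x) = sign(F(x; Theta))
    term_cross = pairwise_loss(B_omega, h_gamma, S_og, c).sum()  # i in Omega, x_j in X^Gamma
    term_inner = pairwise_loss(B_omega, B_omega, S_oo, c).sum()  # i, j in Omega
    return term_cross + term_inner

# Toy check with random codes and similarities.
m, n_gamma, c = 8, 16, 12
B = torch.randint(0, 2, (m, c)).float() * 2 - 1   # random {-1, +1} codes
F = torch.randn(n_gamma, c)                       # stand-in network outputs
S_og = torch.randint(0, 2, (m, n_gamma)).float()
S_oo = torch.randint(0, 2, (m, m)).float()
print(objective(B, F, S_og, S_oo, c))
```

Note how the first term couples the directly learned codes with the hash function, which is what lets the supervised information guide both procedures at once.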
The learning of $B^{\Omega}$ contains the discrete coding procedure, which is directly guided by the supervised information. The learning of $h(\cdot)$ contains the deep feature learning procedure, which is also directly guided by the supervised information. Hence, our DDSH can utilize supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure in the same end-to-end deep framework. This is different from existing deep hashing methods, which either use a relaxation strategy without discrete coding or do not use the supervised information to directly guide the discrete coding procedure.

Please note that "directly guided" in this paper means that the supervised information is directly included in the corresponding terms of the loss function. For example, the supervised information $S_{ij}$ is directly included in all terms about the discrete codes $B^{\Omega}$ in (3), which means that the discrete coding procedure is directly guided by the supervised information. Furthermore, the supervised information $S_{ij}$ is also directly included in the term about the deep feature learning function $h(x_j)$ in (3), which means that the deep feature learning procedure is also directly guided by the supervised information. To the best of our knowledge, DDSH is the first deep hashing method which can utilize pairwise supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure, and thus enhance the feedback between these two important procedures.

TABLE II
CONFIGURATION OF THE CONVOLUTIONAL LAYERS IN DDSH

Layer   filter size   stride   pad   LRN   pool
conv1   64×11×11      4×4      0     yes   2×2
conv2   256×5×5       1×1      2     yes   2×2
conv3   256×3×3       1×1      1     no    -
conv4   256×3×3       1×1      1     no    -
conv5   256×3×3       1×1      1     no    2×2

TABLE III
CONFIGURATION OF THE FULLY-CONNECTED LAYERS IN DDSH

Layer   Configuration
full6   4096
full7   4096
full8   hash code length c

2) Feature Learning Part: The binary codes of $X^{\Gamma}$ are generated by the hash function $h(\cdot)$, which is defined based on the output of the deep feature learning part. More specifically, we define our hash function as $h(x) = \mathrm{sign}(F(x; \Theta))$, where $\mathrm{sign}(\cdot)$ is the element-wise sign function, $F(x; \Theta)$ denotes the output of the feature learning part, and $\Theta$ denotes all parameters of the deep neural network.

We adopt a convolutional neural network (CNN) from [47], i.e., CNN-F, as our deep feature learning part. We replace the last layer of CNN-F with one fully-connected layer that projects the output of the second-to-last layer into $\mathbb{R}^{c}$. More specifically, the feature learning part contains 5 convolutional layers ("conv1"-"conv5") and 3 fully-connected layers ("full6"-"full8"). The detailed configuration of the 5 convolutional layers is shown in Table II, in which "filter size" denotes the number of convolutional filters and their receptive field size, "stride" specifies the convolutional stride, "pad" indicates the number of pixels to add to each side of the input, "LRN" denotes whether Local Response Normalization (LRN) [45] is applied, and "pool" denotes the down-sampling factor. The detailed configuration of the 3 fully-connected layers is shown in Table III, where "Configuration" shows the number of nodes in each layer. We adopt the Rectified Linear Unit (ReLU) [45] as the activation function for the first seven layers. For the last layer, we utilize the identity function as the activation function.
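As a concrete rendering of Tables II and III, here is a minimal PyTorch sketch of a CNN-F-style feature learning part. The class name `DDSHNet` is ours, the input is assumed to be a 224×224 RGB image, and details the tables do not specify (such as the LRN neighborhood size) are filled with common defaults; treat it as an illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DDSHNet(nn.Module):
    """CNN-F-style feature learning part following Tables II and III."""

    def __init__(self, code_length: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4),             # conv1
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5),                           # LRN
            nn.MaxPool2d(kernel_size=2, stride=2),                  # pool 2x2
            nn.Conv2d(64, 256, kernel_size=5, stride=1, padding=2), # conv2
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),          # conv3
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),          # conv4
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),          # conv5
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.fc = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096),   # full6
            nn.ReLU(inplace=True),
            nn.Linear(4096, 4096),          # full7
            nn.ReLU(inplace=True),
            nn.Linear(4096, code_length),   # full8: identity activation
        )

    def forward(self, x):
        f = self.features(x)                # (N, 256, 6, 6) for 224x224 inputs
        f = torch.flatten(f, 1)
        return self.fc(f)                   # F(x; Theta) in R^c

net = DDSHNet(code_length=12)
x = torch.randn(2, 3, 224, 224)
binary_codes = torch.sign(net(x))           # h(x) = sign(F(x; Theta))
```

ReLU follows each of the first seven layers, and the final projection is left linear, matching the identity activation described above.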
B. Learning

After randomly sampling $\Omega$ at each iteration, we utilize an alternating learning strategy to solve problem (3). More specifically, each time we learn one of the variables $B^{\Omega}$ and $h(F(x; \Theta))$ with the other fixed. When $h(F(x; \Theta))$ is fixed, we directly learn the discrete hash codes $B^{\Omega}$ over the binary variables. When $B^{\Omega}$ is fixed, we update the parameters $\Theta$ of the deep neural network.

1) Learn $B^{\Omega}$ With $h(F(x; \Theta))$ Fixed: When $h(F(x; \Theta))$ is fixed, it is easy to transform problem (3) into a binary quadratic programming (BQP) problem, as in TSH [22]. Each time we optimize one bit for all points, so problem (3) is solved in a bit-by-bit manner, as sketched below.
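To illustrate the alternating strategy, the sketch below pairs a coordinate-descent sweep over the bits of $B^{\Omega}$ (with $h$ fixed) with a gradient step on $\Theta$ (with $B^{\Omega}$ fixed). Both function names are ours; the exhaustive single-bit update is a simple stand-in for the TSH-style BQP solver, backpropagating through the real-valued output $F$ in place of the non-differentiable sign is an illustrative relaxation rather than the paper's exact derivation, and the same squared-error surrogate for $L$ as in the earlier sketch is assumed.

```python
import torch

def update_B_bitwise(B, H_gamma, S_og, S_oo, c):
    # One sweep over the bits of B (codes for X^Omega) with the
    # hash-function codes H_gamma = sign(F(X^Gamma; Theta)) held fixed.
    # Each single-bit subproblem is solved exactly by trying both values.
    m, c_len = B.shape
    for k in range(c_len):
        for i in range(m):
            best_val, best_bit = float("inf"), 1.0
            for bit in (-1.0, 1.0):
                B[i, k] = bit
                val = (((B[i] @ H_gamma.t() - c * S_og[i]) ** 2).sum()
                       + ((B[i] @ B.t() - c * S_oo[i]) ** 2).sum()).item()
                if val < best_val:
                    best_val, best_bit = val, bit
            B[i, k] = best_bit
    return B

def update_theta(net, X_gamma, B, S_og, c, optimizer):
    # Gradient step on Theta with B fixed; only the first term of Eq. (3)
    # depends on Theta. The sign function is relaxed to the identity so
    # that the surrogate loss is differentiable.
    optimizer.zero_grad()
    F_gamma = net(X_gamma)                           # real-valued outputs for X^Gamma
    loss = ((B @ F_gamma.t() - c * S_og) ** 2).sum()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the single-bit update, $b_i^{\top} b_i$ is constant for codes in $\{-1,+1\}^{c}$, so the diagonal term never affects which value of the bit is chosen.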