ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Retrieval 197

to learn two different binary codes for the same training sample. The learning procedure is as follows:

$$u_i = g([\hat{G}_i; f_i^{\mathrm{global}}]_{\mathrm{cat}}) = \mathrm{sign}(W^{(g)} [\hat{G}_i; f_i^{\mathrm{global}}]_{\mathrm{cat}}), \tag{9}$$

$$v_i = h([\hat{G}_i; f_i^{\mathrm{global}}]_{\mathrm{cat}}) = \mathrm{sign}(W^{(h)} [\hat{G}_i; f_i^{\mathrm{global}}]_{\mathrm{cat}}), \tag{10}$$

where $[\cdot\,; \cdot]_{\mathrm{cat}}$ denotes the concatenation operator, and $u_i, v_i \in \{-1, +1\}^q$ denote the two different binary codes of the $i$-th sample. $q$ represents the code length. $W^{(g)}$ and $W^{(h)}$ represent the parameters of the hash functions $g(\cdot)$ and $h(\cdot)$¹, respectively. We denote $U = \{u_i\}_{i=1}^{n}$ and $V = \{v_i\}_{i=1}^{n}$ as the learned binary codes. Inspired by [14], we only keep the binary codes $v_i$ and set the hash function $h(\cdot)$ implicitly. Hence, we can perform feature learning and binary code learning simultaneously.

To preserve the pairwise similarity, we adopt the squared loss and define the following objective function:

$$\mathcal{L}_{sq}(u_i, v_j, C) = (u_i^{\top} v_j - q S_{ij})^2, \tag{11}$$

where $u_i = g([\hat{G}_i; f_i^{\mathrm{global}}]_{\mathrm{cat}})$, $S_{ij}$ is the pairwise similarity label and $C = \{C_i\}_{i=1}^{M}$. We use $\Theta$ to denote the parameters of the deep neural network and the hash layer. The aforementioned process is illustrated in Fig. 4.

Due to the zero-gradient problem caused by the $\mathrm{sign}(\cdot)$ function, $\mathcal{L}_{sq}(\cdot, \cdot, \cdot)$ becomes intractable to optimize. In this paper, we relax $g(\cdot) = \mathrm{sign}(\cdot)$ into $\hat{g}(\cdot) = \tanh(\cdot)$ to alleviate this problem. Then, we can derive the following loss function:

$$\hat{\mathcal{L}}_{sq}(\hat{u}_i, v_j, C) = (\hat{u}_i^{\top} v_j - q S_{ij})^2, \tag{12}$$

where $\hat{u}_i = \hat{g}([\hat{G}_i; f_i^{\mathrm{global}}]_{\mathrm{cat}})$ and $U$ is relaxed as $\hat{U} = \{\hat{u}_i\}_{i=1}^{n}$.

Then, given a set of image samples $\mathcal{X} = \{x_1, \ldots, x_n\}$ and their pairwise labels $S = \{S_{ij}\}_{i,j=1}^{n}$, we can get the following objective function by combining Eqs. (5), (7) and (12):

$$\min_{V, \Theta, C} \mathcal{L}(\mathcal{X}) = \sum_{i,j=1}^{n} \hat{\mathcal{L}}_{sq}(\hat{u}_i, v_j; S_{ij}) + \lambda \sum_{i=1}^{n} \mathcal{L}_{sp}(x_i) + \gamma \sum_{i=1}^{n} \mathcal{L}_{cp}(x_i) \tag{13}$$

$$\text{s.t.}\ \forall i \in \{1, \ldots, n\},\ \hat{u}_i = \hat{g}([\hat{G}_i; f_i^{\mathrm{global}}]_{\mathrm{cat}}),\ v_j \in \{-1, +1\}^q,$$

where $S_{ij}$ represents the similarity between the $i$-th and $j$-th samples, $q$ denotes the code length, and $\lambda$ and $\gamma$ are hyper-parameters.

3.4 Learning Algorithm

To solve the optimization problem in Eq. (13), we design an alternating algorithm to learn $V$, $\Theta$, and $C$. Specifically, we learn one parameter with the others fixed.

¹ We omit the bias term for simplicity.
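As a concrete illustration of the hash layer and the relaxed pairwise term in Eq. (12), the following is a minimal Python sketch, not the authors' implementation. The function names, the toy weight matrix, and the toy feature vector are hypothetical. It also assumes similarity labels in {-1, +1} and maps a zero pre-activation to +1, neither of which is specified in this section.

```python
import math


def preactivation(weights, feature):
    """Compute W x for a weight matrix given as a list of rows (bias omitted)."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def binary_code(weights, feature):
    """sign(W x) with values in {-1, +1}; zero is mapped to +1 (an assumption)."""
    return [1 if z >= 0 else -1 for z in preactivation(weights, feature)]


def relaxed_code(weights, feature):
    """tanh(W x): the differentiable relaxation of sign(W x)."""
    return [math.tanh(z) for z in preactivation(weights, feature)]


def pairwise_squared_loss(u, v, s_ij):
    """Squared loss (u^T v - q * S_ij)^2, where q is the code length."""
    q = len(v)
    inner = sum(a * b for a, b in zip(u, v))
    return (inner - q * s_ij) ** 2


# Hypothetical toy setup: q = 2 bits, 3-dimensional concatenated feature.
W_g = [[0.5, -1.0, 0.25], [1.0, 0.5, -0.5]]
feature_i = [1.0, 0.5, 2.0]
v_j = [1, -1]

u_i = binary_code(W_g, feature_i)       # discrete code, as in Eq. (9)
u_hat_i = relaxed_code(W_g, feature_i)  # relaxed code used in Eq. (12)

# Assumed label encoding: +1 for a similar pair, -1 for a dissimilar pair.
loss_similar = pairwise_squared_loss(u_hat_i, v_j, 1)
loss_dissimilar = pairwise_squared_loss(u_hat_i, v_j, -1)
```

Under the assumed label encoding, the loss vanishes when a binary code agrees with $v_j$ on every bit for a similar pair, or disagrees on every bit for a dissimilar pair. The tanh relaxation keeps each entry strictly inside (-1, 1), so the loss has non-zero gradients with respect to the hash-layer weights.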