\[
\frac{\partial \mathcal{J}}{\partial \mathbf{F}_{*i}} = \frac{1}{2}\sum_{j=1}^{n}\bigl(\sigma(\Theta_{ij})\mathbf{G}_{*j} - S_{ij}\mathbf{G}_{*j}\bigr) + 2\gamma(\mathbf{F}_{*i} - \mathbf{B}_{*i}) + 2\eta\mathbf{F}\mathbf{1}. \tag{3}
\]

Then we can compute $\frac{\partial \mathcal{J}}{\partial \theta_x}$ with $\frac{\partial \mathcal{J}}{\partial \mathbf{F}_{*i}}$ by using the chain rule, based on which BP can be used to update the parameter $\theta_x$.

3.2.2 Learn $\theta_y$, with $\theta_x$ and $\mathbf{B}$ Fixed

When $\theta_x$ and $\mathbf{B}$ are fixed, we also learn the neural network parameter $\theta_y$ of the text modality by using SGD with a BP algorithm.
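Before turning to the text-side details, the image-side gradient step of Eq. (3) can be made concrete. The sketch below is our own minimal NumPy illustration, not the paper's MatConvNet code; the function name and the assumed shapes (F, G, B are c×n real-valued matrices, S is the n×n similarity matrix) are our conventions:

```python
import numpy as np

def image_gradient(F, G, B, S, gamma, eta):
    """Sketch of Eq. (3): gradient of the DCMH loss w.r.t. F (c x n).

    F, G, B : c x n matrices (image outputs, text outputs, binary codes)
    S       : n x n cross-modal similarity matrix
    """
    Theta = 0.5 * F.T @ G                    # Theta_ij = (1/2) F_{*i}^T G_{*j}
    sigma = 1.0 / (1.0 + np.exp(-Theta))     # element-wise sigmoid
    # First term: (1/2) sum_j (sigma(Theta_ij) - S_ij) G_{*j} for each column i
    grad = 0.5 * G @ (sigma - S).T           # c x n
    # Quantization term: 2 * gamma * (F - B)
    grad += 2.0 * gamma * (F - B)
    # Balance term: 2 * eta * F @ 1, the same c-vector added to every column
    grad += 2.0 * eta * F.sum(axis=1, keepdims=True)
    return grad
```

Backpropagating this quantity through the image network then yields the update of $\theta_x$.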
More specifically, for each sampled point $\mathbf{y}_j$, we first compute the following gradient:

\[
\frac{\partial \mathcal{J}}{\partial \mathbf{G}_{*j}} = \frac{1}{2}\sum_{i=1}^{n}\bigl(\sigma(\Theta_{ij})\mathbf{F}_{*i} - S_{ij}\mathbf{F}_{*i}\bigr) + 2\gamma(\mathbf{G}_{*j} - \mathbf{B}_{*j}) + 2\eta\mathbf{G}\mathbf{1}. \tag{4}
\]

Then we can compute $\frac{\partial \mathcal{J}}{\partial \theta_y}$ with $\frac{\partial \mathcal{J}}{\partial \mathbf{G}_{*j}}$ by using the chain rule, based on which BP can be used to update the parameter $\theta_y$.

3.2.3 Learn $\mathbf{B}$, with $\theta_x$ and $\theta_y$ Fixed

When $\theta_x$ and $\theta_y$ are fixed, the problem in (2) can be reformulated as follows:

\[
\max_{\mathbf{B}} \ \mathrm{tr}\bigl(\mathbf{B}^{T}(\gamma(\mathbf{F}+\mathbf{G}))\bigr) = \mathrm{tr}(\mathbf{B}^{T}\mathbf{V}) = \sum_{i,j} B_{ij}V_{ij}, \qquad \text{s.t.}\ \mathbf{B}\in\{-1,+1\}^{c\times n},
\]

where $\mathbf{V} = \gamma(\mathbf{F}+\mathbf{G})$.

It is easy to find that the binary code $B_{ij}$ should keep the same sign as $V_{ij}$. Therefore, we have:

\[
\mathbf{B} = \mathrm{sign}(\mathbf{V}) = \mathrm{sign}(\gamma(\mathbf{F}+\mathbf{G})). \tag{5}
\]

3.3. Out-of-Sample Extension

For any point which is not in the training set, we can obtain its hash code as long as one of its modalities (image or text) is observed. In particular, given the image modality $\mathbf{x}_q$ of point $q$, we can adopt forward propagation to generate the hash code as follows:

\[
\mathbf{b}^{(x)}_{q} = h^{(x)}(\mathbf{x}_q) = \mathrm{sign}\bigl(f(\mathbf{x}_q;\theta_x)\bigr).
\]

Similarly, if point $q$ only has the text modality $\mathbf{y}_q$, we can also generate the hash code $\mathbf{b}^{(y)}_{q}$ as follows:

\[
\mathbf{b}^{(y)}_{q} = h^{(y)}(\mathbf{y}_q) = \mathrm{sign}\bigl(g(\mathbf{y}_q;\theta_y)\bigr).
\]

Hence, our DCMH model can be used for cross-modal search where the query points have one modality and the points in the database have the other modality.

Algorithm 1 The learning algorithm for DCMH.
Input: Image set $\mathbf{X}$, text set $\mathbf{Y}$, and cross-modal similarity matrix $\mathbf{S}$.
Output: Parameters $\theta_x$ and $\theta_y$ of the deep neural networks, and binary code matrix $\mathbf{B}$.
Initialization: Initialize neural network parameters $\theta_x$ and $\theta_y$, mini-batch size $N_x = N_y = 128$, and iteration numbers $t_x = \lceil n/N_x \rceil$, $t_y = \lceil n/N_y \rceil$.
repeat
    for iter = 1, 2, ..., $t_x$ do
        Randomly sample $N_x$ points from $\mathbf{X}$ to construct a mini-batch.
        For each sampled point $\mathbf{x}_i$ in the mini-batch, calculate $\mathbf{F}_{*i} = f(\mathbf{x}_i;\theta_x)$ by forward propagation.
        Calculate the derivative according to (3).
        Update the parameter $\theta_x$ by using back propagation.
    end for
    for iter = 1, 2, ..., $t_y$ do
        Randomly sample $N_y$ points from $\mathbf{Y}$ to construct a mini-batch.
        For each sampled point $\mathbf{y}_j$ in the mini-batch, calculate $\mathbf{G}_{*j} = g(\mathbf{y}_j;\theta_y)$ by forward propagation.
        Calculate the derivative according to (4).
        Update the parameter $\theta_y$ by using back propagation.
    end for
    Learn $\mathbf{B}$ according to (5).
until a fixed number of iterations

4. Experiment

We carry out experiments on image-text datasets to verify the effectiveness of DCMH. DCMH is implemented with the open-source deep learning toolbox MatConvNet [31] on an NVIDIA K80 GPU server.

4.1. Datasets

Three datasets, MIRFLICKR-25K [12], IAPR TC-12 [8] and NUS-WIDE [6], are used for evaluation.

The original MIRFLICKR-25K dataset [12] consists of 25,000 images collected from the Flickr website. Each image is associated with several textual tags; hence, each point is an image-text pair. We select those points which have at least 20 textual tags for our experiment. The text for each point is represented as a 1386-dimensional bag-of-words vector. For the hand-crafted feature based method, each image is represented by a 512-dimensional GIST feature vector. Furthermore, each point is manually annotated with one of the 24 unique labels.

The IAPR TC-12 dataset [8] consists of 20,000 image-text pairs which are annotated using 255 labels. We use the entire dataset for our experiment. The text for each point is represented as a 2912-dimensional bag-of-words vector. For the hand-crafted feature based method, each image is represented by a 512-dimensional GIST feature vector.