Algorithm 2 Learning algorithm for BGDH
1: Input: A bipartite graph G = (X, O, E), images X = {x_i}_{i=1}^n, pairwise labels S = {s_ij}, parameters η, λ, batch iterations T1, T2 and batch sizes N1, N2
2: Output: Binary codes B = {b_i}_{i=1}^n
3: Initialization: Initialize θ_x with the pre-trained CNN-F model on ImageNet; initialize each entry of M, v, and θ_e by randomly sampling from a Gaussian distribution with mean 0 and variance 0.01.
4: REPEAT
5:   for t ← 1 to T1 do
6:     Randomly sample N1 labeled images, and let I1 be the set containing indexes of the sampled instances
7:     Calculate φ(x_i; θ_x) and ψ(e_i; θ_e) for all i ∈ I1 by forward propagation
8:     Compute u_i = M^T [φ(x_i; θ_x); ψ(e_i; θ_e)] + v and the binary code of x_i as b_i = sgn(u_i) for all i ∈ I1
9:     Compute the derivative of L_s w.r.t. {u_i : i ∈ I1}
10:    Update parameters M, v, θ_x, θ_e, and {e_i : i ∈ I1} by back propagation
11:  end for
12:  for t ← 1 to T2 do
13:    Randomly generate a batch of triples by invoking Algorithm 1 N2 times, and let I2 be the set containing indexes of the sampled instances and contexts
14:    Compute the derivative of L_g w.r.t. {e_i, w_c : (i, c) ∈ I2}
15:    Update parameters {e_i, w_c : (i, c) ∈ I2}
16:  end for
17: UNTIL stopping
18: Calculate b_i = sgn(M^T [φ(x_i; θ_x); ψ(e_i; θ_e)] + v) for i = 1, ..., n
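To make the alternating optimization concrete, below is a minimal NumPy sketch of one outer REPEAT iteration of Algorithm 2. It is illustrative only: φ and ψ are replaced by identity maps (the paper uses the CNN-F network and a learned embedding transform), Algorithm 1's triple sampler is replaced by uniform pair sampling, and the gradients for L_s and L_g are stand-ins, assuming a DPSH-style pairwise likelihood loss and a skip-gram-style graph loss; all dimensions and data are toy values.

```python
# Minimal sketch of Algorithm 2's alternating updates (hypothetical, not the
# paper's implementation; losses and feature maps are simplified stand-ins).
import numpy as np

rng = np.random.default_rng(0)
n, d_img, d_emb, q = 100, 32, 16, 12          # images, feature dims, code length
X = rng.standard_normal((n, d_img))           # stand-in for phi(x_i; theta_x)
S = (rng.random((n, n)) < 0.5).astype(float)  # toy pairwise labels s_ij
E = rng.normal(0.0, 0.01, (n, d_emb))         # graph embeddings e_i
W = rng.normal(0.0, 0.01, (n, d_emb))         # context vectors w_c
M = rng.normal(0.0, 0.01, (d_img + d_emb, q)) # projection matrix M
v = np.zeros(q)                               # bias v
lam, lr = 0.1, 0.01                           # lambda and a toy learning rate
T1, T2, N1 = 10, 5, 16

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(T1):                           # phase 1: pairwise-label part
    I1 = rng.choice(n, N1, replace=False)     # indexes of sampled instances
    F = np.hstack([X[I1], E[I1]])             # [phi(x_i); psi(e_i)], identity psi
    U = F @ M + v                             # u_i = M^T [phi; psi] + v
    B = np.sign(U)                            # b_i = sgn(u_i), step 8
    Theta = U @ U.T / 2.0
    dU = 0.5 * (sigmoid(Theta) - S[np.ix_(I1, I1)]) @ U  # dL_s/du_i, DPSH-style
    M -= lr * F.T @ dU                        # "back propagation" to M, v, e_i
    v -= lr * dU.sum(axis=0)
    E[I1] -= lr * (dU @ M.T)[:, d_img:]

for _ in range(T2):                           # phase 2: graph-embedding part
    i, c = rng.integers(n), rng.integers(n)   # stand-in for Algorithm 1's sampler
    e_i, w_c = E[i].copy(), W[c].copy()
    g = lam * (sigmoid(e_i @ w_c) - 1.0)      # skip-gram-style positive-pair grad
    E[i] -= lr * g * w_c                      # update only e_i and w_c
    W[c] -= lr * g * e_i

codes = np.sign(np.hstack([X, E]) @ M + v)    # step 18: final binary codes
```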
4 Experiments

In this section, we present experimental results. All the experiments are performed on an NVIDIA K80 GPU server with MatConvNet [Vedaldi and Lenc, 2014].

4.1 Datasets and Setting

We conduct experiments on two widely used benchmark datasets: CIFAR-10 and NUS-WIDE. The CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html) consists of 60,000 images from 10 classes (6,000 images per class). It is a single-label dataset. The NUS-WIDE dataset (http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm) contains 269,648 images from Flickr. Following the settings in [Lai et al., 2015], we use 195,834 images belonging to the 21 most frequent classes, where each class contains at least 5,000 images. Additionally, two images are defined as a ground-truth neighbor pair when they share at least one common label.

In our experiments, BGDH-T denotes the transductive model while BGDH-I denotes the inductive one. We compare our method with several state-of-the-art methods, including unsupervised methods: ITQ [Gong et al., 2013], LSH [Gionis et al., 1999], IsoH [Kong and Li, 2012] and SpH [Heo et al., 2012]; supervised methods: LFH [Zhang et al., 2014], FastH [Lin et al., 2014], SDH [Shen et al., 2015] and COSDISH [Kang et al., 2016]; supervised deep hashing methods: DPSH [Li et al., 2016], DHN [Zhu et al., 2016] and DSH [Liu et al., 2016]; and the semi-supervised deep method SSDH [Zhang et al., 2016].

For hashing methods using hand-crafted features, we represent each image in CIFAR-10 by a 512-dimensional GIST vector. In NUS-WIDE, an image is represented by a 1134-dimensional feature vector, including 500-D bag-of-words features, a 64-D color histogram, a 144-D color correlogram, a 73-D edge direction histogram, 128-D wavelet texture, and 225-D block-wise color moments. For deep methods, we use the raw image pixels as input. Although the network architectures of these deep models differ from each other, for a fair comparison we adopt the same deep network architecture, CNN-F, with the same initialization parameters pre-trained on ImageNet [Deng et al., 2009] for all deep hashing methods. The bipartite graph of BGDH is constructed from the hand-crafted features with a heat kernel, where the hyper-parameter ρ is set to 1 for CIFAR-10 and 10 for NUS-WIDE. The hyper-parameter η in BGDH is set to 10 for CIFAR-10 and 100 for NUS-WIDE, similar to DPSH [Li et al., 2016]. We simply set T1 = 10, T2 = 5, λ = 0.1 in all the experiments.
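As a concrete illustration of the graph construction just described, the sketch below builds a landmark-based bipartite graph with heat-kernel edge weights w_il = exp(-||x_i - x_l||² / ρ). The function name `bipartite_heat_kernel_graph` and the k-nearest-landmark sparsification are our assumptions about the standard recipe; the paper only states that a heat kernel with ρ = 1 (CIFAR-10) or ρ = 10 (NUS-WIDE) is applied to hand-crafted features.

```python
# Hypothetical sketch of the heat-kernel bipartite graph between images and
# randomly sampled landmarks; the k-NN sparsification is an assumption.
import numpy as np

def bipartite_heat_kernel_graph(X, landmarks, rho=1.0, k=5):
    """X: (n, d) hand-crafted features; landmarks: (m, d) sampled rows of X."""
    # squared Euclidean distances between every image and every landmark
    d2 = (X ** 2).sum(1)[:, None] - 2.0 * X @ landmarks.T + (landmarks ** 2).sum(1)[None, :]
    d2 = np.maximum(d2, 0.0)                  # guard against float round-off
    W = np.exp(-d2 / rho)                     # heat-kernel affinities
    mask = np.zeros_like(W, dtype=bool)       # keep only the k nearest landmarks
    np.put_along_axis(mask, np.argsort(d2, axis=1)[:, :k], True, axis=1)
    return np.where(mask, W, 0.0)             # (n, m) bipartite adjacency

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 512))          # e.g., GIST features for CIFAR-10
landmarks = X[rng.choice(1000, 50, replace=False)]
G = bipartite_heat_kernel_graph(X, landmarks, rho=1.0)  # rho = 1 for CIFAR-10
```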
4.2 Experimental Results

To evaluate the retrieval accuracy, we use Mean Average Precision (MAP) as the evaluation metric. Following [Lai et al., 2015; Xia et al., 2014], 1000 images (100 images per class) are randomly selected as the query set in CIFAR-10. We use all the remaining images as the training set for the unsupervised methods and for building the neighbour graph G in BGDH. We randomly select 5000 images (500 images per class) as the training set for the supervised methods.

In NUS-WIDE, we randomly sample 2100 query images from the 21 most frequent labels, following the strategy in [Lai et al., 2015; Xia et al., 2014]. We use all the remaining images as the training set for the unsupervised methods. To reduce the time complexity, we use only 5000 randomly sampled landmarks to build the bipartite graph for BGDH. For the supervised methods, we randomly select 10,500 images from the dataset. The MAP values are calculated within the top 5000 returned neighbors.

We present the MAP results in Table 2, where BGDH, SSDH, DSH, DHN, and DPSH are deep methods. Except for SSDH, which uses triplet labels, all deep methods are trained with pairwise labels. To be fair, the parameters of these methods are set according to the suggestions of their authors. The results show that in most cases our proposed BGDH method substantially outperforms the other baselines, including unsupervised, supervised, and semi-supervised methods.

In real applications, the number of labeled instances is usually far less than that of unlabeled instances. To further verify the effectiveness of leveraging both pairwise labels and unlabeled data, we reduce the number of labeled instances while keeping the other settings the same. For supervised learning, in CIFAR-10 we randomly select 2500 images (250 images per class), and in NUS-WIDE we randomly select 5000 images. The MAP results under this experimental setting are listed in Table 3. Note that the results of the unlisted unsupervised methods are the same as those in Table 2. We observe that our BGDH