JIANG et al.: DEEP DISCRETE SUPERVISED HASHING 6001

discrete coding procedure. The objective function of DPSH¹ can be written as:

$$\mathcal{L}_{\text{DPSH}} = -\sum_{S_{ij}\in\mathcal{S}}\left(S_{ij}\Theta_{ij}-\log\left(1+e^{\Theta_{ij}}\right)\right)+\eta\sum_{i=1}^{n}\|\mathbf{b}_i-\mathbf{u}_i\|_F^2,$$

where $\Theta_{ij}=\frac{1}{2}\mathbf{u}_i^T\mathbf{u}_j$ and $\mathbf{u}_i$ denotes the output of the deep neural network.
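Under the definitions above ($\Theta_{ij}=\frac{1}{2}\mathbf{u}_i^T\mathbf{u}_j$, binary codes $\mathbf{b}_i$), the DPSH objective can be sketched in NumPy. This is a minimal illustration, not the paper's implementation; the toy tensors and the function name are our own, and for simplicity the sum runs over all pairs rather than only the observed pairs in $\mathcal{S}$:

```python
import numpy as np

def dpsh_loss(U, B, S, eta=1.0):
    """Sketch of the DPSH objective: negative pairwise log-likelihood
    plus a quantization penalty tying network outputs U to codes B.

    U : (n, c) real-valued network outputs u_i
    B : (n, c) binary codes b_i in {-1, +1}
    S : (n, n) pairwise similarity, S_ij in {0, 1}
    """
    Theta = 0.5 * U @ U.T                       # Theta_ij = 0.5 * u_i^T u_j
    # -sum_ij (S_ij * Theta_ij - log(1 + exp(Theta_ij)))
    likelihood = -(S * Theta - np.log1p(np.exp(Theta))).sum()
    quantization = eta * ((B - U) ** 2).sum()   # eta * sum_i ||b_i - u_i||^2
    return likelihood + quantization

rng = np.random.default_rng(0)
U = rng.standard_normal((4, 8))
B = np.sign(U)                                  # codes from the sign of the outputs
S = (rng.random((4, 4)) > 0.5).astype(float)
print(dpsh_loss(U, B, S))
```

Note that the supervised information $S_{ij}$ enters only the likelihood term; the quantization term that produces the codes $\mathbf{b}_i$ is unsupervised, which is exactly the limitation discussed next.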
We can find that in DPSH the discrete coding procedure is not directly guided by supervised information, i.e., the supervised information is not directly included in the terms of $\{\mathbf{b}_i\}$ in the objective function.

The objective function of DQN can be written as:

$$\mathcal{L}_{\text{DQN}} = \sum_{S_{ij}\in\mathcal{S}}\left(S_{ij}-\frac{\mathbf{z}_i^T\mathbf{z}_j}{\|\mathbf{z}_i\|\|\mathbf{z}_j\|}\right)^2+\lambda\sum_{i=1}^{n}\|\mathbf{z}_i-\mathbf{C}\mathbf{h}_i\|_F^2,$$

where $\mathbf{z}_i$ denotes the output of the deep neural network and $\sum_{i=1}^{n}\|\mathbf{z}_i-\mathbf{C}\mathbf{h}_i\|_F^2$ denotes the product quantization loss. The discrete coding procedure is only contained in the term $\|\mathbf{z}_i-\mathbf{C}\mathbf{h}_i\|_F^2$. We can find that in DQN the discrete coding procedure is not directly guided by supervised information either.

There is also a deep hashing method called DSDH [44]. Unlike DDSH, DSDH utilizes pointwise supervised information to guide the discrete coding procedure and utilizes pairwise similarity to guide the feature learning procedure. DSDH then bridges the discrete coding procedure and the feature learning procedure by using the method of auxiliary coordinates (MAC) technique in AFFHash [49]. Because it requires both pointwise labels and pairwise labels, the application scenarios of DSDH might be limited.

To the best of our knowledge, our DDSH is the first deep hashing method which can utilize pairwise supervised information to directly guide both the discrete coding procedure and the deep feature learning procedure in the same framework.

V. EXPERIMENT

We evaluate DDSH and other baselines on datasets from image retrieval applications. The open source deep learning library MatConvNet [50] is used to implement our model. All experiments are performed on an NVIDIA K40 GPU server.

A. Experimental Setting

1) Datasets: We adopt four widely used image datasets to evaluate our proposed method. They are CIFAR-10² [45], SVHN³ [51], NUS-WIDE⁴ [52] and Clothing1M⁵ [53].

The CIFAR-10 dataset contains 60,000 images which are manually labeled into 10 classes, including "airplane", "automobile", "bird", "cat", "deer", "dog", "frog", "horse", "ship" and "truck".
It is a single-label dataset. The size of each image is 32×32 pixels. Two images are treated as similar if they share the same label, i.e., they belong to the same class. Otherwise, they are considered to be dissimilar.

¹For DPSH, supervised information S_ij is defined on {0, 1}.
²https://www.cs.toronto.edu/~kriz/cifar.html
³http://ufldl.stanford.edu/housenumbers/
⁴http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
⁵https://github.com/Cysu/noisy_label

The SVHN dataset consists of 73,257 digits for training, 26,032 digits for testing and 531,131 additional samples. It is a real-world image dataset for recognizing digits in natural scene images. The images are categorized into 10 classes, each corresponding to a digit. SVHN is also a single-label dataset. Two images are treated as similar if they share the same label. Otherwise, they are considered to be dissimilar.

The NUS-WIDE dataset is a large-scale image dataset which includes 269,648 images and their associated tags from the Flickr website. It is a multi-label dataset where each image might be annotated with multiple labels. We select 186,577 data points that belong to the 10 most frequent concepts from the original dataset. Two images are treated as similar if they share at least one label. Otherwise, they are considered to be dissimilar.

The Clothing1M dataset is a relatively large-scale dataset which contains 1,037,497 images belonging to 14 classes, including "T-shirt", "shirt", "knitwear", "chiffon", "sweater", "hoodie", "windbreaker", "jacket", "downcoat", "suit", "shawl", "dress", "vest" and "underwear". Clothing1M is a single-label dataset. Two images are treated as similar if they share the same label, i.e., they belong to the same class. Otherwise, they are considered to be dissimilar.

Table IV illustrates some example points from the above four datasets.

For the CIFAR-10 dataset, we randomly take 1,000 images (100 images per class) as the query set and the remaining images as the retrieval set.
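All four datasets derive the supervised similarity matrix purely from class labels: "share at least one label" covers the multi-label case (NUS-WIDE) and reduces to "same class" for one-hot single-label data. A minimal sketch of this rule, with toy label matrices of our own rather than the actual datasets:

```python
import numpy as np

def similarity_matrix(labels):
    """S_ij = 1 if images i and j share at least one label, else 0.

    labels : (n, k) binary label-indicator matrix; for a single-label
    dataset each row is one-hot, so sharing a label means same class.
    """
    L = np.asarray(labels, dtype=float)
    return (L @ L.T > 0).astype(int)

# single-label (one-hot rows with classes 0, 1, 0): similar iff same class
single = np.eye(3)[[0, 1, 0]]
print(similarity_matrix(single))
# multi-label: rows 0 and 1 share tag 1, so S_01 = 1
multi = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 0]])
print(similarity_matrix(multi))
```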
Furthermore, we randomly select 5,000 images (500 images per class) from the retrieval set as the training set. For the SVHN dataset, we randomly select 1,000 images (100 images per class) from the testing set as the query set and utilize the whole training set as the retrieval set. We randomly select 5,000 images (500 images per class) from the retrieval set as the training set. For the NUS-WIDE dataset, we randomly select 1,867 data points as the query set and the remaining data points as the retrieval set. We randomly select 5,000 data points from the retrieval set as the training set. For the Clothing1M dataset, after removing the images whose links are invalid, we randomly select 7,000 images (500 images per class) as the query set and 1,028,083 images as the retrieval set. Furthermore, we randomly sample 14,000 images (1,000 images per class) from the retrieval set to construct the training set.

2) Baselines and Evaluation Protocol: We compare DDSH with eleven state-of-the-art baselines, including LSH [19], ITQ [20], LFH [33], FastH [23], SDH [25], COSDISH [34], NDH [17], DHN [43], DSH [40], DPSH [42] and DSDH [44]. These baselines are briefly introduced as follows:

• Locality-sensitive hashing (LSH) [19]: LSH is a representative data-independent hashing method. LSH utilizes random projection to generate hash functions.
• Iterative quantization (ITQ) [20]: ITQ is a representative unsupervised hashing method. ITQ first projects data points into a low-dimensional space by utilizing principal component analysis (PCA). Then ITQ minimizes the quantization error to learn binary codes.
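As an illustration of how LSH generates hash functions from random projections (the sign-of-random-hyperplane variant; the dimensions, seed and data below are arbitrary, not tied to any baseline implementation):

```python
import numpy as np

def lsh_hash(X, n_bits, seed=0):
    """LSH via random hyperplanes: project the data with a random
    Gaussian matrix and keep only the sign of each projection."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))  # random projection matrix
    return (X @ W > 0).astype(np.int8)             # binary codes in {0, 1}

X = np.random.default_rng(1).standard_normal((5, 32))
codes = lsh_hash(X, n_bits=16)
print(codes.shape)  # (5, 16)
```

Being data-independent, the projection matrix here is drawn without looking at the data, in contrast to ITQ, which fits a PCA projection and a rotation to the data to reduce quantization error.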