Consequently, since Q is bounded by a small constant, we can say that the cost in computation and storage of our learning algorithm is linear in the number of training points N. This makes our LFH easily scalable to very large datasets.

2.6 Normalized Hyper-parameters

The hyper-parameter β in the objective function defined in (2) acts as a factor weighing the relative importance between the first and the second term. However, the number of summands in each term is different: the first term has |S| summands, while the second term has N summands. Since different datasets may have different values of N and |S|, the optimal value of β may vary between datasets. To address this issue and make our method less dependent on a specific dataset, we introduce a new hyper-parameter β′ satisfying

β′ = (N / |S|) β.

Replacing β with (|S| / N) β′ in (2) gives a specialized value of β for each dataset, while the optimal value of β′ is normalized to roughly the same range on different datasets. We can normalize λe in (5) by following the same idea.
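As an illustration only (not part of the original paper), the following Python sketch shows how a single normalized value β′, tuned once on a small dataset, could be converted back to the dataset-specific β; the function name and the numbers in the usage example are assumptions.

```python
def beta_from_normalized(beta_prime, num_points, num_pairs):
    """Recover the dataset-specific weight beta from the normalized
    hyper-parameter beta', using beta' = (N / |S|) * beta.

    num_points : N, the number of training points.
    num_pairs  : |S|, the number of supervised similarity pairs.
    """
    return beta_prime * num_pairs / num_points

# Hypothetical usage: reuse one tuned beta' on datasets of different sizes.
beta_small = beta_from_normalized(beta_prime=10.0, num_points=5_000, num_pairs=250_000)
beta_large = beta_from_normalized(beta_prime=10.0, num_points=58_000, num_pairs=2_900_000)
```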
We find that the MLH method spends most of its time on selecting the best hyper-parameters for each dataset. With the normalized hyper-parameters introduced above, we can pre-compute the optimal hyper-parameter values on some smaller dataset and then apply the same values to all other datasets. This saves the time of hyper-parameter selection and makes our method more efficient on large datasets.

3. EXPERIMENT

3.1 Datasets

We evaluate our method on two standard large image datasets with semantic labels: CIFAR-10 [11] and NUS-WIDE [3].

The CIFAR-10 dataset [11] consists of 60,000 color images drawn from the 80M tiny image collection [29]. Each 32 × 32 image is represented by a 512-dimensional GIST feature vector. Each image is manually classified into one of 10 classes, with an exclusive label indicating the class it belongs to. Two images are considered ground-truth neighbors if they have the same class label.

The NUS-WIDE dataset [3] contains 269,648 images collected from Flickr. Each image is represented by a 1134-dimensional low-level feature vector, including color histogram, color auto-correlogram, edge direction histogram, wavelet texture, block-wise color moments, and bag of visual words. The images are manually annotated with some of the 81 concept tags. Two images are considered ground-truth neighbors if they share at least one common tag.

For data pre-processing, we follow the standard feature normalization of making each dimension of the feature vectors have zero mean and equal variance.
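For illustration only (not code from the paper), one common reading of the per-dimension normalization described above is z-score standardization, sketched as follows:

```python
import numpy as np

def normalize_features(X):
    """Zero-center each feature dimension and scale it to unit variance.

    X: (num_points, num_dims) array of raw feature vectors, e.g. 512-d GIST
       for CIFAR-10 or 1134-d low-level features for NUS-WIDE.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant dimensions
    return (X - mean) / std
```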
3.2 Experimental Settings and Baselines

For both the CIFAR-10 and NUS-WIDE datasets, we randomly sample 1,000 points as the query set, 1,000 points as the validation set, and use all the remaining points as the training set. Using the normalized hyper-parameters described in Section 2.6, the best hyper-parameters are selected on the validation set of CIFAR-10. All experimental results are averaged over 10 independent rounds of random training/validation/query partitions.

Unless otherwise stated, LFH in the experiment section refers to LFH with stochastic learning. We compare our LFH method with several state-of-the-art hashing methods, which include:

• Data-independent methods: locality-sensitive hashing (LSH) and shift-invariant kernels hashing (SIKH);

• Unsupervised data-dependent methods: spectral hashing (SH), principal component analysis based hashing (PCAH), iterative quantization (ITQ), and anchor graph hashing (AGH);

• Supervised data-dependent methods: sequential projection learning for hashing (SPLH), minimal loss hashing (MLH), and kernel-based supervised hashing (KSH).

All the baseline methods are implemented using the source code provided by the corresponding authors. For KSH and AGH, the number of support points for kernel construction is set to 300, following the same settings as in [17, 18]. For KSH, SPLH, and MLH, it is impractical to use all the supervised information for training since it would be very time-consuming. Following the strategy used in [17], we sample 2,000 labeled points for these methods.

All our experiments are conducted on a workstation with 24 Intel Xeon CPU cores and 64 GB RAM.
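As a minimal sketch of the partition protocol above (not from the paper; the function name and seed handling are assumptions), each of the 10 rounds would draw a fresh random split:

```python
import numpy as np

def split_indices(num_points, num_query=1000, num_validation=1000, seed=0):
    """Randomly split point indices into query, validation, and training sets:
    1,000 query points, 1,000 validation points, and the rest for training."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_points)
    query = perm[:num_query]
    validation = perm[num_query:num_query + num_validation]
    training = perm[num_query + num_validation:]
    return query, validation, training

# One split per round; results are averaged over 10 independent rounds.
splits = [split_indices(60_000, seed=round_id) for round_id in range(10)]
```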
3.3 Effect of Stochastic Learning

The convergence curve of the objective function on a sampled CIFAR-10 subset of 5,000 points with code length 32 is shown in Figure 2. LFH-Full refers to the LFH that uses the full supervised information for updating, and LFH-Stochastic refers to the LFH with stochastic learning. The objective function value is computed based on the full supervised information for both methods. We can see that the objective function of LFH-Full converges to a stationary point after a few iterations. The objective function of LFH-Stochastic shows a clear overall trend of convergence to a stationary point, with slight oscillation. This behavior is quite similar to that of stochastic gradient descent and is empirically acceptable.

Figure 3 shows the mean average precision (MAP) [10, 17, 19] values computed on a validation set during the updating process. The final MAP evaluated on a query set is 0.5237 for LFH-Full and 0.4694 for LFH-Stochastic. The reduction in MAP of LFH-Stochastic is acceptable given the dramatic decrease in time complexity achieved by stochastic learning.

3.4 Hamming Ranking Performance

We perform Hamming ranking using the generated binary codes on the CIFAR-10 and NUS-WIDE datasets. For each query in the query set, all the points in the training set are ranked according to the Hamming distance between their binary codes and the query's. The MAP is reported to evaluate the accuracy of the different hashing methods; a sketch of this evaluation procedure is given at the end of this section.

Figure 4(a) and Figure 4(b) show the averaged MAP results with different code lengths on the two datasets, respectively. We can find that, with the help of semantic information, the supervised data-dependent methods can generally outperform the unsupervised data-dependent and data-independent methods.
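For illustration only (not the paper's evaluation code), the following sketch computes Hamming ranking and MAP over the full ranked list; the exact protocol used in the paper (e.g., tie handling or any truncation depth) may differ, and the function name is an assumption.

```python
import numpy as np

def hamming_ranking_map(query_codes, train_codes, relevance):
    """Rank training points by Hamming distance to each query and compute MAP.

    query_codes : (num_queries, code_length) array of 0/1 bits
    train_codes : (num_train, code_length) array of 0/1 bits
    relevance   : (num_queries, num_train) boolean ground-truth neighbor matrix
    """
    average_precisions = []
    for q in range(query_codes.shape[0]):
        # Hamming distance = number of differing bits.
        dist = np.count_nonzero(train_codes != query_codes[q], axis=1)
        order = np.argsort(dist, kind="stable")
        rel = relevance[q, order]
        if rel.sum() == 0:
            continue  # skip queries with no ground-truth neighbors
        # Average precision = mean of precision@k over the ranks of relevant items.
        ranks = np.nonzero(rel)[0] + 1
        precisions = np.arange(1, len(ranks) + 1) / ranks
        average_precisions.append(precisions.mean())
    return float(np.mean(average_precisions))
```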