Soft Constraints

As mentioned in (Leng et al. 2014), when we can only get a subset of the semantic information, pushing two dissimilar points to have the maximum Hamming distance may lead to over-fitting and unexpected results. Moreover, the number of dissimilar labels is typically far larger than the number of similar labels. Hence, we can also view it as a class-imbalance problem between positive and negative labels. Inspired by (Leng et al. 2014), we change the element −1 in our similarity matrix S̃ to a real value −β, where 0 < β < 1. More specifically, we take β = (the number of 1 in S̃) / (the number of −1 in S̃) empirically.

Please note that although soft constraints can further improve performance, the superior performance of our method mainly comes from the learning procedure rather than the soft constraints. Empirical verification of this can be found in the supplementary material.
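To make the soft constraint concrete, below is a minimal NumPy sketch (our own, not the authors' code) of building a pairwise similarity matrix from single-label data and softening the −1 entries to −β; the function and variable names are hypothetical.

```python
import numpy as np

def soft_similarity_matrix(labels_gamma, labels_omega):
    """Build a pairwise similarity matrix between two sets of single-label
    points and soften the dissimilar entries.

    Sketch only: +1 marks a semantically similar pair; every -1 entry is
    replaced by -beta, where beta is the ratio between the number of +1
    entries and the number of -1 entries in the matrix.
    """
    # +1 if the two points share the same class label, -1 otherwise
    S = np.where(labels_gamma[:, None] == labels_omega[None, :], 1.0, -1.0)

    num_pos = np.count_nonzero(S == 1.0)
    num_neg = np.count_nonzero(S == -1.0)
    beta = num_pos / num_neg  # typically < 1, since dissimilar pairs dominate

    S[S == -1.0] = -beta      # soft constraint on dissimilar pairs
    return S, beta

# Toy usage with hypothetical class labels.
labels_gamma = np.array([0, 0, 1, 2, 3, 1])
labels_omega = np.array([0, 1, 2, 0])
S_soft, beta = soft_similarity_matrix(labels_gamma, labels_omega)
print(beta, S_soft.shape)  # beta = 7/17 ≈ 0.41 here; shape (6, 4)
```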
Out-of-Sample Extension

Many supervised hashing methods can be viewed as two-step methods (Lin et al. 2013a): learn the binary codes in the first step, and then train q binary classifiers based on the feature matrix X and the learned code matrix B in the second step, with each bit corresponding to one classifier (Zhang et al. 2014; Lin et al. 2013a; 2014; Xia et al. 2014). Besides the methods using linear classifiers (Wang, Kumar, and Chang 2010a; Zhang et al. 2014), some other methods use more powerful nonlinear classifiers, such as SVM with RBF kernel (Lin et al. 2013a), deep convolutional networks (Xia et al. 2014) and boosted decision trees (Lin et al. 2014). In general, the more powerful the classifier used for out-of-sample extension, the better the accuracy we can achieve, and the more training time will be consumed (Lin et al. 2013a; 2014). FastH (Lin et al. 2014) adopts an efficient implementation of boosted decision trees for out-of-sample extension, which shows better accuracy and less training time than other methods with nonlinear classifiers such as KSH and TSH.

Our COSDISH is also a two-step method. For out-of-sample extension, COSDISH uses either a linear classifier or the boosted decision trees of FastH, which gives two different variants. We will empirically evaluate both variants in our experiments.
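As an illustration of the second step, here is a minimal scikit-learn sketch (our own, under the assumption that a plain linear SVM per bit is an acceptable stand-in for the linear-classifier variant; boosted decision trees could be substituted for a FastH-style variant):

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_hash_functions(X, B):
    """Step 2 of a two-step hashing method: fit one binary classifier per
    bit, mapping features X (n x d) to learned codes B (n x q) with
    entries in {-1, +1}. Sketch with linear classifiers only.
    """
    classifiers = []
    for k in range(B.shape[1]):
        clf = LinearSVC()      # one classifier per bit
        clf.fit(X, B[:, k])    # the k-th column of B provides the targets
        classifiers.append(clf)
    return classifiers

def predict_codes(classifiers, X_query):
    """Out-of-sample extension: predict the binary code of unseen points."""
    return np.stack([clf.predict(X_query) for clf in classifiers], axis=1)

# Toy usage with random data and hypothetical sizes (n=200, d=16, q=8).
X = np.random.randn(200, 16)
B = np.where(np.random.randn(200, 8) >= 0, 1.0, -1.0)
hash_functions = train_hash_functions(X, B)
codes = predict_codes(hash_functions, np.random.randn(5, 16))  # shape (5, 8)
```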
Complexity Analysis

The time complexity to construct problem (4) is O(|Γ| × |Ω| + |Ω|²). The time complexities to construct problem (5) and problem (7) are both O(|Ω|²). Performing the Cholesky decomposition of λI − Q̃^(k) takes O(|Ω|³). The 2-approximation algorithm for the clustering problem takes O(|Ω|³ + |Ω|² log |Ω|). For the BΩ-subproblem, we need to solve q BQP problems, so the complexity of the BΩ-subproblem is O(q × (|Γ| × |Ω| + |Ω|³)). In addition, the time complexity of the BΓ-subproblem is O(q × |Γ| × |Ω|). Therefore, the total time complexity is O(T_sto × T_alt × q × (|Γ| × |Ω| + |Ω|³)), and the space complexity is O(|Γ| × |Ω| + |Ω|²). If we take |Ω| = q, the time complexity becomes O(T_sto × T_alt × (nq² + q⁴)), which is linear in n. Typically, q is very small, e.g., less than 64. Hence, our method is scalable.

Experiment

We use real datasets to evaluate the effectiveness of our method. All experiments are conducted on a workstation with 6 Intel Xeon CPU cores and 48GB of RAM.

Dataset

Two image datasets with semantic labels are used to evaluate our method and the other baselines: CIFAR-10 (Krizhevsky 2009) and NUS-WIDE (Chua et al. 2009). Both have been widely used for hashing evaluation (Lin et al. 2013a; Zhang et al. 2014). Each instance in CIFAR-10 has a single label, and each instance in NUS-WIDE might have multiple labels.

CIFAR-10 contains 60,000 images. Each image is represented by a 512-dimensional GIST feature vector extracted from the original 32 × 32 color image, and is manually labeled as one of ten classes. Two images are considered semantically similar if they share the same class label; otherwise, they are treated as semantically dissimilar.

NUS-WIDE includes 269,648 images crawled from Flickr with 81 ground-truth labels (tags). Each image is represented by a 1134-dimensional feature vector after feature extraction and might be associated with multiple labels. There are also some images without any label, which are not suitable for our evaluation. After removing the images without any label, we get 209,347 images for our experiment. We consider two images to be semantically similar if they share at least one common label; otherwise, they are semantically dissimilar.

For both datasets, we normalize the feature vectors so that each dimension has zero mean and equal variance.
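A minimal sketch of this normalization step (our own code; in practice the per-dimension statistics would be estimated on the training split and reused for the query points):

```python
import numpy as np

def normalize_features(X_train, X_query, eps=1e-12):
    """Normalize each feature dimension to zero mean and unit variance.
    Statistics are computed on the training features and then applied to
    the query features as well; `eps` guards against constant dimensions.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # avoid division by zero
    return (X_train - mean) / std, (X_query - mean) / std

# Toy usage with hypothetical GIST-like features (512 dimensions).
X_train = np.random.rand(1000, 512)
X_query = np.random.rand(10, 512)
X_train_n, X_query_n = normalize_features(X_train, X_query)
```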
Experimental Settings and Baselines

As in LFH (Zhang et al. 2014), for all datasets we randomly choose 1,000 points as the validation set and 1,000 points as the query (test) set, with the rest of the points used as the training set. All experimental results are averages over 10 independent random partitions.

Unless otherwise stated, COSDISH refers to the variant with soft constraints, because in most cases it outperforms the variant without soft constraints (refer to the supplementary material). We use COSDISH to denote our method with a linear classifier for out-of-sample extension, and COSDISH_BT to denote our method with boosted decision trees for out-of-sample extension.

Because existing methods (Lin et al. 2013a; Zhang et al. 2014) have shown that supervised methods can outperform unsupervised methods, we only compare our method with representative supervised hashing methods, including SPLH (Wang, Kumar, and Chang 2010b), KSH (Liu et al. 2012), TSH (Lin et al. 2013a), LFH (Zhang et al. 2014), FastH (Lin et al. 2014) and SDH (Shen et al. 2015).
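To make the evaluation protocol concrete, the following sketch shows the random partitioning and the averaging over 10 partitions; `run_experiment` is a hypothetical callback standing in for training and evaluating one hashing method on one partition.

```python
import numpy as np

def random_partition(n, num_val=1000, num_query=1000, seed=0):
    """Randomly split n point indices into training, validation and
    query (test) sets, following the protocol described above."""
    perm = np.random.default_rng(seed).permutation(n)
    val_idx = perm[:num_val]
    query_idx = perm[num_val:num_val + num_query]
    train_idx = perm[num_val + num_query:]
    return train_idx, val_idx, query_idx

def average_over_partitions(n, run_experiment, num_partitions=10):
    """Average a scalar metric (e.g., MAP) over independent random
    partitions; `run_experiment(train, val, query)` is a placeholder."""
    scores = [run_experiment(*random_partition(n, seed=s))
              for s in range(num_partitions)]
    return float(np.mean(scores))

# Toy usage with a dummy experiment that returns a random score.
avg_score = average_over_partitions(60000, lambda tr, va, qu: np.random.rand())
```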