pling (Zhang et al. 2014) is adopted to sample several columns from the semantic similarity matrix. Different from traditional sampling methods, which try to sample only a small subset of the entire dataset for training, our column sampling method can exploit all the available data points for training.
• Based on the sampled columns, the hashing-code learning procedure can be decomposed into two parts which can be alternately optimized in a discrete way. The discrete optimization strategy can avoid the errors caused by relaxation in traditional continuous optimization methods.
• Theoretical analysis shows that the learning (optimization) algorithm has a constant-approximation bound in each step of the alternating optimization procedure.
• Empirical results on datasets with semantic labels illustrate that COSDISH can outperform the state-of-the-art methods in real applications, such as image retrieval.

Notation and Problem Definition

Notation

Boldface lowercase letters like $\mathbf{a}$ denote vectors, and the $i$th element of $\mathbf{a}$ is denoted as $a_i$. Boldface uppercase letters like $\mathbf{A}$ denote matrices. $\mathbf{I}$ denotes the identity matrix. $\mathbf{A}_{i*}$ and $\mathbf{A}_{*j}$ denote the $i$th row and the $j$th column of $\mathbf{A}$, respectively. $A_{ij}$ denotes the element at the $i$th row and $j$th column in $\mathbf{A}$. $\mathbf{A}^{-1}$ denotes the inverse of $\mathbf{A}$, and $\mathbf{A}^T$ denotes the transpose of $\mathbf{A}$. $|\cdot|$ denotes the cardinality of a set, i.e., the number of elements in the set. $\|\cdot\|_F$ denotes the Frobenius norm of a vector or matrix, and $\|\cdot\|_1$ denotes the $L_1$ norm of a vector or matrix. $\mathrm{sgn}(\cdot)$ is the element-wise sign function, which returns 1 if the element is a positive number and returns -1 otherwise.
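For concreteness, these norms and the sign function can be written out explicitly. The following is a standard statement rather than a definition taken from the paper, and it assumes the entry-wise reading of the $L_1$ norm:

$$\|\mathbf{A}\|_F = \Big(\sum_{i}\sum_{j} A_{ij}^2\Big)^{1/2}, \qquad \|\mathbf{A}\|_1 = \sum_{i}\sum_{j} |A_{ij}|, \qquad [\mathrm{sgn}(\mathbf{A})]_{ij} = \begin{cases} 1, & A_{ij} > 0, \\ -1, & \text{otherwise.} \end{cases}$$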
Problem Definition

Suppose we have $n$ points $\{\mathbf{x}_i \in \mathbb{R}^d\}_{i=1}^{n}$, where $\mathbf{x}_i$ is the feature vector of point $i$. We can denote the feature vectors of the $n$ points in a compact matrix form $\mathbf{X} \in \mathbb{R}^{n \times d}$, where $\mathbf{X}_{i*}^T = \mathbf{x}_i$. Besides the feature vectors, the training set of supervised hashing also contains a semantic similarity matrix $\mathbf{S} \in \{-1, 0, 1\}^{n \times n}$, where $S_{ij} = 1$ means that point $i$ and point $j$ are semantically similar, $S_{ij} = -1$ means that point $i$ and point $j$ are semantically dissimilar, and $S_{ij} = 0$ means that whether point $i$ and point $j$ are semantically similar or not is unknown. Here, the semantic information typically refers to semantic labels provided with manual effort. In this paper, we assume that $\mathbf{S}$ is fully observed without missing entries, i.e., $\mathbf{S} \in \{-1, 1\}^{n \times n}$. This assumption is reasonable because in many cases we can always get the semantic label information between two points. Furthermore, our model in this paper can be easily adapted to the cases with missing entries.

The goal of supervised hashing is to learn a binary code matrix $\mathbf{B} \in \{-1, 1\}^{n \times q}$, where $\mathbf{B}_{i*}$ denotes the $q$-bit code for training point $i$. Furthermore, the learned binary codes should preserve the semantic similarity in $\mathbf{S}$. Although the goal of supervised hashing can be formulated as different optimization problems, the commonly used one is as follows:

$$\min_{\mathbf{B} \in \{-1,1\}^{n \times q}} \|q\mathbf{S} - \mathbf{B}\mathbf{B}^T\|_F^2, \qquad (1)$$

where $\|q\mathbf{S} - \mathbf{B}\mathbf{B}^T\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} (qS_{ij} - \mathbf{B}_{i*}\mathbf{B}_{j*}^T)^2$.

The main idea of (1) is to adopt the inner product of two binary codes, which reflects the opposite of the Hamming distance, to approximate the similarity label with the square loss. This model has been widely used in many supervised hashing methods (Liu et al. 2012; Lin et al. 2013a; Zhang and Li 2014; Lin et al. 2014; Xia et al. 2014; Leng et al. 2014). LFH (Zhang et al. 2014) also uses the inner product to approximate the similarity label, but it uses the logistic loss rather than the square loss in (1).
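To make the objective in (1) concrete, here is a minimal NumPy sketch (not from the original paper; the function names are ours) that evaluates the loss for a given code matrix and checks the link between the inner product and the Hamming distance that motivates (1): for $q$-bit codes in $\{-1,1\}^q$, $\mathbf{B}_{i*}\mathbf{B}_{j*}^T = q - 2\,d_H(\mathbf{B}_{i*}, \mathbf{B}_{j*})$, so a large inner product corresponds to a small Hamming distance.

```python
import numpy as np

def eq1_loss(S, B):
    """Objective of Eq. (1): ||q*S - B B^T||_F^2 for B in {-1,+1}^{n x q}."""
    q = B.shape[1]
    residual = q * S - B @ B.T
    return float(np.sum(residual ** 2))

def hamming(b_i, b_j):
    """Hamming distance between two {-1,+1} code vectors."""
    return int(np.sum(b_i != b_j))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, q = 6, 8
    # Toy fully observed similarity matrix in {-1,+1}, as assumed in the paper.
    labels = rng.integers(0, 2, size=n)
    S = np.where(labels[:, None] == labels[None, :], 1, -1)
    # Random binary codes in {-1,+1}^{n x q}.
    B = np.where(rng.standard_normal((n, q)) > 0, 1, -1)
    print("Eq. (1) loss:", eq1_loss(S, B))
    # Inner product vs. Hamming distance: B_i . B_j = q - 2 * d_H(B_i, B_j).
    assert B[0] @ B[1] == q - 2 * hamming(B[0], B[1])
```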
Problem (1) is a discrete optimization problem which is hard to solve. Most existing methods optimize it by dropping the discrete constraint (Liu et al. 2012; Lin et al. 2013a). To the best of our knowledge, only one method, called FastH (Lin et al. 2014), has been proposed to directly solve the discrete optimization problem in (1). Due to the difficulty of discrete optimization, FastH adopts a bit-wise learning strategy which uses a Block Graph-Cut method to find local optima, learning one bit at a time. The experiments of FastH show that FastH can achieve better accuracy than other supervised methods with continuous relaxation.

It is easy to see that both the time complexity and the storage complexity are $O(n^2)$ if all the supervised information in S is used for training. Hence, all the existing methods, including relaxation-based continuous optimization methods and discrete optimization methods, sample only a small subset with m (m < n) points for training, where m is typically several thousand even if we are given a large-scale training set. Because some training points are discarded, all these existing methods cannot achieve satisfactory accuracy.

Therefore, to get satisfactory accuracy, we need to address the problem in (1) from two aspects. On one hand, we need to adopt a proper sampling strategy to effectively exploit all the n available points for training. On the other hand, we need to design strategies for discrete optimization. This motivates the work in this paper.

COSDISH

This section presents the details of our proposed method, called COSDISH. More specifically, we try to solve the two aspects stated above, sampling and discrete optimization, for the supervised hashing problem.

Column Sampling

As stated above, both the time complexity and the storage complexity are $O(n^2)$ if all the supervised information in S is used for training. Hence, we have to perform sampling for training, which is actually adopted by almost all existing methods, such as KSH, TSH, FastH and LFH. However, all existing methods except LFH try to sample only a small subset with m (m < n) points for training and discard the rest of the training points, which leads to unsatisfactory accuracy. The special case is LFH, which proposes to sample several columns from S in each iteration and several iterations
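Although the description of LFH-style column sampling is cut off at this point, the general idea referred to above, namely sampling several columns of S in each iteration so that all n points keep contributing supervision, can be illustrated with a small, hypothetical NumPy sketch (the function name and parameters are ours, not from the paper):

```python
import numpy as np

def sample_columns(S, num_cols, rng):
    """One round of column sampling: pick a few columns of the full
    n x n similarity matrix S instead of a small sub-block of points,
    so that every one of the n points still appears in the sampled slice."""
    n = S.shape[0]
    cols = rng.choice(n, size=num_cols, replace=False)
    return cols, S[:, cols]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    labels = rng.integers(0, 10, size=n)
    S = np.where(labels[:, None] == labels[None, :], 1, -1)  # toy similarity matrix
    # Each outer iteration works on a thin n x num_cols slice of S,
    # keeping the per-iteration cost O(n * num_cols) rather than O(n^2).
    for it in range(3):
        cols, S_sub = sample_columns(S, num_cols=32, rng=rng)
        print(it, S_sub.shape)  # (1000, 32)
```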