techniques have become a promising choice for efficient ANN search on massive datasets.

Existing hashing methods can be divided into two categories: data-independent methods and data-dependent methods [6, 17, 18]. For data-independent methods, the hashing functions are learned without using any training data. Representative data-independent methods include locality-sensitive hashing (LSH) [1, 5, 7], shift-invariant kernels hashing (SIKH) [22], and many other extensions [4, 13, 14]. On the other hand, for data-dependent methods, the hashing functions are learned from training data. Generally speaking, data-dependent methods often require fewer bits than data-independent methods to achieve satisfactory performance.

The data-dependent methods can be further divided into two categories: unsupervised methods and supervised methods [17, 20, 32]. Unsupervised methods try to preserve the metric (Euclidean) structure between data points using only their feature information. Representative unsupervised methods include spectral hashing (SH) [34], principal component analysis based hashing (PCAH) [33], iterative quantization (ITQ) [6], anchor graph hashing (AGH) [18], isotropic hashing (IsoHash) [9], multimodal latent binary embedding (MLBE) [42], and predictable dual-view hashing (PDH) [23]. Because the high-level semantic description of an object often differs from the extracted low-level feature descriptors, a discrepancy known as the semantic gap [25], returning nearest neighbors according to metric distances between the feature vectors does not always guarantee good search quality. Hence, many recent works focus on supervised methods, which try to preserve the semantic structure among the data points by utilizing their associated semantic information [17, 19]. Although some works exploit other types of supervised information, such as ranking information, for hashing [16, 20], the semantic information is usually given in the form of pairwise labels indicating whether two data points are known to be similar or dissimilar. Representative supervised methods include restricted Boltzmann machine based hashing (RBM) [24], binary reconstructive embedding (BRE) [12], sequential projection learning for hashing (SPLH) [33], minimal loss hashing (MLH) [19], kernel-based supervised hashing (KSH) [17], and linear discriminant analysis based hashing (LDAHash) [28]. Additionally, there are some semi-supervised hashing methods [32] which use both labeled and unlabeled data to train their models. As stated in recent works [17, 19, 20], sophisticated supervised methods, such as SPLH, MLH, and KSH, can achieve higher accuracy than unsupervised methods. However, some existing supervised methods, like MLH, suffer from very long training times, making them difficult to apply to large-scale datasets.

In this paper, we propose a novel method, called latent factor hashing (LFH), for supervised hashing. The main contributions of this paper are outlined as follows:

• Based on latent factor models, the likelihood of the pairwise similarities is elegantly modeled as a function of the Hamming distance between the corresponding data points.

• An algorithm with a convergence guarantee is proposed to learn the parameters of LFH.

• To handle large-scale problems, a linear-time variant with stochastic learning is proposed for fast parameter learning.

• Experimental results on two real datasets with semantic labels show that LFH can achieve much higher accuracy than other state-of-the-art methods, with efficiency in training time.

The rest of this paper is organized as follows: In Section 2, we introduce the details of our LFH model. Experimental results are presented in Section 3. Finally, we conclude the paper in Section 4.

2. LATENT FACTOR HASHING

In this section, we present the details of our latent factor hashing (LFH) model, including the model formulation and the learning algorithms.

2.1 Problem Definition

Suppose we have N points as the training data, each represented as a feature vector xi ∈ R^D. Some pairs of points have associated similarity labels sij, where sij = 1 means that xi and xj are similar and sij = 0 means that xi and xj are dissimilar. Our goal is to learn a binary code bi ∈ {−1, 1}^Q for each xi with the similarity between pairs preserved. In particular, when sij = 1, the binary codes bi and bj should have a low Hamming distance; when sij = 0, the Hamming distance between bi and bj should be high. In compact form, we use a matrix X ∈ R^(N×D) to denote all the feature vectors, a matrix B ∈ {−1, 1}^(N×Q) to denote all the binary codes, and a set S = {sij} to denote all the observed similarity labels. Additionally, we use the notation Ai* and A*j to denote the ith row and the jth column of a matrix A, respectively, and A^T denotes the transpose of A. The similarity labels S can be constructed from the neighborhood structure by thresholding the metric distances between the feature vectors [17]. However, such an S is of low quality since no semantic information is utilized. In the supervised hashing setting, S is therefore often constructed from the semantic labels attached to the data points; such labels are usually built with manual effort to ensure their quality.
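As a concrete illustration of this label-based construction, the following sketch builds the set S from per-point semantic class labels. It is not code from the paper; the function name, the random pair-sampling strategy, and the dictionary representation of S are our own assumptions made purely for illustration.

```python
# Illustrative sketch (not from the paper): constructing the pairwise
# similarity label set S from per-point semantic class labels.
import random

def build_similarity_labels(labels, num_pairs, seed=0):
    """Sample observed pairs (i, j), i != j, and set sij = 1 iff the two
    points share a semantic class label, sij = 0 otherwise."""
    rng = random.Random(seed)
    n = len(labels)
    S = {}
    while len(S) < num_pairs:
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j or (i, j) in S:
            continue
        S[(i, j)] = 1 if labels[i] == labels[j] else 0
    return S

# Example: 6 points from 2 semantic classes, with 8 observed labeled pairs.
labels = [0, 0, 1, 1, 0, 1]
S = build_similarity_labels(labels, num_pairs=8)
print(S)
```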
2.2 Model Formulation

Let Θij denote half of the inner product between two binary codes bi, bj ∈ {−1, 1}^Q:

    Θij = (1/2) bi^T bj.

The likelihood of the observed similarity labels S can be defined as follows:

    p(S | B) = ∏_{sij ∈ S} p(sij | B),        (1)

with

    p(sij | B) = aij,        if sij = 1,
                 1 − aij,    if sij = 0,

where aij = σ(Θij), and σ is the logistic function:

    σ(x) = 1 / (1 + e^(−x)).
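To make Equation (1) concrete, the sketch below evaluates the log-likelihood of a set of observed pairwise labels for given binary codes. It is a minimal illustration under our own assumptions (B stored as a NumPy array, S in the same dictionary format as the sketch above), not the paper's implementation.

```python
# Illustrative sketch (assumed, not the paper's code): log-likelihood of
# Eq. (1) for binary codes B in {-1, +1}^(N x Q) and observed labels S.
import numpy as np

def log_likelihood(B, S):
    """B: (N, Q) array with entries in {-1, +1};
    S: dict mapping (i, j) -> sij in {0, 1}."""
    ll = 0.0
    for (i, j), sij in S.items():
        theta = 0.5 * float(np.dot(B[i], B[j]))   # Θij = (1/2) bi^T bj
        a = 1.0 / (1.0 + np.exp(-theta))          # aij = σ(Θij)
        ll += np.log(a) if sij == 1 else np.log(1.0 - a)
    return ll

# Toy example: N = 4 points, Q = 3 bits.
B = np.array([[ 1, -1,  1],
              [ 1, -1,  1],
              [-1,  1, -1],
              [ 1,  1, -1]])
S = {(0, 1): 1, (0, 2): 0, (1, 3): 0}
print(log_likelihood(B, S))
```

Note that bi^T bj = Q − 2 dH(bi, bj), where dH is the Hamming distance, so aij decreases monotonically as the Hamming distance grows; this is the sense in which the likelihood is a function of the Hamming distance, as stated in the first contribution above.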