…embedding for each instance to predict the neighborhood context in the graph. Finally, we feed both raw pixels and embeddings to a deep neural network, and concatenate the corresponding hidden layers when producing binary codes. BGDH is a general learning framework in the sense that any loss function of hashing and any type of graph can be incorporated.

Graph-based methods are usually transductive, because they can only handle instances that already appear in the graph. Since embeddings of instances are learnt from the graph, the basic BGDH is also transductive. To address the out-of-sample problem, we further propose an inductive variant of BGDH, in which the embeddings are defined as a parametric function of the raw features. In this way, we can produce hash codes for new instances that have not been seen during training. To demonstrate the effectiveness of our approach, we conduct extensive experiments on two large-scale datasets: CIFAR-10 and NUS-WIDE. Experimental results show that BGDH outperforms other methods and achieves the state-of-the-art performance in image retrieval.

Finally, we emphasize that although both BGDH and SSDH are semi-supervised deep hashing methods, the proposed BGDH differs from SSDH in the following two aspects:

a. While SSDH is built upon graph regularization, our BGDH relies on graph embedding.

b. SSDH uses graphs to exploit the unlabeled data; in contrast, BGDH makes use of bipartite graphs, which can be constructed more efficiently since building an anchor graph only costs O(n) time.
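For concreteness, the concatenation of raw-pixel features and graph embeddings described above can be sketched as follows. This is a minimal illustration, not the paper's exact architecture: the class name BGDHNet, the two-layer feature branch standing in for the CNN, the layer sizes, and the tanh relaxation are all assumptions; BGDH's actual network and training details are given later in the paper.

import torch
import torch.nn as nn

class BGDHNet(nn.Module):
    # Sketch: a feature branch for the raw pixels, a lookup table of graph
    # embeddings (one per training instance, i.e. the transductive case), and
    # a hash layer applied to the concatenation of the two hidden layers.
    def __init__(self, num_instances, pixel_dim, embed_dim=64,
                 feat_dim=512, code_len=32):
        super().__init__()
        self.feature_net = nn.Sequential(        # stand-in for the CNN feature-learning branch
            nn.Linear(pixel_dim, feat_dim), nn.ReLU())
        self.graph_embed = nn.Embedding(num_instances, embed_dim)  # graph embeddings
        self.hash_layer = nn.Linear(feat_dim + embed_dim, code_len)

    def forward(self, pixels, instance_ids):
        f = self.feature_net(pixels)             # hidden layer from raw pixels
        e = self.graph_embed(instance_ids)       # graph embedding of the same instances
        u = torch.tanh(self.hash_layer(torch.cat([f, e], dim=1)))  # concatenate, then relax
        return u                                 # relaxed codes; b = sgn(u) at retrieval time

Binarizing the relaxed output with sgn(·) yields the c-bit codes used for retrieval.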
2 Notations and Problem Definitions

In this section, we introduce notations and problem definitions.

2.1 Notations

We use script letters like X to denote sets, boldface lowercase letters like e to denote vectors, and boldface uppercase letters like E to denote matrices. We denote the element at the i-th row and j-th column of E by Eij. E^T is the transpose of E. ‖·‖2 denotes the Euclidean norm of a vector and ‖·‖F denotes the Frobenius norm of a matrix. sgn(·) is the element-wise sign function and σ(·) is the sigmoid function. [u; v] denotes the concatenation of two vectors u and v.

2.2 Problem Definitions

We are given a set of n instances/images X = {x1, x2, . . . , xn}, where xi is the feature vector of the i-th instance. Without loss of generality, we assume the first l instances {x1, . . . , xl} are labeled and the rest are unlabeled. We assume the supervised information is given in terms of pairwise labels, though our method can also support other kinds of labels. Specifically, for the first l instances, we have a set of pairwise labels S = {sij} with sij ∈ {0, 1}, where sij = 1 means that xi and xj are similar, and sij = 0 means that xi and xj are dissimilar. In addition, a bipartite graph G = (X, O, E) between the n instances and m objects is given, where O is a set of objects such as concepts or landmarks, and E is the set of edges.

The goal of our semi-supervised hashing method is to learn a mapping function H : xi → {−1, 1}^c, which encodes each point xi into a c-dimensional binary code bi = H(xi) = [h1(xi), h2(xi), . . . , hc(xi)]^T ∈ {−1, 1}^c. The binary codes B = {b1, . . . , bn} should preserve the semantic similarity and structure similarity in the Hamming space.

3 Semi-Supervised Deep Hashing with a Bipartite Graph

In this section, we first present the details of our semi-supervised Deep Hashing with a Bipartite Graph (BGDH), then introduce an inductive variant, and finally discuss the learning procedure.

3.1 The Proposed BGDH Framework

The end-to-end deep learning architecture of our BGDH is shown in Figure 1, which includes three main components: graph embedding, feature learning, and hash code learning. Similar to other semi-supervised learning methods [Yang et al., 2016], the loss function of BGDH can be expressed as

    Ls + λLg    (1)

where Ls is a supervised loss designed to preserve the similarity between pairwise instances, and Lg is an unsupervised loss of predicting the graph context. In the following, we first introduce Lg, which aims to learn embeddings from the bipartite graph G, then formulate Ls, which is used to learn both features and binary codes from hidden layers of deep networks.
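A minimal sketch of how the two terms in (1) could be combined during training is shown below. The concrete forms used here are assumptions for illustration only: Ls is written as a common pairwise likelihood loss over inner products of relaxed codes, and Lg as a skip-gram-style context loss with negative sampling; the paper's own definitions of Ls and Lg appear later and may differ.

import torch
import torch.nn.functional as F

def supervised_pairwise_loss(u_i, u_j, s_ij):
    # Assumed pairwise form of L_s: negative log-likelihood of the label s_ij
    # given theta = 0.5 * <u_i, u_j>, i.e. log(1 + e^theta) - s_ij * theta.
    theta = 0.5 * (u_i * u_j).sum(dim=1)
    return (F.softplus(theta) - s_ij * theta).mean()

def graph_context_loss(emb_i, ctx_pos, ctx_neg):
    # Assumed form of L_g: the embedding of x_i should score its graph context
    # higher than randomly drawn (negative) context objects.
    pos = F.logsigmoid((emb_i * ctx_pos).sum(dim=1))
    neg = F.logsigmoid(-(emb_i * ctx_neg).sum(dim=1))
    return -(pos + neg).mean()

def bgdh_objective(u_i, u_j, s_ij, emb_i, ctx_pos, ctx_neg, lam=1.0):
    # Overall objective of Eq. (1): L_s + lambda * L_g.
    return supervised_pairwise_loss(u_i, u_j, s_ij) + \
           lam * graph_context_loss(emb_i, ctx_pos, ctx_neg)

The weight λ (lam above) trades off preserving pairwise semantic similarity against fitting the graph structure of the unlabeled data.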
Graph embedding

We propose to use a bipartite graph G = (X, O, E) to capture the information hidden in the unlabeled data. It can be constructed in different ways, as stated below.

• An anchor graph constructed from the dataset X. In this case, O contains m landmarks and the construction of G takes O(n) time [Liu et al., 2010].

• A nearest neighbor graph. In this case, O = X and the construction of G takes O(n^2) time.

• A semantic graph constructed from external data. In this case, O may contain m concepts, styles, or owners.

In the following, we briefly introduce one way to construct an anchor graph. We first randomly sample m instances from X as landmarks, denoted by O = {o1, . . . , om}. Then, we put an edge between xi and oj if oj is among the k nearest landmarks of xi, or if the distance between them is smaller than some threshold ε. Let A ∈ R^{n×m} be the similarity matrix of G, where Aij denotes the weight of the edge between xi and oj. The value of Aij may be binary, that is, Aij = 1 if there is an edge and Aij = 0 otherwise. If a real value is preferred, Aij can be set according to the heat kernel: Aij = exp(−‖xi − oj‖² / ρ), where ρ > 0 is a parameter.

The goal of graph embedding is to learn an embedding for each instance that predicts the context in the graph. Given an instance and its context, the objective of graph embedding is usually formulated as minimizing a certain loss of predicting the context using the embedding of the instance as the input feature [Weston et al., 2012; Mikolov et al., 2013]. The context of an instance can be simply defined as its neighbors in the graph, or generated by more sophisticated methods such as random walks.
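The anchor-graph construction just described can be sketched as follows. This is an illustrative NumPy implementation of the k-nearest-landmark rule with heat-kernel weights, not the authors' code; the function name and default parameter values are assumptions.

import numpy as np

def build_anchor_graph(X, m=300, k=5, rho=1.0, seed=0):
    # Sample m landmarks from X, connect each instance to its k nearest
    # landmarks, and weight each edge with the heat kernel exp(-||x - o||^2 / rho).
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    landmarks = X[rng.choice(n, size=m, replace=False)]      # O = {o_1, ..., o_m}

    # Squared Euclidean distances between every instance and every landmark (n x m).
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)

    A = np.zeros((n, m))
    nearest = np.argsort(d2, axis=1)[:, :k]                  # k nearest landmarks per x_i
    rows = np.repeat(np.arange(n), k)
    cols = nearest.ravel()
    A[rows, cols] = np.exp(-d2[rows, cols] / rho)            # heat-kernel weights A_ij
    return A, landmarks

Since distances are only computed between the n instances and the m landmarks (m ≪ n), the cost grows linearly in n, which is the O(n) construction time claimed for anchor graphs above; a batched distance computation would avoid materializing the full n × m × d array for very large datasets.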