ExchNet: A Unified Hashing Network for Large-Scale Fine-Grained Retrieval

[Figure 2 diagram. Stage labels: Representation Learning; Local Features Alignment; Hash Codes Learning. Component labels: Backbone CNN; Attention Generation; Attention Maps; Spatial Diversity; Channel Diversity; Local Features Refinement; Global Features Refinement; GAP; Local Feature Extractor; Global Feature Extractor; Anchor Local Features; Hashing Network; Training Phase Only.]

Fig. 2. Framework of our proposed ExchNet, which consists of three modules. (1) The representation learning module, together with the attention mechanism with spatial and channel diversity learning constraints, is designed to obtain both local and global features of fine-grained objects. (2) The local feature alignment module aligns the obtained local features w.r.t. object parts across different fine-grained images. (3) The hash codes learning module generates the compact binary codes.

crucial for fine-grained tasks, since these discriminative but subtle parts (e.g., bird heads or tails) play a major role in distinguishing different sub-categories. Moreover, we also develop two kinds of attention constraints, i.e., spatial and channel constraints, which work together to further improve the discriminative ability of these local features. Next, to ensure that these part-level features correspond to the same object parts across different fine-grained images, we design an anchor-based feature alignment approach to align these local features. Specifically, in the local feature alignment module, we treat the anchored local features as the "prototype" w.r.t. its sub-category, obtained by averaging all the local features of that part across images. Once local features are well aligned with their own parts, even if we exchange one specific part's local feature of an input image with the same part's local feature of the prototype, the image meanings derived from the image representations and the final hash codes should both remain extremely similar. Inspired by this motivation, we perform a feature exchanging operation upon the anchored local features and other learned local features, which is illustrated in Fig. 3. After that, to effectively train the network in this feature alignment fashion, we utilize an alternating algorithm to solve the hashing learning problem and update the anchor features simultaneously.

To quantitatively demonstrate both the effectiveness and the efficiency of our ExchNet, we conduct comprehensive experiments on five fine-grained benchmark datasets, including the large-scale ones, i.e., NABirds [11], VegFru [12] and Food101 [23]. In particular, compared with competing approximate nearest neighbor methods, our ExchNet achieves a speedup of up to hundreds of times for large-scale fine-grained image retrieval without significant accuracy drops. Meanwhile, compared with state-of-the-art generic hashing methods, ExchNet consistently outperforms these methods by a large margin on all the fine-grained datasets. Additionally, ablation studies and visualization results justify the effectiveness of our tailored model designs, such as the local feature alignment and the proposed attention approach.
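The anchor-based alignment and feature exchange described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `compute_anchors` and `exchange_part`, the array shapes, and the choice to swap one randomly selected part per image are all assumptions made for illustration. The sketch only shows the two ideas stated in the text: a per-sub-category prototype obtained by averaging each part's local feature across images, and the replacement of one part's local feature with the prototype's feature for that same part.

```python
import numpy as np


def compute_anchors(local_feats, labels, num_classes):
    """Average part-level features per sub-category (hypothetical helper).

    local_feats: (N, M, D) array, M part-level features of dimension D per image.
    labels:      (N,) integer sub-category labels.
    Returns an array of shape (num_classes, M, D) holding one prototype per class.
    """
    _, m, d = local_feats.shape
    anchors = np.zeros((num_classes, m, d), dtype=local_feats.dtype)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():  # classes without samples keep a zero prototype
            anchors[c] = local_feats[mask].mean(axis=0)
    return anchors


def exchange_part(local_feats, labels, anchors, rng):
    """Swap one part per image with the same part of its class prototype.

    Picking the part uniformly at random is an assumption of this sketch.
    Returns the exchanged features and the index of the swapped part per image.
    """
    n, m, _ = local_feats.shape
    exchanged = local_feats.copy()
    parts = rng.integers(0, m, size=n)  # one part index per image
    rows = np.arange(n)
    exchanged[rows, parts] = anchors[labels, parts]
    return exchanged, parts


# Toy data: 6 images, 3 parts each, 4-dimensional local features, 3 classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 3, 4))
labels = np.array([0, 0, 1, 1, 2, 2])

anchors = compute_anchors(feats, labels, num_classes=3)
exchanged, parts = exchange_part(feats, labels, anchors, rng)
```

In an alternating training scheme such as the one mentioned in the text, a step like `compute_anchors` would be interleaved with network updates, and the exchanged features would be fed to the hashing layers in place of the originals. How the two steps are actually scheduled is not specified on this page.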