190 Q. Cui et al.

[Figure 1: example images labeled Artic Tern, Common Tern and Green Jay, annotated with intra-class variance and inter-class variance, are passed through a Feature Extractor and a Hashing Network to produce Similar Binary Codes and Dissimilar Binary Codes (legend: 1, 0).]

Fig. 1. Illustration of the fine-grained hashing task. Fine-grained images can exhibit large intra-class variances but small inter-class variances. Fine-grained hashing aims to generate compact binary codes with small Hamming distances for images of the same sub-category, as well as distinct codes for images from different sub-categories.

1 Introduction

Fine-Grained Image Retrieval (FGIR) [19,26,31,36,41,42] is a practical but challenging computer vision task. It aims to retrieve images belonging to various sub-categories of a certain meta-category (e.g., birds, cars and aircraft) and return images with the same sub-category as the query image.
In real FGIR applications, previous methods can suffer from slow query speed and redundant storage costs, due to both the explosive growth of fine-grained data and high-dimensional real-valued features.

Learning to hash [3,6,7,10,14,16,17,21,22,34,35] has proven to be a promising solution for large-scale image retrieval because it can greatly reduce the storage cost and increase the query speed. As a representative research area of approximate nearest neighbor (ANN) search [1,6,13], hashing aims to embed data points as similarity-preserving binary codes. Recently, hashing has been successfully applied to a wide range of image retrieval tasks, e.g., face image retrieval [18] and person re-identification [5,43]. We hereby explore the effectiveness of hashing for fine-grained image retrieval.

To the best of our knowledge, this is the first work to study the fine-grained hashing problem, i.e., the problem of designing hashing methods for fine-grained objects. As shown in Fig. 1, the task requires generating compact binary codes for fine-grained images that exhibit both large intra-class variances and small inter-class variances. To deal with this challenging task, we propose a unified end-to-end trainable network, ExchNet, which first learns fine-grained tailored features and then generates the final binary hash codes.

Concretely, our ExchNet consists of three main modules: representation learning, local feature alignment and hash code learning, as shown in Fig. 2. In the representation learning module, beyond obtaining the holistic image representation (i.e., global features), we also employ an attention mechanism to capture part-level features (i.e., local features) that represent the parts of fine-grained objects. Localizing parts and embedding part-level cues are