Dual Attention Networks for Few-Shot Fine-Grained Recognition

Shu-Lin Xu; Faen Zhang; Xiu-Shen Wei; Jianhua Wang

doi:10.1609/aaai.v36i3.20196

Authors

Shu-Lin Xu: School of Computer Science and Engineering, Nanjing University of Science and Technology; State Key Laboratory of Integrated Services Networks, Xidian University
Faen Zhang: AInnovation Technology Group Co., Ltd
Xiu-Shen Wei: School of Computer Science and Engineering, Nanjing University of Science and Technology; State Key Laboratory of Integrated Services Networks, Xidian University; State Key Laboratory for Novel Software Technology, Nanjing University
Jianhua Wang: AInnovation Technology Group Co., Ltd

DOI:

https://doi.org/10.1609/aaai.v36i3.20196

Keywords:

Computer Vision (CV), Machine Learning (ML)

Abstract

Few-shot fine-grained recognition aims to classify images from subordinate categories given only a few examples. Because the categories are fine-grained, a model must capture subtle but discriminative part-level patterns from limited training data, which makes the problem challenging. In this paper, to generate representations tailored to fine-grained few-shot recognition, we propose a Dual Attention Network (Dual Att-Net) consisting of two branches, one with hard attention and one with soft attention. Specifically, we produce attention guidance from the deep activations of input images, and realize hard attention by keeping a few useful deep descriptors and forming them into a bag for multi-instance learning. Since these deep descriptors may correspond to objects' parts, modeling them as a multi-instance bag makes it possible to exploit the inherent correlation among these fine-grained parts. In the other branch, a soft-attended activation representation is obtained by applying the attention guidance to the original activations, which provides comprehensive attention information that complements hard attention. The outputs of the two branches are then aggregated into a holistic embedding of the input image. Through meta-learning, we learn a powerful image embedding in a metric space that generalizes to novel classes. Experiments on three popular fine-grained benchmark datasets show that Dual Att-Net clearly outperforms existing state-of-the-art methods.
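
The two-branch idea in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not say how the attention guidance is computed or how the multi-instance bag and the two branch outputs are aggregated, so the choices below (channel-wise sum with min-max scaling as guidance, top-k selection, max-pooling over the bag, guidance-weighted averaging, and concatenation) are assumptions, and all function names are hypothetical.

```python
from typing import List

Descriptor = List[float]  # one deep descriptor, i.e. one spatial location


def attention_guidance(descriptors: List[Descriptor]) -> List[float]:
    """Assumed guidance: channel-wise sum per location, min-max scaled to [0, 1]."""
    scores = [sum(d) for d in descriptors]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All locations score the same, so weight them equally.
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hard_branch(descriptors: List[Descriptor], guidance: List[float], k: int) -> Descriptor:
    """Hard attention: keep the k highest-scoring descriptors as a bag.

    The bag is summarized by element-wise max-pooling (an assumption).
    """
    order = sorted(range(len(descriptors)), key=lambda i: guidance[i], reverse=True)
    bag = [descriptors[i] for i in order[:k]]
    return [max(channel) for channel in zip(*bag)]


def soft_branch(descriptors: List[Descriptor], guidance: List[float]) -> Descriptor:
    """Soft attention: weight every descriptor by its guidance score and average."""
    total = sum(guidance)  # at least 1.0, since the top location has weight 1
    dim = len(descriptors[0])
    return [
        sum(g * d[c] for g, d in zip(guidance, descriptors)) / total
        for c in range(dim)
    ]


def dual_embedding(descriptors: List[Descriptor], k: int = 2) -> Descriptor:
    """Aggregate both branches by concatenation (an assumption)."""
    guidance = attention_guidance(descriptors)
    return hard_branch(descriptors, guidance, k) + soft_branch(descriptors, guidance)
```

In a few-shot setting, an embedding like this would be computed for both support and query images and compared in a metric space; the training procedure itself is not covered by this sketch.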
