Dual Attention Networks for Few-Shot Fine-Grained Recognition
Keywords: Computer Vision (CV), Machine Learning (ML)
Abstract
Few-shot fine-grained recognition aims to classify images belonging to subordinate categories from only a few examples. Because of the fine-grained nature of the task, it is desirable to capture subtle but discriminative part-level patterns from limited training data, which makes the problem challenging. In this paper, to generate representations tailored to fine-grained few-shot recognition, we propose a Dual Attention Network (Dual Att-Net) consisting of two branches, one of hard attention and one of soft attention. Specifically, by producing attention guidance from the deep activations of input images, our hard attention is realized by keeping a few useful deep descriptors and forming them into a bag for multi-instance learning. Since these deep descriptors could correspond to objects' parts, modeling them as a multi-instance bag makes it possible to exploit the inherent correlation of these fine-grained parts. On the other side, a soft-attended activation representation is obtained by applying the attention guidance to the original activations, which brings comprehensive attention information as the counterpart of hard attention. The outputs of the two branches are then aggregated into a holistic image embedding of the input image. By performing meta-learning, we can learn a powerful image embedding in such a metric space that generalizes to novel classes. Experiments on three popular fine-grained benchmark datasets show that our Dual Att-Net clearly outperforms existing state-of-the-art methods.
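The two branches described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the attention guidance is computed, how the multi-instance bag is pooled, or how the two outputs are aggregated. The sketch assumes, purely for illustration, that the guidance is the min-max normalized channel-wise mean of the activations, that the bag is max-pooled, and that the two branch outputs are concatenated. The function name `dual_attention_embedding` and the parameter `k` are hypothetical.

```python
import numpy as np


def dual_attention_embedding(activations, k=4):
    """Toy dual-attention embedding for one image.

    activations: array of shape (C, H, W), e.g. the last convolutional
    feature map of a backbone. Returns (embedding, bag).
    """
    c, h, w = activations.shape

    # Attention guidance (ASSUMPTION): channel-wise mean of the activations,
    # min-max normalized to [0, 1]. The abstract only says the guidance is
    # produced "from deep activations".
    guidance = activations.mean(axis=0)
    span = guidance.max() - guidance.min()
    guidance = (guidance - guidance.min()) / (span + 1e-8)
    flat_guidance = guidance.reshape(-1)             # (H*W,)

    # One C-dimensional deep descriptor per spatial location.
    descriptors = activations.reshape(c, h * w).T    # (H*W, C)

    # Hard attention: keep the k descriptors with the highest guidance
    # and treat them as a multi-instance bag.
    top_idx = np.argsort(flat_guidance)[-k:]
    bag = descriptors[top_idx]                       # (k, C)
    # Bag pooling (ASSUMPTION): element-wise max over the instances.
    hard_embedding = bag.max(axis=0)                 # (C,)

    # Soft attention: re-weight every descriptor by its guidance value,
    # then average over spatial locations.
    soft_embedding = (descriptors * flat_guidance[:, None]).mean(axis=0)

    # Aggregation (ASSUMPTION): concatenate the two branch outputs.
    embedding = np.concatenate([hard_embedding, soft_embedding])
    return embedding, bag


rng = np.random.default_rng(0)
acts = rng.random((8, 5, 5))                         # stand-in for real features
emb, bag = dual_attention_embedding(acts, k=4)
print(emb.shape, bag.shape)
```

In a few-shot setting, embeddings of this kind would be compared in a metric space, for example by distance to per-class averages of the support-set embeddings; the abstract does not state which metric or meta-learning procedure is used.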
How to Cite
Xu, S.-L., Zhang, F., Wei, X.-S., & Wang, J. (2022). Dual Attention Networks for Few-Shot Fine-Grained Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2911-2919. https://doi.org/10.1609/aaai.v36i3.20196
AAAI Technical Track on Computer Vision III