Dynamic Position-aware Network for Fine-grained Image Recognition
Keywords: Object Detection & Categorization
Abstract
Most weakly supervised fine-grained image recognition (WFGIR) approaches focus on learning discriminative details, which contain both visual variances and position clues. The position clues can be learnt indirectly by exploiting the context information of discriminative visual content. However, this causes the selected discriminative regions to contain non-discriminative information introduced by the position clues. This analysis motivates us to introduce position clues directly into the visual content so that the model focuses only on the visual variances, achieving more precise discriminative region localization. Though important, position modelling usually requires significant pixel/region annotations and is therefore labor-intensive. To address this issue, we propose an end-to-end Dynamic Position-aware Network (DP-Net) that directly incorporates the position clues into the visual content and dynamically aligns them without extra annotations, which eliminates the effect of position information on the visual variances of subcategories. In particular, DP-Net consists of: 1) a Position Encoding Module, which learns a set of position-aware parts by directly adding learnable position information to the horizontal/vertical visual content of images; 2) a Position-vision Aligning Module, which dynamically aligns the visual content and the learnable position information by performing graph convolution on the position-aware parts; 3) a Position-vision Reorganization Module, which projects the aligned position clues and visual content into Euclidean space to construct position-aware feature maps. Finally, the position-aware feature maps, which implicitly carry the aligned visual content and position clues, are used for more accurate localization of discriminative regions. Extensive experiments on the CUB Bird, Stanford-Cars, and FGVC Aircraft datasets verify that DP-Net yields the best performance compared with the most competitive approaches under the same settings.
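The three-module pipeline named in the abstract can be pictured with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify how parts are pooled, how the graph is built, or how parts are projected back, so the mean pooling, the softmax similarity graph, the single graph-convolution step, and the row-plus-column reorganization below are all assumptions chosen for illustration. The function names (`position_encode`, `align_parts`, `reorganize`) and all array shapes are hypothetical, and the "learnable" parameters are simply random arrays here.

```python
import numpy as np

def position_encode(feat, pos_h, pos_w):
    """Build position-aware parts from a (C, H, W) feature map.

    Assumption: one part per row and one per column, obtained by mean
    pooling, with a position embedding added to each part.
    """
    rows = feat.mean(axis=2).T + pos_h   # (H, C): horizontal parts
    cols = feat.mean(axis=1).T + pos_w   # (W, C): vertical parts
    return np.concatenate([rows, cols], axis=0)  # (H + W, C)

def align_parts(parts, weight):
    """One graph-convolution step over the parts.

    Assumption: the adjacency is a row-wise softmax of scaled
    dot-product similarities between parts.
    """
    sim = parts @ parts.T / np.sqrt(parts.shape[1])   # (N, N)
    sim = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    adj = np.exp(sim)
    adj = adj / adj.sum(axis=1, keepdims=True)        # rows sum to 1
    return np.maximum(adj @ parts @ weight, 0.0)      # ReLU activation

def reorganize(parts, height, width):
    """Project aligned parts back onto the (C, H, W) grid.

    Assumption: cell (i, j) is the sum of row part i and column part j.
    """
    rows, cols = parts[:height], parts[height:]
    grid = rows[:, None, :] + cols[None, :, :]        # (H, W, C)
    return grid.transpose(2, 0, 1)                    # (C, H, W)

# Toy usage with random stand-ins for backbone features and parameters.
rng = np.random.default_rng(0)
C, H, W = 8, 4, 5
feat = rng.standard_normal((C, H, W))
pos_h = 0.1 * rng.standard_normal((H, C))
pos_w = 0.1 * rng.standard_normal((W, C))
weight = 0.1 * rng.standard_normal((C, C))

parts = position_encode(feat, pos_h, pos_w)
aligned = align_parts(parts, weight)
fmap = reorganize(aligned, H, W)
```

In a trained model the position embeddings and the graph-convolution weight would be learned end to end from image-level labels; the sketch only shows how data could flow through the three stages and which shapes result.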
How to Cite
Wang, S., Li, H., Wang, Z., & Ouyang, W. (2021). Dynamic Position-aware Network for Fine-grained Image Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 2791-2799. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16384
AAAI Technical Track on Computer Vision III