Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization

Chuanbin Liu; Hongtao Xie; Zheng-Jun Zha; Lingfeng Ma; Lingyun Yu; Yongdong Zhang

doi:10.1609/aaai.v34i07.6822

Authors

Chuanbin Liu, University of Science and Technology of China
Hongtao Xie, University of Science and Technology of China
Zheng-Jun Zha, University of Science and Technology of China
Lingfeng Ma, University of Science and Technology of China
Lingyun Yu, University of Science and Technology of China
Yongdong Zhang, University of Science and Technology of China


Abstract

Delicate attention to discriminative regions plays a critical role in Fine-Grained Visual Categorization (FGVC). Unfortunately, most existing attention models perform poorly in FGVC because of two pivotal limitations in discriminative region proposal and region-based feature learning. 1) The discriminative regions are predominantly located based on filter responses over the images, which cannot be directly optimized with a performance metric. 2) Existing methods train the region-based feature extractor individually as a one-hot classification task, neglecting the knowledge from the entire object. To address these issues, we propose a novel “Filtration and Distillation Learning” (FDL) model to enhance the region attention of discriminative parts for FGVC. Firstly, a Filtration Learning (FL) method is put forward for proposing discriminative part regions based on the matchability between proposing and predicting. Specifically, we utilize the proposing-predicting matchability as the performance metric of the Region Proposal Network (RPN), thus enabling direct optimization of the RPN to filtrate the most discriminative regions. In the distillation part, the object-based feature learning and region-based feature learning are formulated as “teacher” and “student”, respectively, which can furnish better supervision for region-based feature learning. Accordingly, FDL enhances the region attention effectively, and the overall framework can be trained end-to-end with neither object nor part annotations. Extensive experiments verify that FDL yields state-of-the-art performance on several FGVC tasks when compared under the same backbone with the most competitive approaches.
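The “teacher” and “student” formulation in the abstract can be illustrated with a generic soft-target distillation loss. The sketch below is not taken from the paper: the function names, the temperature, and the mixing weight `alpha` are all assumptions chosen for illustration, and the actual FDL objective may differ. It only shows how object-level predictions could supplement a one-hot label when supervising a region-level classifier.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_label_loss(logits, label):
    """Standard one-hot cross-entropy for a single sample."""
    return -math.log(softmax(logits)[label])


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions.

    Generic soft-target distillation (an assumption, not the paper's exact loss).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


def region_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """Hypothetical combined loss for the region-based (student) branch.

    alpha weights the teacher's soft targets against the one-hot label.
    """
    hard = hard_label_loss(student_logits, label)
    soft = distillation_loss(student_logits, teacher_logits, temperature)
    return (1.0 - alpha) * hard + alpha * soft


# Object-based (teacher) logits and two candidate region-based (student) outputs.
teacher = [2.0, 1.0, 0.0]
agreeing_student = [2.0, 1.0, 0.0]
disagreeing_student = [0.0, 1.0, 2.0]

print(region_loss(agreeing_student, teacher, label=0))
print(region_loss(disagreeing_student, teacher, label=0))
```

In this sketch, a student whose predictions agree with the teacher receives a lower loss than one that contradicts it, which is the sense in which the object-level branch can supply richer supervision than the one-hot label alone.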
