TDAF: Top-Down Attention Framework for Vision Tasks

Bo Pang; Yizhuo Li; Jiefeng Li; Muchen Li; Hanwen Cao; Cewu Lu

doi:10.1609/aaai.v35i3.16339

Authors

Bo Pang Shanghai Jiao Tong University
Yizhuo Li Shanghai Jiao Tong University
Jiefeng Li Shanghai Jiao Tong University
Muchen Li Huazhong University of Science and Technology
Hanwen Cao Shanghai Jiao Tong University
Cewu Lu Shanghai Jiao Tong University

Keywords:

Object Detection & Categorization

Abstract

Human attention mechanisms often work in a top-down manner, yet top-down attention is not well explored in vision research. Here, we propose the Top-Down Attention Framework (TDAF) to capture top-down attention; it can be easily adopted in most existing models. Its Recursive Dual-Directional Nested Structure forms two sets of orthogonal paths, recursive and structural, along which bottom-up spatial features and top-down attention features are extracted, respectively. These spatial and attention features are deeply nested, so the framework works in a mixed top-down and bottom-up manner. Empirical evidence shows that TDAF captures effective stratified attention information and boosts performance. ResNet with TDAF achieves a 2.0% improvement on ImageNet. For object detection, performance improves by 2.7% AP over FCOS. For pose estimation, TDAF improves the baseline by 1.6%. For action recognition, 3D-ResNet with TDAF improves accuracy by 1.7%.
