TdAttenMix: Top-Down Attention Guided Mixup

Authors

  • Zhiming Wang, State Key Laboratory of VR Technology and Systems, School of CSE, Beihang University
  • Lin Gu, RIKEN AIP; The University of Tokyo, Japan
  • Feng Lu, State Key Laboratory of VR Technology and Systems, School of CSE, Beihang University

DOI:

https://doi.org/10.1609/aaai.v39i8.32888

Abstract

CutMix is a data augmentation strategy that cuts and pastes image patches to mix up training data. Existing methods pick either random or salient areas, which are often inconsistent with the labels and thus misguide the model being trained. To our knowledge, we are the first to integrate human gaze to guide CutMix. Since human attention is driven by both high-level recognition and low-level cues, we propose a controllable Top-down Attention Guided Module to obtain a general artificial attention that balances top-down and bottom-up attention. The proposed TdAttenMix then picks the patches and adjusts the label mixing ratio so that both focus on regions relevant to the current label. Experimental results demonstrate that our TdAttenMix outperforms existing state-of-the-art mixup methods across eight different benchmarks. Additionally, we introduce a new metric based on human gaze and use it to investigate the issue of image-label inconsistency.
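
For readers unfamiliar with the baseline, the following is a minimal sketch of standard CutMix, in which a rectangular patch from one image is pasted into another and the labels are mixed by area. It is not the authors' implementation. The second function, `attention_ratio`, is a hypothetical illustration of how a label ratio could instead be weighted by attention maps; its name, inputs, and formula are assumptions made here and are not taken from the paper.

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, box):
    """Paste the `box` region of img_b into img_a; mix one-hot labels by area."""
    top, left, height, width = box
    mixed = img_a.copy()
    mixed[top:top + height, left:left + width] = \
        img_b[top:top + height, left:left + width]
    # lam is the fraction of the mixed image that still comes from img_a.
    lam = 1.0 - (height * width) / (img_a.shape[0] * img_a.shape[1])
    return mixed, lam * label_a + (1.0 - lam) * label_b

def attention_ratio(attn_a, attn_b, box):
    """Hypothetical (assumption): weight of label A as the share of attention
    that survives from image A relative to the attention pasted in from B."""
    top, left, height, width = box
    mask = np.zeros(attn_a.shape, dtype=bool)
    mask[top:top + height, left:left + width] = True
    kept_a = attn_a[~mask].sum()    # attention of A outside the pasted box
    pasted_b = attn_b[mask].sum()   # attention of B inside the pasted box
    return kept_a / (kept_a + pasted_b)

# Usage: two 4x4 single-channel images and a 2x2 patch in the top-left corner.
img_a, img_b = np.zeros((4, 4)), np.ones((4, 4))
label_a, label_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed, label = cutmix(img_a, img_b, label_a, label_b, box=(0, 0, 2, 2))
```

With area-based mixing the label weight depends only on the patch size, which is why a patch covering background can still shift the label. An attention-weighted ratio of the kind sketched in `attention_ratio` is one way to tie the label to the content actually pasted.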

Published

2025-04-11

How to Cite

Wang, Z., Gu, L., & Lu, F. (2025). TdAttenMix: Top-Down Attention Guided Mixup. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 8232–8240. https://doi.org/10.1609/aaai.v39i8.32888

Section

AAAI Technical Track on Computer Vision VII