Global Dilated Attention and Target Focusing Network for Robust Tracking

Authors

  • Yun Liang, South China Agricultural University
  • Qiaoqiao Li, South China Agricultural University
  • Fumian Long, South China Agricultural University

DOI:

https://doi.org/10.1609/aaai.v37i2.25241

Keywords:

CV: Motion & Tracking

Abstract

Self-attention has shown excellent performance in tracking owing to its global modeling capability. However, it brings two challenges. First, its global receptive field pays less attention to local structure and inter-channel associations, which limits the semantics available for distinguishing objects from backgrounds. Second, its linear feature fusion cannot avoid interference from non-target semantic objects. To address these issues, this paper proposes a robust tracking method named GdaTFT, which defines Global Dilated Attention (GDA) and a Target Focusing Network (TFN). The GDA provides a new global semantics modeling approach that enhances semantic objects while eliminating the background. It is defined via a local focusing module, dilated attention, and a channel adaption module. It thus promotes semantics by focusing on local key information, building long-range dependencies, and enhancing the semantics of channels. Subsequently, to distinguish target and non-target objects that both carry rich semantics, the TFN is proposed to focus accurately on the target region. Unlike existing feature fusion, it uses the template as the query to build a point-to-point correlation between the template and the search region, and thereby achieves part-level augmentation of the target feature in the search region. The TFN thus efficiently augments the target embedding while weakening non-target objects. Experiments on challenging benchmarks (LaSOT, TrackingNet, GOT-10k, OTB-100) demonstrate that GdaTFT outperforms many state-of-the-art trackers and achieves leading performance. Code will be available.
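
The template-as-query correlation mentioned in the abstract can be illustrated with a small sketch. This is not the authors' TFN: the abstract does not specify how the correlation is turned into an augmentation, so the scaled dot-product scoring, the softmax over search positions, and the residual "add the template token back onto the search tokens it attends to" step are all assumptions made here for illustration, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def template_query_correlation(template, search):
    """Point-to-point correlation with template tokens as queries.

    Row i holds the attention weights of template token i over all
    search-region tokens (scaled dot product is an assumption).
    """
    scale = 1.0 / math.sqrt(len(template[0]))
    return [
        softmax([scale * sum(t * s for t, s in zip(t_tok, s_tok))
                 for s_tok in search])
        for t_tok in template
    ]


def augment_search(template, search):
    """ASSUMPTION: augment each search token with the template tokens
    that attend to it, weighted by the correlation (residual add)."""
    attn = template_query_correlation(template, search)
    out = [list(s_tok) for s_tok in search]
    for i, t_tok in enumerate(template):
        for j in range(len(search)):
            for k, t_val in enumerate(t_tok):
                out[j][k] += attn[i][j] * t_val
    return attn, out


# Toy data: 2 template tokens, 3 search tokens, 2 channels each.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
attn, augmented = augment_search(template, search)
```

In this toy setup, each template token places most of its weight on the search token it resembles, so that token receives the largest augmentation while the unrelated third search token changes only slightly.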

Published

2023-06-26

How to Cite

Liang, Y., Li, Q., & Long, F. (2023). Global Dilated Attention and Target Focusing Network for Robust Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 1549-1557. https://doi.org/10.1609/aaai.v37i2.25241

Section

AAAI Technical Track on Computer Vision II