Sequential Fusion Based Multi-Granularity Consistency for Space-Time Transformer Tracking
DOI:
https://doi.org/10.1609/aaai.v38i11.29145
Keywords:
ML: Deep Learning Algorithms, CV: Other Foundations of Computer Vision, CV: Representation Learning for Vision, CV: Learning & Optimization for CV, CV: Applications
Abstract
Long regarded as a template-matching task, visual object tracking has made significant progress in spatial modeling. However, because tracking operates on videos that carry substantial temporal information, it is equally important to mine temporal context, which has not yet been deeply explored. Previous supervised works mostly treat template updating as the point of improvement, but they are often limited by additional computational burden or by the quality of the chosen templates. To address this issue, we propose a Space-Time Consistent Transformer Tracker (STCFormer), which uses a sequential fusion framework with multi-granularity consistency constraints to learn spatiotemporal context information. The sequential fusion framework recombines template and search images based on tracking results from chronological frames, fusing updated tracking states during training. To further reduce over-reliance on the fixed template without increasing computational complexity, we design three space-time consistency constraints: Label Consistency Loss (LCL) for label-level consistency, Attention Consistency Loss (ACL) for patch-level ROI consistency, and Semantic Consistency Loss (SCL) for feature-level semantic consistency. Specifically, in ACL and SCL, the label information is used to constrain the attention and feature consistency of the target and of the background separately, to avoid mutual interference. Extensive experiments show that STCFormer outperforms many of the best-performing trackers on several popular benchmarks.
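The abstract does not give the loss formulas, so the following is only an illustrative sketch of the general idea behind a label-masked consistency term: the discrepancy between two per-patch maps (for example, attention from two chronological steps) is measured separately over target and background patches so the two regions do not interfere. The function name `masked_consistency`, the use of mean squared difference, and the unweighted sum of the two regions are all assumptions made for illustration, not the definition of ACL or SCL in the paper.

```python
from typing import Sequence


def masked_consistency(
    a: Sequence[float], b: Sequence[float], mask: Sequence[int]
) -> float:
    """Hypothetical consistency term (not the paper's formula).

    Computes the mean squared difference between per-patch values `a` and `b`
    separately over target patches (mask == 1) and background patches
    (mask == 0), then sums the two region means.
    """
    if not (len(a) == len(b) == len(mask)):
        raise ValueError("a, b and mask must have the same length")

    def region_mse(keep: int) -> float:
        # Squared differences restricted to one region of the label mask.
        diffs = [(x - y) ** 2 for x, y, m in zip(a, b, mask) if m == keep]
        # An empty region contributes nothing to the loss.
        return sum(diffs) / len(diffs) if diffs else 0.0

    return region_mse(1) + region_mse(0)


# Toy example: four patches, the middle two labeled as target.
attn_prev = [0.1, 0.7, 0.2, 0.0]
attn_curr = [0.1, 0.5, 0.2, 0.2]
target_mask = [0, 1, 1, 0]
loss = masked_consistency(attn_prev, attn_curr, target_mask)
```

Averaging within each region before summing keeps a large background from dominating a small target, which is one plausible reading of constraining the two regions "separately"; the paper itself should be consulted for the actual formulation.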
Published
2024-03-24
How to Cite
Hu, K., Yang, W., Huang, W., Zhou, X., Cao, M., Ren, J., & Tan, H. (2024). Sequential Fusion Based Multi-Granularity Consistency for Space-Time Transformer Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12519-12527. https://doi.org/10.1609/aaai.v38i11.29145
Issue
Section
AAAI Technical Track on Machine Learning II