MambaLCT: Boosting Tracking via Long-term Context State Space Model

Authors

  • Xiaohai Li Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University Guangxi Colleges and Universities Key Laboratory of Intelligent Software, Wuzhou University
  • Bineng Zhong Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University
  • Qihua Liang Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University
  • Guorong Li Key Laboratory of Big Data Mining and Knowledge Management, University of Chinese Academy of Sciences
  • Zhiyi Mo Guangxi Colleges and Universities Key Laboratory of Intelligent Software, Wuzhou University
  • Shuxiang Song Key Laboratory of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University

DOI:

https://doi.org/10.1609/aaai.v39i5.32528

Abstract

Effectively constructing context information with long-term dependencies from video sequences is crucial for object tracking. However, the context length constructed by existing work is limited: it considers only object information from adjacent frames or short video clips, leading to insufficient utilization of contextual information. To address this issue, we propose MambaLCT, which constructs and utilizes target variation cues from the first frame to the current frame for robust tracking. First, a novel unidirectional Context Mamba module is designed to scan frame features along the temporal dimension, gathering target change cues throughout the entire sequence. Specifically, target-related information in frame features is compressed into a hidden state space through a selective scanning mechanism, and target information across the entire video is continuously aggregated into target variation cues. Next, we inject the target change cues into the attention mechanism, providing temporal information for modeling the relationship between the template and search frames. The advantage of MambaLCT is its ability to continuously extend the length of the context and capture complete target change cues, which enhances the stability and robustness of the tracker. Extensive experiments show that long-term context information enhances the model's ability to perceive targets in complex scenarios. MambaLCT achieves new SOTA performance on six benchmarks while maintaining real-time running speed.
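The core idea described above, compressing per-frame target information into a single hidden state through a unidirectional, input-dependent (selective) scan, can be sketched in a few lines. Note this is a toy illustration under assumed simplifications, not the authors' implementation: the function name `selective_scan`, the scalar sigmoid gate, and the element-wise state update are all invented for illustration, whereas the real Context Mamba module uses learned, per-channel selection parameters.

```python
import math

def selective_scan(frame_feats, dim):
    """Toy unidirectional selective scan (illustration only).

    Aggregates a sequence of per-frame feature vectors into one
    running hidden state, mimicking how a long-term context cue
    could be built from the first frame to the current frame.
    """
    h = [0.0] * dim  # hidden state carrying the accumulated context cue
    for x in frame_feats:
        # Input-dependent gate (the "selective" part): how much this
        # frame writes into the state depends on the frame itself.
        gate = 1.0 / (1.0 + math.exp(-sum(x) / dim))  # sigmoid
        decay = 1.0 - gate
        # Recurrence h_t = decay * h_{t-1} + gate * x_t: old context
        # decays while the current frame's information is folded in.
        h = [decay * hi + gate * xi for hi, xi in zip(h, x)]
    return h

# The resulting state h would then condition the template-search
# attention, supplying temporal context (per the paper's description).
frames = [[1.0, 0.2, 0.0, 0.5], [0.8, 0.1, 0.1, 0.4]]
cue = selective_scan(frames, dim=4)
```

Because the scan is strictly unidirectional and the state has fixed size, the context length can grow with the video at constant per-frame cost, which is the property the abstract highlights.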

Published

2025-04-11

How to Cite

Li, X., Zhong, B., Liang, Q., Li, G., Mo, Z., & Song, S. (2025). MambaLCT: Boosting Tracking via Long-term Context State Space Model. Proceedings of the AAAI Conference on Artificial Intelligence, 39(5), 4986–4994. https://doi.org/10.1609/aaai.v39i5.32528

Section

AAAI Technical Track on Computer Vision IV