Semantic Feature Purification for Adversarially-Aware RGB-T Tracking

Authors

  • Jiahao Wang, Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education; International Research Center for Intelligent Perception and Computation; Joint International Research Laboratory of Intelligent Perception and Computation; School of Artificial Intelligence, Xidian University, Xi’an, 710071, P.R. China
  • Fang Liu, Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education; International Research Center for Intelligent Perception and Computation; Joint International Research Laboratory of Intelligent Perception and Computation; School of Artificial Intelligence, Xidian University, Xi’an, 710071, P.R. China
  • Hao Wang, Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education; International Research Center for Intelligent Perception and Computation; Joint International Research Laboratory of Intelligent Perception and Computation; School of Artificial Intelligence, Xidian University, Xi’an, 710071, P.R. China
  • Shuo Li, Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education; International Research Center for Intelligent Perception and Computation; Joint International Research Laboratory of Intelligent Perception and Computation; School of Artificial Intelligence, Xidian University, Xi’an, 710071, P.R. China
  • Xinyi Wang, Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education; International Research Center for Intelligent Perception and Computation; Joint International Research Laboratory of Intelligent Perception and Computation; School of Artificial Intelligence, Xidian University, Xi’an, 710071, P.R. China
  • Puhua Chen, Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education; International Research Center for Intelligent Perception and Computation; Joint International Research Laboratory of Intelligent Perception and Computation; School of Artificial Intelligence, Xidian University, Xi’an, 710071, P.R. China

DOI:

https://doi.org/10.1609/aaai.v40i12.37949

Abstract

RGB-T tracking is increasingly deployed in safety-critical applications such as autonomous driving, surveillance, and rescue robotics, where tracking must remain reliable under adverse conditions. Although fusing RGB and thermal infrared (TIR) modalities improves robustness in low-light and occluded scenes, recent findings show that RGB-T trackers remain highly susceptible to subtle input perturbations: human-imperceptible modifications that exploit cross-modal inconsistencies to mislead tracking outputs. In real-world scenarios, such perturbations can arise from sensor spoofing, infrared camouflage, or physical-world attacks, posing serious risks to operational safety. To address this, we propose SFPT, a Semantic Feature Purification framework that enhances RGB-T tracking at the representation level. Rather than filtering corrupted inputs at the pixel level, SFPT introduces task-specific semantic anchors into the feature space to reinforce perturbation-invariant cues. These anchors, derived from descriptive language, interact with visual features to purify representations. To further suppress modality-specific interference, we design an Adaptive Perturbation-Guided Cross-Modal Fusion (APG-CMF) module, which leverages language and visual signals to estimate the reliability of each modality and dynamically reweight cross-modal features, keeping fusion robust under perturbation. Extensive experiments under diverse perturbation conditions validate the effectiveness of our approach. Notably, SFPT maintains performance comparable to clean settings even when subjected to perturbations of strength 1/255 and 4/255, demonstrating strong resilience to real-world interference.
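The reliability-based reweighting described in the abstract can be illustrated with a small sketch. The abstract does not specify how APG-CMF computes reliability, so everything below is an assumption made for illustration: the scoring rule (cosine similarity between each modality's feature and a language-derived anchor), the softmax with a `temperature` parameter, and the function names `cosine`, `reliability_weights`, and `fuse` are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def reliability_weights(rgb_feat, tir_feat, anchor, temperature=0.1):
    """Softmax over each modality's agreement with the semantic anchor.

    Assumed scoring rule for illustration only; not the paper's formulation.
    """
    scores = [
        cosine(rgb_feat, anchor) / temperature,
        cosine(tir_feat, anchor) / temperature,
    ]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(rgb_feat, tir_feat, anchor, temperature=0.1):
    """Weighted sum of the two modality features using the weights above."""
    w_rgb, w_tir = reliability_weights(rgb_feat, tir_feat, anchor, temperature)
    return [w_rgb * r + w_tir * t for r, t in zip(rgb_feat, tir_feat)]


# Toy example: the RGB feature has drifted away from the anchor (as if
# perturbed), while the TIR feature still agrees with it.
anchor = [1.0, 0.0, 0.0]
rgb_perturbed = [0.1, 0.9, 0.3]
tir_clean = [0.9, 0.1, 0.0]

weights = reliability_weights(rgb_perturbed, tir_clean, anchor)
fused = fuse(rgb_perturbed, tir_clean, anchor)
```

In this toy setting the modality that agrees more with the anchor receives the larger weight, so the fused feature stays close to the unperturbed TIR feature. A learned reliability estimator would replace the fixed cosine score in practice.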

Published

2026-03-14

How to Cite

Wang, J., Liu, F., Wang, H., Li, S., Wang, X., & Chen, P. (2026). Semantic Feature Purification for Adversarially-Aware RGB-T Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 9847–9855. https://doi.org/10.1609/aaai.v40i12.37949

Section

AAAI Technical Track on Computer Vision IX