Semantic Feature Purification for Adversarially-Aware RGB-T Tracking
DOI:
https://doi.org/10.1609/aaai.v40i12.37949
Abstract
RGB-T tracking is increasingly deployed in safety-critical applications such as autonomous driving, surveillance, and rescue robotics, where tracking must remain reliable under adverse conditions. Although fusing the RGB and thermal infrared (TIR) modalities improves robustness in low-light and occluded scenes, recent findings show that RGB-T trackers remain highly susceptible to subtle input perturbations: human-imperceptible modifications that exploit cross-modal inconsistencies to mislead tracking outputs. In real-world scenarios, such perturbations can arise from sensor spoofing, infrared camouflage, or physical-world attacks, posing serious risks to operational safety. To address this, we propose SFPT, a Semantic Feature Purification framework that enhances RGB-T tracking at the representation level. Rather than filtering corrupted inputs at the pixel level, SFPT introduces task-specific semantic anchors into the feature space to reinforce perturbation-invariant cues. These anchors, derived from descriptive language, interact with visual features to purify the representations. To further suppress modality-specific interference, we design an Adaptive Perturbation-Guided Cross-Modal Fusion (APG-CMF) module, which leverages language and visual signals to estimate the reliability of each modality and dynamically reweight cross-modal features, ensuring robust fusion under perturbation. Extensive experiments under diverse perturbation conditions validate the effectiveness of our approach. Notably, SFPT maintains performance comparable to that in clean settings even when subjected to perturbations of strength 1/255 and 4/255, demonstrating strong resilience to real-world interference.
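Two ideas in the abstract can be made concrete with a small sketch. It is an illustration under stated assumptions, not the SFPT implementation. First, it assumes that a perturbation "strength" such as 4/255 is a per-pixel bound on images scaled to [0, 1], which is a common convention but is not stated in the abstract. Second, it assumes that reliability-based reweighting can be approximated by a softmax over two scalar reliability scores; the function names `project_to_budget` and `fuse_by_reliability` and the scalar scores are hypothetical.

```python
import math


def project_to_budget(clean, perturbed, eps):
    """Clip a perturbed signal so it stays within eps of the clean one.

    Assumption: values are pixel intensities in [0, 1] and eps (e.g. 4/255)
    is a per-pixel bound on the allowed change.
    """
    projected = []
    for c, p in zip(clean, perturbed):
        delta = max(-eps, min(eps, p - c))              # limit the change to +/- eps
        projected.append(max(0.0, min(1.0, c + delta)))  # keep a valid intensity
    return projected


def fuse_by_reliability(rgb_feat, tir_feat, rgb_score, tir_score):
    """Weighted sum of two feature vectors using softmax-normalized scores.

    Assumption: each modality has one scalar reliability score. How such
    scores are estimated is not described in the abstract.
    """
    top = max(rgb_score, tir_score)                      # subtract max for stability
    e_rgb = math.exp(rgb_score - top)
    e_tir = math.exp(tir_score - top)
    w_rgb = e_rgb / (e_rgb + e_tir)
    w_tir = 1.0 - w_rgb
    fused = [w_rgb * r + w_tir * t for r, t in zip(rgb_feat, tir_feat)]
    return fused, (w_rgb, w_tir)


if __name__ == "__main__":
    eps = 4 / 255
    print(project_to_budget([0.0, 0.5, 1.0], [0.2, 0.49, 0.9], eps))
    # A lower RGB score shifts the fused feature toward the thermal branch.
    print(fuse_by_reliability([1.0, 0.0], [0.0, 1.0], rgb_score=-2.0, tir_score=1.0))
```

Under the stated assumption, a budget of 4/255 corresponds to at most four intensity levels on an 8-bit image, which is why such changes are hard to see. In the fusion step, lowering one modality's score moves weight to the other, which is the general behavior the abstract attributes to dynamic reweighting.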
Published
2026-03-14
How to Cite
Wang, J., Liu, F., Wang, H., Li, S., Wang, X., & Chen, P. (2026). Semantic Feature Purification for Adversarially-Aware RGB-T Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 9847–9855. https://doi.org/10.1609/aaai.v40i12.37949
Section
AAAI Technical Track on Computer Vision IX