Dual-Teacher Interactive Knowledge Distillation Network for Text-to-Visible & Infrared Person Retrieval
DOI:
https://doi.org/10.1609/aaai.v40i8.37527
Abstract
Text-to-visible & infrared person retrieval aims to retrieve the corresponding visible (RGB) and thermal infrared (TIR) images given text descriptions. Existing methods perform semantic decoupling by aligning RGB and TIR features separately to different attributes, thereby facilitating the alignment between the fused multimodal representation and the text. However, insufficient TIR representation ability and insufficient cross-view representation capabilities of the RGB and TIR modalities limit retrieval accuracy and robustness. To address these issues, we propose a novel Dual-teacher Interactive Knowledge Distillation Network, called DIKDNet, for robust text-to-visible & infrared person retrieval. DIKDNet performs interactive knowledge distillation between two modality-specific teachers with rich cross-view representation capabilities to enhance TIR representations, and collaborative knowledge distillation from both teachers to the corresponding students to enhance the cross-modal cross-view representations. Specifically, to enhance the representation ability of the TIR backbone network while preserving modality-specific characteristics, we design an Interactive Knowledge Distillation Module (IKDM), which introduces a boundary-constrained distillation strategy between the RGB and TIR backbones to transfer the semantic features of the RGB backbone to the TIR one. To enhance the cross-modal cross-view representation capability, we design a Collaborative Knowledge Distillation Module (CKDM) to transfer the cross-modal similarity relations and the cross-view multimodal representations from the teacher networks to the student ones. Experimental results demonstrate that our method consistently achieves significant performance gains on both the RGBT-PEDES and RGBNT201-PEDES datasets. The code will be released upon acceptance.
Published
2026-03-14
How to Cite
Li, C., Chen, Z., Deng, Y., & Zheng, A. (2026). Dual-Teacher Interactive Knowledge Distillation Network for Text-to-Visible & Infrared Person Retrieval. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6037–6045. https://doi.org/10.1609/aaai.v40i8.37527
Issue
Section
AAAI Technical Track on Computer Vision V