ProxyTTT: Proxy-driven Test-Time Training for Multi-modal Re-identification

Authors

  • Aihua Zheng, Anhui University
  • Zhaojun Liu, Anhui University
  • Xixi Wan, Anhui University
  • Chenglong Li, Anhui University
  • Jin Tang, Anhui University
  • Yan Yan, University of Illinois Chicago

DOI:

https://doi.org/10.1609/aaai.v40i16.38337

Abstract

Multi-modal object re-identification (ReID) aims to retrieve specific targets by leveraging complementary cues from different sensing modalities. Despite recent progress, two key challenges remain: (1) the limited ability to jointly address both modality and viewpoint discrepancies, and (2) the difficulty of effectively leveraging reliable target-domain data to improve generalization. To address these challenges, we propose Proxy-driven Test-Time Training (ProxyTTT), a unified framework that enhances both multi-modal identity representation learning and model generalization. During training, we propose a Multi-Proxy Learning (MPL) mechanism to address the representation bias across different views and modalities. MPL disentangles fine-grained modality-specific and modality-common identity proxies as semantic anchors to align identity features across diverse perspectives and sensing modalities. This alignment strategy enables the model to learn robust and discriminative global identity representations under heterogeneous modality conditions. At test time, to reliably exploit target domain data, we propose Proxy-guided Entropy-based Selective Adaptation (PESA) for test-time training. Specifically, PESA leverages the semantic structure encoded by identity proxies to estimate prediction uncertainty via entropy, and selectively adapts the model using only high-confidence samples. This selective adaptation effectively mitigates the domain shift between training and deployment environments, improving the model’s generalization in real-world scenarios. Extensive experiments on four public multi-modal ReID benchmarks (RGBNT201, RGBNT100, MSVR310, and WMVeID863) demonstrate the effectiveness of ProxyTTT.
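The PESA step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes L2-normalized embeddings and identity proxies, cosine-similarity logits with temperature scaling, and an entropy threshold for sample selection; the function name, temperature, and threshold values are all illustrative choices.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def select_confident_samples(features, proxies, tau=0.05, entropy_threshold=0.5):
    """Keep only samples whose proxy-assignment entropy is low.

    features: (N, D) L2-normalized sample embeddings
    proxies:  (K, D) L2-normalized identity proxies
    tau:      temperature for scaling cosine similarities (illustrative value)
    Returns a boolean mask over the N samples; True = high-confidence,
    i.e. a candidate for test-time adaptation.
    """
    logits = features @ proxies.T / tau        # cosine similarity to each proxy
    probs = softmax(logits, axis=1)            # soft assignment over K proxies
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    entropy /= np.log(proxies.shape[0])        # normalize by log(K) into [0, 1]
    return entropy < entropy_threshold

# Toy check: a feature aligned with one proxy is confident,
# a feature equidistant from all proxies is not.
proxies = np.eye(3)
features = np.array([[1.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0] / np.sqrt(3.0)])
mask = select_confident_samples(features, proxies)
```

Only the samples flagged by the mask would then contribute to the adaptation loss at test time; the uncertain ones are discarded rather than risk reinforcing wrong predictions under domain shift.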

Published

2026-03-14

How to Cite

Zheng, A., Liu, Z., Wan, X., Li, C., Tang, J., & Yan, Y. (2026). ProxyTTT: Proxy-driven Test-Time Training for Multi-modal Re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13342–13350. https://doi.org/10.1609/aaai.v40i16.38337

Section

AAAI Technical Track on Computer Vision XIII