ProxyTTT: Proxy-driven Test-Time Training for Multi-modal Re-identification

Authors

  • Aihua Zheng, Anhui University
  • Zhaojun Liu, Anhui University
  • Xixi Wan, Anhui University
  • Chenglong Li, Anhui University
  • Jin Tang, Anhui University
  • Yan Yan, University of Illinois Chicago

DOI:

https://doi.org/10.1609/aaai.v40i16.38337

Abstract

Multi-modal object re-identification (ReID) aims to retrieve specific targets by leveraging complementary cues from different sensing modalities. Despite recent progress, two key challenges remain: (1) the limited ability to jointly address both modality and viewpoint discrepancies, and (2) the difficulty of effectively leveraging reliable target-domain data to improve generalization. To address these challenges, we propose Proxy-driven Test-Time Training (ProxyTTT), a unified framework that enhances both multi-modal identity representation learning and model generalization. During training, we propose a Multi-Proxy Learning (MPL) mechanism to address the representation bias across different views and modalities. MPL disentangles fine-grained modality-specific and modality-common identity proxies as semantic anchors to align identity features across diverse perspectives and sensing modalities. This alignment strategy enables the model to learn robust and discriminative global identity representations under heterogeneous modality conditions. At test time, to reliably exploit target domain data, we propose Proxy-guided Entropy-based Selective Adaptation (PESA) for test-time training. Specifically, PESA leverages the semantic structure encoded by identity proxies to estimate prediction uncertainty via entropy, and selectively adapts the model using only high-confidence samples. This selective adaptation effectively mitigates the domain shift between training and deployment environments, improving the model’s generalization in real-world scenarios. Extensive experiments on four public multi-modal ReID benchmarks (RGBNT201, RGBNT100, MSVR310, and WMVeID863) demonstrate the effectiveness of ProxyTTT.
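The PESA step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes L2-normalized embeddings and identity proxies, cosine-similarity logits with temperature scaling, and an entropy threshold for sample selection; the function name, temperature, and threshold values are all illustrative choices.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def select_confident_samples(features, proxies, tau=0.05, entropy_threshold=0.5):
    """Keep only samples whose proxy-assignment entropy is low.

    features: (N, D) L2-normalized sample embeddings
    proxies:  (K, D) L2-normalized identity proxies
    tau:      temperature for scaling cosine similarities (illustrative value)
    Returns a boolean mask over the N samples; True = high-confidence,
    i.e. a candidate for test-time adaptation.
    """
    logits = features @ proxies.T / tau        # cosine similarity to each proxy
    probs = softmax(logits, axis=1)            # soft assignment over K proxies
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    entropy /= np.log(proxies.shape[0])        # normalize by log(K) into [0, 1]
    return entropy < entropy_threshold

# Toy check: a feature aligned with one proxy is confident,
# a feature equidistant from all proxies is not.
proxies = np.eye(3)
features = np.array([[1.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0] / np.sqrt(3.0)])
mask = select_confident_samples(features, proxies)
```

Only the samples flagged by the mask would then contribute to the adaptation loss at test time; the uncertain ones are discarded rather than risk reinforcing wrong predictions under domain shift.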

Published

2026-03-14

How to Cite

Zheng, A., Liu, Z., Wan, X., Li, C., Tang, J., & Yan, Y. (2026). ProxyTTT: Proxy-driven Test-Time Training for Multi-modal Re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13342–13350. https://doi.org/10.1609/aaai.v40i16.38337

Section

AAAI Technical Track on Computer Vision XIII