Heterogeneous Test-Time Training for Multi-Modal Person Re-identification

Authors

  • Zi Wang, School of Computer Science and Technology, Anhui University, Hefei, China
  • Huaibo Huang, MAIS & CRIPAC, CASIA, Beijing, China
  • Aihua Zheng, Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Artificial Intelligence, Anhui University, Hefei, China
  • Ran He, MAIS & CRIPAC, CASIA, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v38i6.28398

Keywords:

CV: Multi-modal Vision, CV: Image and Video Retrieval

Abstract

Multi-modal person re-identification (ReID) seeks to mitigate challenging lighting conditions by incorporating diverse modalities. Most existing multi-modal ReID methods concentrate on leveraging complementary multi-modal information via fusion or interaction. However, the relationships among heterogeneous modalities and the domain traits of unlabeled test data are rarely explored. In this paper, we propose a Heterogeneous Test-time Training (HTT) framework for multi-modal person ReID. We first propose a Cross-identity Inter-modal Margin (CIM) loss to amplify the differentiation among samples of distinct identities. Moreover, we design a Multi-modal Test-time Training (MTT) strategy that enhances the generalization of the model by leveraging the relationships among heterogeneous modalities and the information contained in the test data. Specifically, in the training stage, the CIM loss further enlarges the distance between anchor and negative samples by forcing the inter-modal distance to maintain a margin, thereby enhancing the discriminative capacity of the final descriptor. Subsequently, since the test data carries characteristics of the target domain, the MTT strategy optimizes the network before inference using self-supervised tasks designed from the relationships among modalities. Experimental results on the benchmark multi-modal ReID datasets RGBNT201, Market1501-MM, RGBN300, and RGBNT100 validate the effectiveness of the proposed method. The code is available at https://github.com/ziwang1121/HTT.
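To make the two components concrete, below is a minimal PyTorch sketch of the ideas the abstract describes. It is illustrative only: the margin value, the Euclidean/cosine distance choices, the function names `cim_loss` and `mtt_adapt`, the single shared backbone `model`, and the use of cross-modal descriptor agreement as the self-supervised task are all assumptions made for exposition, not the paper's exact formulation (see the linked repository for the authors' implementation).

```python
# Illustrative sketch of a CIM-style loss and an MTT-style adaptation loop.
# Hyperparameters and the self-supervised objective are assumptions, not the
# authors' exact method.
import torch
import torch.nn.functional as F


def cim_loss(feats_a, feats_b, labels, margin=0.3):
    """Cross-identity Inter-modal Margin loss (hinge-style sketch).

    feats_a, feats_b: features of the same batch from two modalities
    (e.g., RGB and NIR), each of shape (B, D). labels: identity labels (B,).
    Penalizes cross-identity inter-modal pairs that are closer than `margin`.
    """
    dist = torch.cdist(feats_a, feats_b)                   # (B, B) pairwise Euclidean distances
    cross_id = labels.unsqueeze(0) != labels.unsqueeze(1)  # True where identities differ
    violation = F.relu(margin - dist)                      # nonzero only inside the margin
    return violation[cross_id].mean()


@torch.enable_grad()
def mtt_adapt(model, test_loader, steps=1, lr=1e-4):
    """Multi-modal test-time training (sketch): adapt on unlabeled test data
    with a self-supervised objective before inference. The paper designs its
    tasks from inter-modal relationships; per-sample descriptor agreement
    across modalities is used here as one plausible such task (an assumption).
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for rgb, nir, tir in test_loader:  # three modalities, no identity labels
            f_rgb = F.normalize(model(rgb), dim=1)
            f_nir = F.normalize(model(nir), dim=1)
            f_tir = F.normalize(model(tir), dim=1)
            # Pull each sample's descriptors from different modalities together.
            loss = (1 - F.cosine_similarity(f_rgb, f_nir, dim=1)).mean() + \
                   (1 - F.cosine_similarity(f_rgb, f_tir, dim=1)).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    model.eval()
```

In this sketch, `cim_loss` pushes cross-identity inter-modal pairs at least a margin apart during training, while `mtt_adapt` briefly fine-tunes the network on the unlabeled test set with a cross-modal consistency objective before descriptors are extracted for retrieval.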

Published

2024-03-24

How to Cite

Wang, Z., Huang, H., Zheng, A., & He, R. (2024). Heterogeneous Test-Time Training for Multi-Modal Person Re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5850-5858. https://doi.org/10.1609/aaai.v38i6.28398

Section

AAAI Technical Track on Computer Vision V