Heterogeneous Test-Time Training for Multi-Modal Person Re-identification

Authors

  • Zi Wang, School of Computer Science and Technology, Anhui University, Hefei, China
  • Huaibo Huang, MAIS & CRIPAC, CASIA, Beijing, China
  • Aihua Zheng, Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, School of Artificial Intelligence, Anhui University, Hefei, China
  • Ran He, MAIS & CRIPAC, CASIA, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v38i6.28398

Keywords:

CV: Multi-modal Vision, CV: Image and Video Retrieval

Abstract

Multi-modal person re-identification (ReID) seeks to mitigate challenging lighting conditions by incorporating diverse modalities. Most existing multi-modal ReID methods concentrate on leveraging complementary multi-modal information via fusion or interaction. However, the relationships among heterogeneous modalities and the domain traits of unlabeled test data are rarely explored. In this paper, we propose a Heterogeneous Test-time Training (HTT) framework for multi-modal person ReID. We first propose a Cross-identity Inter-modal Margin (CIM) loss to amplify the differentiation among samples of distinct identities. Moreover, we design a Multi-modal Test-time Training (MTT) strategy that enhances the generalization of the model by leveraging the relationships among heterogeneous modalities and the information contained in the test data. Specifically, in the training stage, the CIM loss further enlarges the distance between anchor and negative samples by forcing the inter-modal distance to maintain a margin, thereby enhancing the discriminative capacity of the final descriptor. Subsequently, since the test data carries characteristics of the target domain, the MTT strategy optimizes the network before inference using self-supervised tasks designed from the relationships among modalities. Experimental results on the benchmark multi-modal ReID datasets RGBNT201, Market1501-MM, RGBN300, and RGBNT100 validate the effectiveness of the proposed method. The code is available at https://github.com/ziwang1121/HTT.
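To make the two components concrete, below is a minimal PyTorch sketch of the ideas the abstract describes. It is illustrative only: the margin value, the Euclidean/cosine distance choices, the function names `cim_loss` and `mtt_adapt`, the single shared backbone `model`, and the use of cross-modal descriptor agreement as the self-supervised task are all assumptions made for exposition, not the paper's exact formulation (see the linked repository for the authors' implementation).

```python
# Illustrative sketch of a CIM-style loss and an MTT-style adaptation loop.
# Hyperparameters and the self-supervised objective are assumptions, not the
# authors' exact method.
import torch
import torch.nn.functional as F


def cim_loss(feats_a, feats_b, labels, margin=0.3):
    """Cross-identity Inter-modal Margin loss (hinge-style sketch).

    feats_a, feats_b: features of the same batch from two modalities
    (e.g., RGB and NIR), each of shape (B, D). labels: identity labels (B,).
    Penalizes cross-identity inter-modal pairs that are closer than `margin`.
    """
    dist = torch.cdist(feats_a, feats_b)                   # (B, B) pairwise Euclidean distances
    cross_id = labels.unsqueeze(0) != labels.unsqueeze(1)  # True where identities differ
    violation = F.relu(margin - dist)                      # nonzero only inside the margin
    return violation[cross_id].mean()


@torch.enable_grad()
def mtt_adapt(model, test_loader, steps=1, lr=1e-4):
    """Multi-modal test-time training (sketch): adapt on unlabeled test data
    with a self-supervised objective before inference. The paper designs its
    tasks from inter-modal relationships; per-sample descriptor agreement
    across modalities is used here as one plausible such task (an assumption).
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        for rgb, nir, tir in test_loader:  # three modalities, no identity labels
            f_rgb = F.normalize(model(rgb), dim=1)
            f_nir = F.normalize(model(nir), dim=1)
            f_tir = F.normalize(model(tir), dim=1)
            # Pull each sample's descriptors from different modalities together.
            loss = (1 - F.cosine_similarity(f_rgb, f_nir, dim=1)).mean() + \
                   (1 - F.cosine_similarity(f_rgb, f_tir, dim=1)).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    model.eval()
```

In this sketch, `cim_loss` pushes cross-identity inter-modal pairs at least a margin apart during training, while `mtt_adapt` briefly fine-tunes the network on the unlabeled test set with a cross-modal consistency objective before descriptors are extracted for retrieval.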

Published

2024-03-24

How to Cite

Wang, Z., Huang, H., Zheng, A., & He, R. (2024). Heterogeneous Test-Time Training for Multi-Modal Person Re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5850-5858. https://doi.org/10.1609/aaai.v38i6.28398

Section

AAAI Technical Track on Computer Vision V