Cross-Modality Person Re-identification with Memory-Based Contrastive Embedding


  • De Cheng Xidian University
  • Xiaolong Wang Xidian University
  • Nannan Wang Xidian University
  • Zhen Wang Zhejiang Lab
  • Xiaoyu Wang University of Science and Technology of China
  • Xinbo Gao Chongqing University of Posts and Telecommunications



CV: Representation Learning for Vision, CV: Biometrics, Face, Gesture & Pose, CV: Multi-modal Vision, ML: Multimodal Learning


Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same identity across the RGB and infrared image spaces, which is very important for real-world surveillance systems. In practice, VI-ReID is more challenging than single-modality person ReID because the heterogeneous modality discrepancy further aggravates the traditional challenges of that problem, i.e., inter-class confusion and intra-class variations. In this paper, we propose an aggregated memory-based cross-modality deep metric learning framework, which benefits from the increasing number of learned modality-aware and modality-agnostic centroid proxies for cluster contrast and mutual information learning. Furthermore, to suppress the modality discrepancy, the proposed cross-modality alignment objective simultaneously utilizes both historical and up-to-date learned cluster proxies for enhanced cross-modality association. This training mechanism helps to obtain hard positive references through the increased diversity of learned cluster proxies, and ultimately achieves a stronger "pulling close" effect between cross-modality image features. Extensive experimental results demonstrate the effectiveness of the proposed method, which surpasses state-of-the-art methods by a large margin on the commonly used VI-ReID datasets.
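
As a rough illustration of the cluster-contrast idea mentioned above, the following minimal sketch keeps one centroid proxy per identity in a memory and contrasts each feature against all proxies. It is not the authors' implementation: the class name ProxyMemory, the momentum update rule, the softmax-style contrastive loss, and the momentum and temperature values are all assumptions borrowed from common memory-based contrastive learning practice, and the abstract does not specify any of them.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class ProxyMemory:
    """Hypothetical memory holding one centroid proxy per identity.

    Assumption: proxies are shared by RGB and infrared features, i.e. they
    play the role of modality-agnostic proxies in this toy setting.
    """

    def __init__(self, num_ids, dim, momentum=0.9, temperature=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.proxies = l2_normalize(rng.normal(size=(num_ids, dim)))
        self.momentum = momentum        # assumed value, not from the paper
        self.temperature = temperature  # assumed value, not from the paper

    def contrastive_loss(self, feats, labels):
        """Cross-entropy over feature-to-proxy cosine similarities."""
        feats = l2_normalize(feats)
        logits = feats @ self.proxies.T / self.temperature
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return float(-log_prob[np.arange(len(labels)), labels].mean())

    def update(self, feats, labels):
        """Blend each historical proxy with the newest feature of its identity."""
        feats = l2_normalize(feats)
        for f, y in zip(feats, labels):
            mixed = self.momentum * self.proxies[y] + (1.0 - self.momentum) * f
            self.proxies[y] = l2_normalize(mixed)


# Toy usage with random stand-ins for RGB and infrared features.
rng = np.random.default_rng(1)
memory = ProxyMemory(num_ids=4, dim=8)
labels = np.array([0, 1, 2, 3])
rgb_feats = rng.normal(size=(4, 8))
ir_feats = rgb_feats + 0.1 * rng.normal(size=(4, 8))  # same identities, shifted
loss = memory.contrastive_loss(np.vstack([rgb_feats, ir_feats]),
                               np.concatenate([labels, labels]))
memory.update(np.vstack([rgb_feats, ir_feats]), np.concatenate([labels, labels]))
```

Because the proxies in this sketch are updated from both modalities, minimizing the loss pulls RGB and infrared features of the same identity toward a common proxy. The paper's use of separate modality-aware proxies, and of historical together with up-to-date proxies, is not reproduced here.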




How to Cite

Cheng, D., Wang, X., Wang, N., Wang, Z., Wang, X., & Gao, X. (2023). Cross-Modality Person Re-identification with Memory-Based Contrastive Embedding. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 425-432.



AAAI Technical Track on Computer Vision I