Interact, Embed, and EnlargE: Boosting Modality-Specific Representations for Multi-Modal Person Re-identification

Authors

  • Zi Wang, School of Computer Science and Technology, Anhui University
  • Chenglong Li, Information Materials and Intelligent Sensing Laboratory of Anhui Province; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Hefei; School of Artificial Intelligence, Anhui University
  • Aihua Zheng, Information Materials and Intelligent Sensing Laboratory of Anhui Province; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; School of Artificial Intelligence, Anhui University
  • Ran He, NLPR, CRIPAC, Institute of Automation, Chinese Academy of Sciences
  • Jin Tang, Information Materials and Intelligent Sensing Laboratory of Anhui Province; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation; School of Computer Science and Technology, Anhui University

DOI:

https://doi.org/10.1609/aaai.v36i3.20165

Keywords:

Computer Vision (CV)

Abstract

Multi-modal person Re-ID introduces complementary information from additional modalities to assist the traditional Re-ID task. Existing multi-modal methods, however, ignore the importance of modality-specific information in the feature fusion stage. To this end, we propose a novel method to boost modality-specific representations for multi-modal person Re-ID: Interact, Embed, and EnlargE (IEEE). First, we propose a cross-modal interacting module to exchange useful information between different modalities in the feature extraction phase. Second, we propose a relation-based embedding module to enhance the richness of feature descriptors by embedding the global feature into fine-grained local information. Finally, we propose a multi-modal margin loss that forces the network to learn modality-specific information for each modality by enlarging the intra-class discrepancy. Superior performance on the multi-modal Re-ID dataset RGBNT201 and three constructed Re-ID datasets validates the effectiveness of the proposed method compared with state-of-the-art approaches.
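The "EnlargE" idea of the abstract can be made concrete with a small sketch. The snippet below is a hedged illustration, not the paper's exact formulation: it assumes three modality branches (RGB, near infrared, and thermal infrared, as in RGBNT201), L2-normalized per-modality features, and a hinge-style penalty that pushes same-identity features from different modalities at least a margin apart, which is one plausible reading of "enlarging the intra-class discrepancy" so that each branch retains modality-specific information. The function name, margin value, and choice of Euclidean distance are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def multimodal_margin_loss(feats: dict, margin: float = 0.5) -> torch.Tensor:
    """Hedged sketch of a multi-modal margin loss.

    feats: per-modality features of the SAME identities, each (B, D).
           Modality names and the exact distance/margin formulation are
           illustrative assumptions, not the paper's definition.
    """
    names = list(feats.keys())
    loss = feats[names[0]].new_zeros(())
    num_pairs = 0
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a = F.normalize(feats[names[i]], dim=1)
            b = F.normalize(feats[names[j]], dim=1)
            # Euclidean distance between same-identity features of two modalities.
            dist = (a - b).norm(dim=1)
            # Hinge: penalize cross-modality pairs CLOSER than the margin,
            # encouraging an intra-class discrepancy of at least `margin`.
            loss = loss + F.relu(margin - dist).mean()
            num_pairs += 1
    return loss / num_pairs

# Example usage with random RGB / near-infrared / thermal-infrared features.
feats = {m: torch.randn(8, 256) for m in ("rgb", "nir", "tir")}
print(multimodal_margin_loss(feats))
```

In practice such a term would be combined with standard identity and metric losses, so the enlarged intra-class (cross-modality) gap does not compromise identity discrimination.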


Published

2022-06-28

How to Cite

Wang, Z., Li, C., Zheng, A., He, R., & Tang, J. (2022). Interact, Embed, and EnlargE: Boosting Modality-Specific Representations for Multi-Modal Person Re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2633-2641. https://doi.org/10.1609/aaai.v36i3.20165

Issue

Vol. 36 No. 3 (2022)

Section

AAAI Technical Track on Computer Vision III