Interact, Embed, and EnlargE: Boosting Modality-Specific Representations for Multi-Modal Person Re-identification
Keywords:Computer Vision (CV)
AbstractMulti-modal person Re-ID introduces more complementary information to assist the traditional Re-ID task. Existing multi-modal methods ignore the importance of modality-specific information in the feature fusion stage. To this end, we propose a novel method to boost modality-specific representations for multi-modal person Re-ID: Interact, Embed, and EnlargE (IEEE). First, we propose a cross-modal interacting module to exchange useful information between different modalities in the feature extraction phase. Second, we propose a relation-based embedding module to enhance the richness of feature descriptors by embedding the global feature into the fine-grained local information. Finally, we propose multi-modal margin loss to force the network to learn modality-specific information for each modality by enlarging the intra-class discrepancy. Superior performance on multi-modal Re-ID dataset RGBNT201 and three constructed Re-ID datasets validate the effectiveness of the proposed method compared with the state-of-the-art approaches.
How to Cite
Wang, Z., Li, C., Zheng, A., He, R., & Tang, J. (2022). Interact, Embed, and EnlargE: Boosting Modality-Specific Representations for Multi-Modal Person Re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2633-2641. https://doi.org/10.1609/aaai.v36i3.20165
AAAI Technical Track on Computer Vision III