MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt

Authors

  • Yuhao Wang, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology
  • Xuehu Liu, School of Computer Science and Artificial Intelligence, Wuhan University of Technology
  • Tianyu Yan, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology
  • Yang Liu, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University
  • Aihua Zheng, School of Artificial Intelligence, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University
  • Pingping Zhang, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University
  • Huchuan Lu, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology

DOI:

https://doi.org/10.1609/aaai.v39i8.32879

Abstract

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal ReID tasks. However, they remain unexplored for multi-modal object ReID. Furthermore, current multi-modal aggregation methods have obvious limitations in dealing with long sequences from different modalities. To address the above issues, we introduce a novel framework called MambaPro for multi-modal object ReID. Specifically, we first employ a Parallel Feed-Forward Adapter (PFA) to adapt CLIP to multi-modal object ReID. Then, we propose the Synergistic Residual Prompt (SRP) to guide the joint learning of multi-modal features. Finally, leveraging Mamba's superior scalability for long sequences, we introduce Mamba Aggregation (MA) to efficiently model interactions between different modalities. As a result, MambaPro can extract more robust features with lower complexity. Extensive experiments on three multi-modal object ReID benchmarks (i.e., RGBNT201, RGBNT100 and MSVR310) validate the effectiveness of our proposed methods.
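To give a sense of the first component, the sketch below illustrates a parallel feed-forward adapter in the generic sense used by the abstract: a small trainable bottleneck branch added alongside a frozen transformer feed-forward layer, so only the adapter weights are tuned. All dimensions, the scaling factor `s`, and the zero-initialized up-projection are illustrative assumptions, not the paper's actual PFA design or hyperparameters.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class ParallelAdapter:
    """Bottleneck branch running in parallel with a frozen FFN (illustrative)."""

    def __init__(self, d_model=768, bottleneck=64, s=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Trainable down-projection; up-projection zero-initialized so the
        # adapter starts as an identity mapping around the frozen branch.
        self.down = rng.normal(0.0, 0.02, (d_model, bottleneck))
        self.up = np.zeros((bottleneck, d_model))
        self.s = s  # residual scaling factor (assumed value)

    def __call__(self, x, frozen_ffn_out):
        # x: (seq_len, d_model) tokens entering the block;
        # frozen_ffn_out: output of the frozen pre-trained FFN branch.
        return frozen_ffn_out + self.s * (gelu(x @ self.down) @ self.up)

x = np.ones((4, 768))
adapter = ParallelAdapter()
out = adapter(x, frozen_ffn_out=x)
# With the zero-initialized up-projection, the output equals the frozen branch.
```

Because the adapter branch is zero-initialized, training starts from the unmodified pre-trained behavior and the bottleneck keeps the number of trainable parameters small relative to full fine-tuning.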

Published

2025-04-11

How to Cite

Wang, Y., Liu, X., Yan, T., Liu, Y., Zheng, A., Zhang, P., & Lu, H. (2025). MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 8150–8158. https://doi.org/10.1609/aaai.v39i8.32879

Section

AAAI Technical Track on Computer Vision VII