MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt

Authors

  • Yuhao Wang, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology
  • Xuehu Liu, School of Computer Science and Artificial Intelligence, Wuhan University of Technology
  • Tianyu Yan, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology
  • Yang Liu, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University
  • Aihua Zheng, School of Artificial Intelligence, Anhui University; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University
  • Pingping Zhang, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology; Anhui Provincial Key Laboratory of Multimodal Cognitive Computation, Anhui University
  • Huchuan Lu, School of Future Technology, School of Artificial Intelligence, Dalian University of Technology

DOI:

https://doi.org/10.1609/aaai.v39i8.32879

Abstract

Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal ReID tasks. However, they remain unexplored for multi-modal object ReID. Furthermore, current multi-modal aggregation methods have obvious limitations in dealing with long sequences from different modalities. To address the above issues, we introduce a novel framework called MambaPro for multi-modal object ReID. Specifically, we first employ a Parallel Feed-Forward Adapter (PFA) to adapt CLIP to multi-modal object ReID. Then, we propose the Synergistic Residual Prompt (SRP) to guide the joint learning of multi-modal features. Finally, leveraging Mamba's superior scalability for long sequences, we introduce Mamba Aggregation (MA) to efficiently model interactions between different modalities. As a result, MambaPro can extract more robust features with lower complexity. Extensive experiments on three multi-modal object ReID benchmarks (i.e., RGBNT201, RGBNT100 and MSVR310) validate the effectiveness of our proposed methods.
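To give a sense of the first component, the sketch below illustrates a parallel feed-forward adapter in the generic sense used by the abstract: a small trainable bottleneck branch added alongside a frozen transformer feed-forward layer, so only the adapter weights are tuned. All dimensions, the scaling factor `s`, and the zero-initialized up-projection are illustrative assumptions, not the paper's actual PFA design or hyperparameters.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class ParallelAdapter:
    """Bottleneck branch running in parallel with a frozen FFN (illustrative)."""

    def __init__(self, d_model=768, bottleneck=64, s=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Trainable down-projection; up-projection zero-initialized so the
        # adapter starts as an identity mapping around the frozen branch.
        self.down = rng.normal(0.0, 0.02, (d_model, bottleneck))
        self.up = np.zeros((bottleneck, d_model))
        self.s = s  # residual scaling factor (assumed value)

    def __call__(self, x, frozen_ffn_out):
        # x: (seq_len, d_model) tokens entering the block;
        # frozen_ffn_out: output of the frozen pre-trained FFN branch.
        return frozen_ffn_out + self.s * (gelu(x @ self.down) @ self.up)

x = np.ones((4, 768))
adapter = ParallelAdapter()
out = adapter(x, frozen_ffn_out=x)
# With the zero-initialized up-projection, the output equals the frozen branch.
```

Because the adapter branch is zero-initialized, training starts from the unmodified pre-trained behavior and the bottleneck keeps the number of trainable parameters small relative to full fine-tuning.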

Published

2025-04-11

How to Cite

Wang, Y., Liu, X., Yan, T., Liu, Y., Zheng, A., Zhang, P., & Lu, H. (2025). MambaPro: Multi-Modal Object Re-identification with Mamba Aggregation and Synergistic Prompt. Proceedings of the AAAI Conference on Artificial Intelligence, 39(8), 8150–8158. https://doi.org/10.1609/aaai.v39i8.32879

Section

AAAI Technical Track on Computer Vision VII