PEFT-BoA: Parameter-Efficient Fine-Tuning with Bag-of-Adapters for Multi-Modal Object Re-identification

Authors

  • Hongchao Li Anhui Normal University
  • Guangxing Liu Anhui Normal University
  • Xixi Wang Anhui University
  • Baihe Liang Anhui Normal University
  • Yonglong Luo Anhui Normal University

DOI:

https://doi.org/10.1609/aaai.v40i8.37537

Abstract

Multi-modal object Re-identification (ReID) aims to retrieve individuals by leveraging complementary information from different modalities. Recent CLIP-based approaches show promising results, but they usually employ prompt-based or hybrid prompt-adapter tuning and still face challenges from the heterogeneous domain gap, fine-grained identity discrimination, and noisy-instance interference. To address these problems, we introduce a novel Parameter-Efficient Fine-Tuning framework with Bag-of-Adapters (PEFT-BoA) built on the pre-trained CLIP vision encoder for multi-modal object ReID. Specifically, we first propose a Domain-specific Patch Adapter (DPA) designed to bridge the visual feature gap between pre-trained and fine-tuned models at the local patch level. Meanwhile, we propose a Task-specific Class Adapter (TCA) to enhance fine-grained identity discrimination by optimizing the global class token. Finally, we propose an Instance-specific Fusion Adapter (IFA) that dynamically selects and combines only the most useful features across different modalities for each instance. Our PEFT-BoA achieves superior performance on multi-modal object re-identification benchmarks, while maintaining fewer trainable parameters (6.62M) and a higher training throughput (246.2 fps).
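The abstract does not give the adapters' equations, but the two recurring ingredients it names (lightweight adapters inserted into a frozen encoder, and per-instance gated fusion across modalities) can be sketched generically. Below is a minimal NumPy sketch, not the authors' implementation: `bottleneck_adapter` is the standard down-project/nonlinearity/up-project residual adapter commonly used for parameter-efficient tuning (as DPA/TCA-style modules would be), and `instance_fusion` is a softmax-gated weighted sum over modality features, illustrating the kind of instance-specific selection IFA performs. All weight names and shapes here are hypothetical.

```python
import numpy as np

def bottleneck_adapter(x, w_down, w_up):
    """Residual bottleneck adapter: x + ReLU(x @ w_down) @ w_up.

    x:      (n, d)  token features from the frozen encoder
    w_down: (d, r)  down-projection to a small rank r (r << d)
    w_up:   (r, d)  up-projection back to the feature dimension
    Only w_down and w_up are trainable; the encoder stays frozen.
    """
    h = np.maximum(x @ w_down, 0.0)   # down-project + ReLU
    return x + h @ w_up               # residual connection

def instance_fusion(feats, w_gate):
    """Softmax-gated fusion of per-modality features for one instance.

    feats:  (m, d)  one feature vector per modality (e.g. RGB, NIR, TIR)
    w_gate: (d,)    trainable gating vector scoring each modality
    Returns a single (d,) fused feature weighted toward useful modalities.
    """
    scores = feats @ w_gate                     # (m,) per-modality scores
    w = np.exp(scores - scores.max())           # stable softmax
    w = w / w.sum()
    return (w[:, None] * feats).sum(axis=0)     # weighted sum over modalities

# Toy usage with hypothetical sizes: d=8 feature dim, r=2 bottleneck, m=3 modalities.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
adapted = bottleneck_adapter(x, rng.standard_normal((8, 2)) * 0.1,
                             rng.standard_normal((2, 8)) * 0.1)
fused = instance_fusion(rng.standard_normal((3, 8)), rng.standard_normal(8))
```

With zero up-projection weights the adapter reduces to the identity, which is why such modules can be initialized near zero and tuned without disturbing the pre-trained features.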

Published

2026-03-14

How to Cite

Li, H., Liu, G., Wang, X., Liang, B., & Luo, Y. (2026). PEFT-BoA: Parameter-Efficient Fine-Tuning with Bag-of-Adapters for Multi-Modal Object Re-identification. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6127–6135. https://doi.org/10.1609/aaai.v40i8.37537

Section

AAAI Technical Track on Computer Vision V