SAM-PARSER: Fine-Tuning SAM Efficiently by Parameter Space Reconstruction

Authors

  • Zelin Peng, Shanghai Jiao Tong University
  • Zhengqin Xu, Shanghai Jiao Tong University
  • Zhilin Zeng, Shanghai Jiao Tong University
  • Xiaokang Yang, Shanghai Jiao Tong University
  • Wei Shen, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v38i5.28250

Keywords:

CV: Segmentation

Abstract

Segment Anything Model (SAM) has received remarkable attention as it offers a powerful and versatile solution for object segmentation in images. However, fine-tuning SAM for downstream segmentation tasks under different scenarios remains a challenge, as the varied characteristics of different scenarios naturally require diverse model parameter spaces. Most existing fine-tuning methods attempt to bridge the gaps among different scenarios by introducing a set of new parameters to modify SAM's original parameter space. Unlike these works, in this paper, we propose fine-tuning SAM efficiently by parameter space reconstruction (SAM-PARSER), which introduces nearly zero trainable parameters during fine-tuning. In SAM-PARSER, we assume that SAM's original parameter space is relatively complete, so that its bases are able to reconstruct the parameter space of a new scenario. We obtain the bases by matrix decomposition, and fine-tune the coefficients to reconstruct the parameter space tailored to the new scenario through an optimal linear combination of the bases. Experimental results show that SAM-PARSER exhibits superior segmentation performance across various scenarios, while reducing the number of trainable parameters by approximately 290 times compared with current parameter-efficient fine-tuning methods.
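
For intuition, below is a minimal, illustrative sketch of the reconstruction idea, assuming SVD as the matrix decomposition and a PyTorch-style linear layer; the class name and wrapper are hypothetical and not the authors' released code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class SVDReconstructedLinear(nn.Module):
        """Hypothetical sketch: decompose a pretrained weight matrix with SVD,
        freeze the singular vectors (the bases), and fine-tune only the
        singular values (the reconstruction coefficients)."""

        def __init__(self, pretrained_linear: nn.Linear):
            super().__init__()
            # Decompose the pretrained weight: W = U diag(S) V^T.
            weight = pretrained_linear.weight.data  # shape (out_features, in_features)
            U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
            # Singular vectors serve as frozen bases of the original parameter space.
            self.register_buffer("U", U)
            self.register_buffer("Vh", Vh)
            # Singular values serve as coefficients; the only trainable parameters.
            self.S = nn.Parameter(S)
            if pretrained_linear.bias is not None:
                self.register_buffer("bias", pretrained_linear.bias.data.clone())
            else:
                self.bias = None

        def forward(self, x):
            # Reconstruct the weight as a linear combination of the frozen bases,
            # weighted by the learned coefficients.
            weight = self.U @ torch.diag(self.S) @ self.Vh
            return F.linear(x, weight, self.bias)

In such a sketch, each linear layer of the pretrained backbone would be wrapped before fine-tuning, so that gradients flow only to the coefficients while the bases stay fixed.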

Published

2024-03-24

How to Cite

Peng, Z., Xu, Z., Zeng, Z., Yang, X., & Shen, W. (2024). SAM-PARSER: Fine-Tuning SAM Efficiently by Parameter Space Reconstruction. Proceedings of the AAAI Conference on Artificial Intelligence, 38(5), 4515-4523. https://doi.org/10.1609/aaai.v38i5.28250

Issue

Vol. 38 No. 5 (2024)

Section

AAAI Technical Track on Computer Vision IV