Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models

Jiahuan Long; Tingsong Jiang; Wen Yao; Yizhe Xiong; Zhengqin Xu; Shuai Jia; Hanqing Liu; Chao Ma

doi:10.1609/aaai.v40i28.39581

Authors

Jiahuan Long: MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiaotong University; Defense Innovation Institute, Chinese Academy of Military Science
Tingsong Jiang: Defense Innovation Institute, Chinese Academy of Military Science
Wen Yao: Defense Innovation Institute, Chinese Academy of Military Science
Yizhe Xiong: School of Software, Tsinghua University
Zhengqin Xu: MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiaotong University
Shuai Jia: MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiaotong University
Hanqing Liu: MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiaotong University; Defense Innovation Institute, Chinese Academy of Military Science
Chao Ma: MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiaotong University

DOI:

https://doi.org/10.1609/aaai.v40i28.39581

Abstract

Vision foundation models (VFMs) have demonstrated remarkable capabilities in learning universal visual representations. However, adapting these models to downstream tasks conventionally requires parameter updates, and even parameter-efficient fine-tuning methods modify thousands to millions of weights. In this paper, we investigate the redundancies in the Segment Anything Model (SAM) and propose a novel parameter-free fine-tuning method. Unlike traditional fine-tuning methods that adjust parameters, our method emphasizes selecting, reusing, and enhancing pre-trained features, offering a new perspective on fine-tuning foundation models. Specifically, we introduce a channel selection algorithm based on the model's output difference to identify redundant and effective channels. By selectively replacing the redundant channels with more effective ones, we filter out less useful features and reuse features that are more relevant to downstream tasks, thereby enhancing the task-specific feature representation. Experiments on both out-of-domain and in-domain datasets demonstrate the efficiency and effectiveness of our method in different vision tasks (e.g., image segmentation, depth estimation, and image classification). Notably, our approach integrates seamlessly with existing fine-tuning strategies (e.g., LoRA, Adapter), further boosting the performance of already fine-tuned models. Moreover, since our channel selection involves only model inference, our method significantly reduces GPU memory overhead.
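The channel replacement idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm or code: the abstract does not give the selection criterion in detail, so the sketch assumes a greedy search that scores each candidate replacement by the error of a frozen model's output against a target. All names (`head`, `task_error`, `select_channel_replacements`) and the linear "model" are hypothetical stand-ins. Only forward passes are used and no weights are updated.

```python
import numpy as np

def head(features, weights):
    # Frozen toy "decoder": a fixed linear readout over feature channels.
    # (Assumption: stands in for the frozen downstream part of a VFM.)
    return features @ weights

def task_error(features, weights, target):
    # Mean squared difference between the model output and the target.
    return float(np.mean((head(features, weights) - target) ** 2))

def select_channel_replacements(features, weights, target):
    """Greedy, inference-only search (hypothetical criterion).

    For each channel i, try overwriting it with every other original
    channel j and keep the swap that lowers the output error the most.
    Returns (replacements, edited_features, final_error).
    """
    n_channels = features.shape[1]
    current = features.copy()
    best_err = task_error(current, weights, target)
    replacements = {}
    for i in range(n_channels):
        best_j = None
        for j in range(n_channels):
            if j == i:
                continue
            trial = current.copy()
            trial[:, i] = features[:, j]  # reuse channel j in slot i
            err = task_error(trial, weights, target)
            if err < best_err:
                best_err, best_j = err, j
        if best_j is not None:
            current[:, i] = features[:, best_j]
            replacements[i] = best_j
    return replacements, current, best_err

# Toy data: channel 0 is pure noise (redundant), channel 1 carries signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=64)
other = rng.normal(size=64)
noise = rng.normal(size=64)
features = np.stack([noise, signal, other], axis=1)
weights = np.array([1.0, 1.0, 0.5])

# The target is what the frozen head would output if channel 0 held signal.
ideal = np.stack([signal, signal, other], axis=1)
target = head(ideal, weights)

base_err = task_error(features, weights, target)
replacements, edited, final_err = select_channel_replacements(
    features, weights, target
)
print(replacements, base_err, final_err)
```

In this toy setting the search overwrites the noisy channel with the informative one and leaves the frozen weights untouched, which mirrors the "select and reuse features rather than update parameters" framing of the abstract.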
