Parameter-Efficient Model Adaptation for Vision Transformers

Authors

  • Xuehai He University of California, Santa Cruz
  • Chunyuan Li Microsoft Research
  • Pengchuan Zhang Microsoft Research
  • Jianwei Yang Microsoft Research
  • Xin Eric Wang University of California, Santa Cruz

DOI:

https://doi.org/10.1609/aaai.v37i1.25160

Keywords:

CV: Learning & Optimization for CV, CV: Language and Vision, CV: Representation Learning for Vision, ML: Transfer, Domain Adaptation, Multi-Task Learning

Abstract

In computer vision, adapting large-scale pretrained vision models (e.g., vision transformers) to downstream tasks has achieved strong transfer learning performance. Common approaches for model adaptation either update all model parameters or leverage linear probes. In this paper, we aim to study parameter-efficient model adaptation strategies for vision transformers on the image classification task. We formulate efficient model adaptation as a subspace training problem and perform a comprehensive benchmarking over different efficient adaptation methods. We conduct an empirical study on each efficient model adaptation method, focusing on its performance alongside its parameter cost. Furthermore, we propose a parameter-efficient model adaptation framework, which first selects submodules by measuring local intrinsic dimensions and then projects them into a subspace for further decomposition via a novel Kronecker Adaptation method. We analyze and compare our method with a diverse set of baseline model adaptation methods (including state-of-the-art methods for pretrained language models). Our method achieves the best tradeoff between accuracy and parameter efficiency across 20 datasets under the few-shot setting and 7 image classification datasets under the full-shot setting.
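The key idea behind Kronecker-style adaptation can be illustrated with a small sketch. The following is a hypothetical NumPy example, not the paper's implementation: a full-size weight update is represented as the Kronecker product of a small matrix and a rank-1 factor, so only the small factors need to be trained. All shapes and variable names here are illustrative assumptions.

```python
import numpy as np

def kronecker_update(A, u, v):
    """Form a full-size weight update from small Kronecker factors.

    Illustrative sketch (not the paper's exact method): the update
    Delta_W = A kron (u v^T) is stored via the small factors A, u, v
    instead of as a dense matrix of the target layer's full size.
    """
    B = np.outer(u, v)      # rank-1 factor, shape (len(u), len(v))
    return np.kron(A, B)    # full update, shape (p*len(u), q*len(v))

# Example: adapt a hypothetical 768x768 attention weight with tiny factors.
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 16))  # 256 trainable parameters
u = rng.standard_normal(48)        # 48 trainable parameters
v = rng.standard_normal(48)        # 48 trainable parameters

delta_W = kronecker_update(A, u, v)
trainable = A.size + u.size + v.size
print(delta_W.shape, trainable, delta_W.size)
```

Here 352 trainable parameters stand in for a 589,824-entry dense update, which is the kind of accuracy-versus-parameter-count tradeoff the benchmark in the paper measures.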

Published

2023-06-26

How to Cite

He, X., Li, C., Zhang, P., Yang, J., & Wang, X. E. (2023). Parameter-Efficient Model Adaptation for Vision Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 817-825. https://doi.org/10.1609/aaai.v37i1.25160

Section

AAAI Technical Track on Computer Vision I