LION: Implicit Vision Prompt Tuning

Authors

  • Haixin Wang, National Engineering Research Center for Software Engineering, Peking University
  • Jianlong Chang, Huawei Cloud & AI
  • Yihang Zhai, National Engineering Research Center for Software Engineering, Peking University
  • Xiao Luo, School of Mathematical Sciences, Peking University
  • Jinan Sun, National Engineering Research Center for Software Engineering, Peking University
  • Zhouchen Lin, National Key Lab of General AI, School of Intelligence Science and Technology, Peking University; Peng Cheng Laboratory
  • Qi Tian, Huawei Cloud & AI

DOI:

https://doi.org/10.1609/aaai.v38i6.28345

Keywords:

CV: Representation Learning for Vision

Abstract

Despite recent promising performance across a range of vision tasks, vision Transformers still suffer from high computational costs. Recently, vision prompt learning has provided an economical solution to this problem that avoids fine-tuning the whole large-scale model. However, the efficiency and effectiveness of existing models are still far from satisfactory, owing to the parameter cost of extensive prompt blocks and to tricky prompt framework designs. In this paper, we propose a lightweight prompt framework named impLicit vIsion prOmpt tuNing (LION), which is motivated by deep implicit models that keep memory costs stable and low across various complex tasks. In particular, we merely insert two equilibrium implicit layers at the two ends of the pre-trained backbone, whose parameters are kept frozen. Moreover, following the lottery ticket hypothesis, we further prune the parameters to relieve the computational burden in the implicit layers. Various experiments validate that LION obtains promising performance on a wide range of datasets. Most importantly, LION reduces up to 11.5% of the training parameters while obtaining higher performance than the state-of-the-art VPT, especially in challenging scenes. Furthermore, we find that LION generalizes well, making it an easy way to boost transfer learning in the future.
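The structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the layer form z = tanh(Wz + x), the plain fixed-point solver, the one-shot magnitude pruning, and every name below (equilibrium_layer, magnitude_prune, frozen_backbone, forward) are assumptions made for illustration only. The sketch shows one implicit layer before and one after a frozen stand-in backbone, with the implicit layers' weights pruned.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def equilibrium_layer(W, x, tol=1e-8, max_iter=500):
    """Return z with z ~= tanh(W z + x), found by fixed-point iteration.

    Assumption: every row of W has absolute sum below 1, so the map is a
    contraction and the iteration converges.
    """
    z = [0.0] * len(x)
    for _ in range(max_iter):
        z_next = [math.tanh(a + b) for a, b in zip(matvec(W, z), x)]
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next
        z = z_next
    return z


def magnitude_prune(W, keep_ratio):
    """Zero out all but the largest-magnitude entries (one-shot pruning)."""
    flat = sorted((abs(w) for row in W for w in row), reverse=True)
    k = max(1, int(len(flat) * keep_ratio))
    threshold = flat[k - 1]
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in W]


def frozen_backbone(v):
    """Hypothetical stand-in for the pre-trained backbone; never updated."""
    return [2.0 * a - 0.5 for a in v]


def forward(W_in, W_out, x):
    """Implicit layer -> frozen backbone -> implicit layer."""
    h = equilibrium_layer(W_in, x)
    h = frozen_backbone(h)
    return equilibrium_layer(W_out, h)


random.seed(0)
n = 4
# Entries in [-0.2, 0.2] keep each row's absolute sum below 1 for n = 4.
W_in = [[random.uniform(-0.2, 0.2) for _ in range(n)] for _ in range(n)]
W_out = [[random.uniform(-0.2, 0.2) for _ in range(n)] for _ in range(n)]
W_in_pruned = magnitude_prune(W_in, keep_ratio=0.25)
W_out_pruned = magnitude_prune(W_out, keep_ratio=0.25)

x = [0.3, -0.1, 0.7, 0.0]
y = forward(W_in_pruned, W_out_pruned, x)
```

In this sketch only W_in and W_out would be trainable, and the pruned copies keep a quarter of their entries. The pruning criterion and the solver used in the paper may differ from the ones assumed here.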

Published

2024-03-24

How to Cite

Wang, H., Chang, J., Zhai, Y., Luo, X., Sun, J., Lin, Z., & Tian, Q. (2024). LION: Implicit Vision Prompt Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5372-5380. https://doi.org/10.1609/aaai.v38i6.28345

Section

AAAI Technical Track on Computer Vision V