LION: Implicit Vision Prompt Tuning
DOI
https://doi.org/10.1609/aaai.v38i6.28345
Keywords
CV: Representation Learning for Vision
Abstract
Despite recent promising performance across a range of vision tasks, vision Transformers still suffer from high computational costs. Recently, vision prompt learning has provided an economical solution to this problem without fine-tuning the whole large-scale model. However, the efficiency and effectiveness of existing models are still far from satisfactory, due to the parameter cost of extensive prompt blocks and tricky prompt framework designs. In this paper, we propose a lightweight prompt framework named impLicit vIsion prOmpt tuNing (LION), motivated by deep implicit models, which maintain stable, low memory costs across a variety of complex tasks. In particular, we merely insert two equilibrium implicit layers at the two ends of the pre-trained backbone, whose parameters are kept frozen. Moreover, following the lottery ticket hypothesis, we further prune the parameters of the implicit layers to relieve the computational burden. Various experiments validate that LION obtains promising performance on a wide range of datasets. Most importantly, LION reduces the number of training parameters by up to 11.5% while achieving higher performance than the state-of-the-art VPT, especially in challenging scenes. Furthermore, we find that LION exhibits excellent generalization performance, making it an easy way to boost transfer learning in the future.
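To make the core mechanism in the abstract concrete, the following is a minimal PyTorch sketch of the stated design: two small implicit (equilibrium) layers wrapped around a frozen pre-trained backbone, so that only the implicit layers are trained. The names (EquilibriumLayer, LION), the naive fixed-point solver, and the stand-in transformer backbone are illustrative assumptions, not the authors' released code; a real deep-equilibrium layer would also use implicit differentiation for the backward pass, and the paper's lottery-ticket pruning is not shown here.

```python
# Minimal sketch (assumed names/solver): two implicit layers around a frozen backbone.
import torch
import torch.nn as nn


class EquilibriumLayer(nn.Module):
    """Implicit layer: solve z* = f(z*, x) by simple fixed-point iteration.

    Note: gradients here flow through the unrolled iterations; a true
    deep-equilibrium layer would instead differentiate implicitly.
    """

    def __init__(self, dim: int, max_iter: int = 20, tol: float = 1e-4):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.max_iter = max_iter
        self.tol = tol

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)
        for _ in range(self.max_iter):
            z_next = torch.tanh(self.linear(z) + x)  # f(z, x)
            if (z_next - z).norm() < self.tol * (z.norm() + 1e-8):
                return z_next  # converged to (approximate) fixed point
            z = z_next
        return z


class LION(nn.Module):
    """Frozen backbone with one trainable equilibrium layer at each end."""

    def __init__(self, backbone: nn.Module, dim: int):
        super().__init__()
        self.pre = EquilibriumLayer(dim)
        self.backbone = backbone
        self.post = EquilibriumLayer(dim)
        for p in self.backbone.parameters():
            p.requires_grad_(False)  # keep the pre-trained weights frozen

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.post(self.backbone(self.pre(x)))


# Usage with a stand-in transformer encoder as the "pre-trained" backbone.
dim = 64
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
model = LION(backbone, dim)
tokens = torch.randn(2, 16, dim)  # (batch, sequence, dim)
print(model(tokens).shape)  # torch.Size([2, 16, 64])
```

In this sketch, an optimizer would receive only the parameters of the two equilibrium layers, which is what keeps the trainable-parameter count small relative to full fine-tuning.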
Published
2024-03-24
How to Cite
Wang, H., Chang, J., Zhai, Y., Luo, X., Sun, J., Lin, Z., & Tian, Q. (2024). LION: Implicit Vision Prompt Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5372-5380. https://doi.org/10.1609/aaai.v38i6.28345
Section
AAAI Technical Track on Computer Vision V