TTE: Two Tokens Are Enough to Improve Parameter-Efficient Tuning
DOI: https://doi.org/10.1609/aaai.v39i19.34226

Abstract
Existing fine-tuning paradigms are predominantly characterized by Full Parameter Tuning (FPT) and Parameter-Efficient Tuning (PET). FPT fine-tunes all parameters of a pre-trained model on downstream tasks, whereas PET freezes the pre-trained model and employs only a minimal number of learnable parameters for fine-tuning. However, both approaches face overfitting, especially when downstream samples are limited. This issue has been thoroughly explored in FPT, but much less so in PET. To this end, this paper investigates overfitting in PET, representing a pioneering study in the field. Specifically, across 19 image classification datasets, we employ three classic PET methods (VPT, Adapter/AdaptFormer, and LoRA) and explore various regularization techniques to mitigate overfitting. Regrettably, the results suggest that existing regularization techniques are incompatible with the PET process and may even degrade performance. Consequently, we introduce a new framework named TTE (Two Tokens are Enough), which effectively alleviates overfitting in PET through a novel constraint function based on the learnable tokens. Experiments conducted on 24 datasets across image and few-shot classification tasks demonstrate that our fine-tuning framework not only mitigates overfitting but also significantly enhances PET's performance. Notably, our TTE framework surpasses the highest-performing FPT framework (DR-Tune) while using far fewer parameters (0.15M vs. 85.84M) and achieving a 1% improvement.
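To make the PET setup described above concrete, the sketch below shows the general recipe in PyTorch: the pre-trained backbone is frozen, two learnable tokens are prepended to the input sequence (as in VPT-style prompt tuning), and a simple penalty is applied to those tokens. The `token_constraint` shown here, an L2 penalty pulling the two tokens together, is only an illustrative assumption; the paper's actual TTE constraint function is defined in the full text, and the class and parameter names are hypothetical.

```python
# Minimal sketch of prompt-style PET with a token-based regularizer.
# NOTE: illustrative assumption, not the paper's TTE method; the actual
# constraint function is given in the full paper.
import torch
import torch.nn as nn

class PromptTunedEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze the pre-trained model
            p.requires_grad = False
        # Two learnable tokens prepended to the patch sequence: these,
        # plus the task head, are the only trainable parameters.
        self.tokens = nn.Parameter(torch.zeros(1, 2, embed_dim))
        nn.init.trunc_normal_(self.tokens, std=0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
        # patch_embeds: (batch, seq_len, embed_dim); backbone is assumed
        # to map a token sequence to a sequence of features.
        b = patch_embeds.size(0)
        x = torch.cat([self.tokens.expand(b, -1, -1), patch_embeds], dim=1)
        feats = self.backbone(x)               # frozen transformer blocks
        return self.head(feats[:, 0])          # classify from the first token

    def token_constraint(self) -> torch.Tensor:
        # Hypothetical stand-in for TTE's constraint: an L2 penalty on the
        # gap between the two learnable tokens, limiting their capacity.
        return (self.tokens[:, 0] - self.tokens[:, 1]).pow(2).mean()

# Usage: total loss combines the task loss with the weighted constraint,
# e.g. loss = cross_entropy(logits, y) + lam * model.token_constraint()
```

The design point the sketch is meant to convey is the parameter budget: only the two token vectors and the linear head receive gradients, which is what keeps PET methods in the 0.1M-parameter range cited in the abstract.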
Published
2025-04-11
How to Cite
Ruan, J., Xie, M., Gao, J., Gao, X., Xiang, S., Liu, T., & Fu, Y. (2025). TTE: Two Tokens Are Enough to Improve Parameter-Efficient Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(19), 20209–20217. https://doi.org/10.1609/aaai.v39i19.34226
Issue
Vol. 39 No. 19 (2025)
Section
AAAI Technical Track on Machine Learning V