A Better Start: Sensitivity-Aware Warm-Up for Robust and Efficient Fine-Tuning

Authors

  • Yile Chen, South China University of Technology; Hong Kong University of Science and Technology (Guangzhou)
  • Zeyi Wen, Hong Kong University of Science and Technology (Guangzhou); Hong Kong University of Science and Technology
  • Jian Chen, South China University of Technology
  • Jin Huang, South China Normal University

DOI:

https://doi.org/10.1609/aaai.v40i36.40284

Abstract

As an essential component of fine-tuning, warm-up plays a crucial role in promoting stability and generalization. Many studies have examined its underlying mechanisms from different aspects. However, most of these studies focus on incorporating their insights into optimizers to reduce the reliance on warm-up. Little attention has been paid to addressing the inherent limitations of warm-up itself, which restrict its effectiveness. In this work, we revisit warm-up from a loss landscape perspective and identify several limitations of existing warm-up, including: (1) susceptibility to nearby suboptimal traps, (2) sensitivity to hyperparameters and random seeds, and (3) inefficiency during the early stages of training. To overcome these limitations, we propose Sensitivity-Aware Warm-Up (SAWU), a lightweight and adaptive strategy that dynamically leverages learning sensitivity during warm-up to guide updates toward better and more stable basins. In addition, SAWU introduces an adaptive scheduling mechanism and a phase transition strategy across the warm-up, stable, and decay phases to further enhance robustness and efficiency. Extensive experiments on various downstream tasks show that SAWU significantly outperforms the vanilla method (e.g., an average 3.43% improvement on RoBERTa). Moreover, SAWU can be easily combined with various optimizers and remains effective even when warm-up-based methods fail (e.g., it lifts RAdam from 49.46% to 91.78% on QNLI). Thanks to its lightweight nature, SAWU introduces minimal overhead and even reduces training time by over 5% compared to other methods.
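The abstract refers to a phase transition strategy across warm-up, stable, and decay phases. SAWU's actual sensitivity signal and adaptive scheduling are defined in the full paper; purely as an illustration of the generic three-phase learning-rate structure being built upon, a plain (non-adaptive) warm-up/stable/decay schedule might look like the following sketch, where all names and phase fractions are hypothetical choices, not the authors' implementation:

```python
def wsd_lr(step, total_steps, peak_lr=1e-3,
           warmup_frac=0.1, decay_frac=0.2, min_lr=0.0):
    """Generic warm-up -> stable -> decay learning-rate schedule.

    Illustrative only: SAWU replaces the fixed phase boundaries below
    with sensitivity-driven, adaptive transitions.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        # Linear warm-up: ramp from ~0 up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < stable_end:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: linear anneal from peak_lr down to min_lr.
    progress = (step - stable_end) / decay_steps
    return peak_lr + (min_lr - peak_lr) * progress
```

With `total_steps=100` and the defaults above, steps 0–9 ramp up linearly, steps 10–79 hold `peak_lr`, and steps 80–99 anneal toward `min_lr`.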

Published

2026-03-14

How to Cite

Chen, Y., Wen, Z., Chen, J., & Huang, J. (2026). A Better Start: Sensitivity-Aware Warm-Up for Robust and Efficient Fine-Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(36), 30324-30331. https://doi.org/10.1609/aaai.v40i36.40284

Section

AAAI Technical Track on Natural Language Processing I