A Better Start: Sensitivity-Aware Warm-Up for Robust and Efficient Fine-Tuning

Authors

  • Yile Chen, South China University of Technology; Hong Kong University of Science and Technology (Guangzhou)
  • Zeyi Wen, Hong Kong University of Science and Technology (Guangzhou); Hong Kong University of Science and Technology
  • Jian Chen, South China University of Technology
  • Jin Huang, South China Normal University

DOI:

https://doi.org/10.1609/aaai.v40i36.40284

Abstract

As an essential component of fine-tuning, warm-up plays a crucial role in promoting stability and generalization. Many studies have examined its underlying mechanisms from different aspects. However, most of these studies focus on incorporating their insights into optimizers to reduce the reliance on warm-up. Little attention has been paid to addressing the inherent limitations of warm-up itself, which restrict its effectiveness. In this work, we revisit warm-up from a loss landscape perspective and identify several limitations of existing warm-up, including: (1) susceptibility to nearby suboptimal traps, (2) sensitivity to hyperparameters and random seeds, and (3) inefficiency during the early stages of training. To overcome these limitations, we propose Sensitivity-Aware Warm-Up (SAWU), a lightweight and adaptive strategy that dynamically leverages learning sensitivity during warm-up to guide updates toward better and more stable basins. In addition, SAWU introduces an adaptive scheduling mechanism and a phase transition strategy across the warm-up, stable, and decay phases to further enhance robustness and efficiency. Extensive experiments on various downstream tasks show that SAWU significantly outperforms the vanilla method (e.g., an average 3.43% improvement on RoBERTa). Moreover, SAWU can be easily combined with various optimizers and remains effective even when warm-up-based methods fail (e.g., it lifts RAdam from 49.46% to 91.78% on QNLI). Thanks to its lightweight nature, SAWU introduces minimal overhead and even reduces training time by over 5% compared to other methods.
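The abstract refers to a phase transition strategy across warm-up, stable, and decay phases. SAWU's actual sensitivity signal and adaptive scheduling are defined in the full paper; purely as an illustration of the generic three-phase learning-rate structure being built upon, a plain (non-adaptive) warm-up/stable/decay schedule might look like the following sketch, where all names and phase fractions are hypothetical choices, not the authors' implementation:

```python
def wsd_lr(step, total_steps, peak_lr=1e-3,
           warmup_frac=0.1, decay_frac=0.2, min_lr=0.0):
    """Generic warm-up -> stable -> decay learning-rate schedule.

    Illustrative only: SAWU replaces the fixed phase boundaries below
    with sensitivity-driven, adaptive transitions.
    """
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        # Linear warm-up: ramp from ~0 up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < stable_end:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: linear anneal from peak_lr down to min_lr.
    progress = (step - stable_end) / decay_steps
    return peak_lr + (min_lr - peak_lr) * progress
```

With `total_steps=100` and the defaults above, steps 0–9 ramp up linearly, steps 10–79 hold `peak_lr`, and steps 80–99 anneal toward `min_lr`.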

Published

2026-03-14

How to Cite

Chen, Y., Wen, Z., Chen, J., & Huang, J. (2026). A Better Start: Sensitivity-Aware Warm-Up for Robust and Efficient Fine-Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(36), 30324-30331. https://doi.org/10.1609/aaai.v40i36.40284

Section

AAAI Technical Track on Natural Language Processing I