Towards Efficient Low-Order Hybrid Optimizer for Language Model Fine-Tuning
DOI:
https://doi.org/10.1609/aaai.v39i22.34530
Abstract
As language models grow in size, fine-tuning them becomes more challenging: fine-tuning with first-order optimizers (e.g., SGD and Adam) consumes a large amount of memory, while fine-tuning with a memory-efficient zeroth-order optimizer (MeZO) suffers a significant accuracy drop and slower convergence. In this work, we propose a Low-order Hybrid Optimizer (LoHO), which merges zeroth-order (ZO) and first-order (FO) optimizers for fine-tuning. LoHO is empowered with inter-layer hybrid optimization and intra-layer hybrid optimization, which boost the accuracy of MeZO while keeping memory usage within a budget. The inter-layer hybrid optimization applies the FO optimizer in deep layers and the ZO optimizer in shallow ones, thereby avoiding unnecessary gradient propagation and improving memory efficiency. The intra-layer hybrid optimization updates a proportion of the parameters in a layer with the ZO optimizer and the rest with the FO optimizer, taking advantage of gradient sparsity for a highly efficient implementation. Our experimental results across common datasets on different pre-trained backbones (i.e., RoBERTa-large, OPT-13B and OPT-30B) demonstrate that LoHO can significantly improve the predictive accuracy and convergence rate of MeZO, while controlling the memory footprint during fine-tuning. Moreover, LoHO can achieve performance comparable to first-order fine-tuning while using substantially less memory.
Published
2025-04-11
How to Cite
Chen, M., Huang, Y.-L., & Wen, Z. (2025). Towards Efficient Low-Order Hybrid Optimizer for Language Model Fine-Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(22), 23605–23613. https://doi.org/10.1609/aaai.v39i22.34530
Issue
Section
AAAI Technical Track on Natural Language Processing I
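
The inter-layer idea described in the abstract (a ZO update for shallow layers, an FO update for deep layers) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-layer model, the function names (`forward`, `loss`, `hybrid_step`), the learning rates, and the perturbation size `eps` are all assumptions chosen for illustration. The ZO part uses a generic two-point random-perturbation estimate, and the intra-layer scheme and memory budgeting are not modeled.

```python
import math
import random

def forward(w1, w2, x):
    """Toy two-layer model: hidden = tanh(w1 * x), output = w2 . hidden."""
    hidden = [math.tanh(a * x) for a in w1]
    return sum(b * h for b, h in zip(w2, hidden)), hidden

def loss(w1, w2, data):
    """Mean squared error (with a 1/2 factor) over (x, target) pairs."""
    return sum(0.5 * (forward(w1, w2, x)[0] - t) ** 2 for x, t in data) / len(data)

def hybrid_step(w1, w2, data, lr_zo=0.02, lr_fo=0.1, eps=1e-3, rng=random):
    """One hybrid update: ZO for the shallow layer w1, FO for the deep layer w2."""
    # ZO part (shallow layer): two forward passes along a random direction z
    # give a scalar slope estimate; no backward pass through w1 is needed.
    z = [rng.gauss(0.0, 1.0) for _ in w1]
    plus = [a + eps * d for a, d in zip(w1, z)]
    minus = [a - eps * d for a, d in zip(w1, z)]
    slope = (loss(plus, w2, data) - loss(minus, w2, data)) / (2 * eps)

    # FO part (deep layer): exact gradient w.r.t. w2, which only needs the
    # hidden activations, so nothing is propagated back to the shallow layer.
    grad_w2 = [0.0] * len(w2)
    for x, t in data:
        y, hidden = forward(w1, w2, x)
        for j, h in enumerate(hidden):
            grad_w2[j] += (y - t) * h / len(data)

    new_w1 = [a - lr_zo * slope * d for a, d in zip(w1, z)]
    new_w2 = [b - lr_fo * g for b, g in zip(w2, grad_w2)]
    return new_w1, new_w2

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy regression data: fit sin(x) on a small grid.
    data = [(x / 4.0, math.sin(x / 4.0)) for x in range(-4, 5)]
    w1, w2 = [0.5, -0.3, 0.8], [0.1, 0.2, -0.1]
    print("initial loss:", loss(w1, w2, data))
    for _ in range(200):
        w1, w2 = hybrid_step(w1, w2, data)
    print("final loss:", loss(w1, w2, data))
```

In this sketch the memory argument shows up structurally: the shallow layer is updated from two scalar loss values, so no activations or gradients need to be retained for it, while the deep layer gets an exact gradient computed from activations that are already available at its input.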