Gradient Weight-normalized Low-rank Projection for Efficient LLM Training
DOI:
https://doi.org/10.1609/aaai.v39i23.34587
Abstract
Large Language Models (LLMs) have shown remarkable performance across various tasks, but their escalating computational demands pose significant challenges, particularly when full fine-tuning is applied extensively to downstream tasks. Parameter-efficient fine-tuning (PEFT) methods have been developed to address this, but they often underperform full fine-tuning and struggle with memory efficiency. In this work, we introduce Gradient Weight-Normalized Low-Rank Projection (GradNormLoRP), a novel approach that improves both parameter and memory efficiency while maintaining performance comparable to full fine-tuning. GradNormLoRP normalizes the weight matrix to improve gradient conditioning, facilitating better convergence during optimization. Additionally, it applies low-rank approximations to the weight and gradient matrices, significantly reducing memory usage during training. Extensive experiments demonstrate that our 8-bit GradNormLoRP reduces optimizer memory usage by up to 89.5% and enables the pre-training of large LLMs, such as LLaMA 7B, on consumer-level GPUs like the NVIDIA RTX 4090, without additional inference costs. Moreover, GradNormLoRP outperforms existing low-rank methods on fine-tuning tasks: when fine-tuning RoBERTa on all GLUE tasks with a rank of 8, it achieves an average score of 80.65, surpassing LoRA's 79.23. These results underscore GradNormLoRP as a promising alternative for efficient LLM pre-training and fine-tuning.
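To make the two ingredients the abstract names concrete, below is a minimal PyTorch sketch: a weight-norm reparameterization (W = g · v / ||v||) for gradient conditioning, and an SVD-based low-rank projection of the gradient so that optimizer state lives in a rank-r subspace. The function names, shapes, learning rate, and the truncated-SVD projector are illustrative assumptions, not the authors' reference implementation.

import torch

def weight_normalize(v: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """Reparameterize a weight matrix as W = g * v / ||v|| (row-wise),
    in the spirit of weight normalization, to improve gradient conditioning.
    (Illustrative; the paper's exact normalization may differ.)"""
    return g.unsqueeze(1) * v / v.norm(dim=1, keepdim=True)

def lowrank_project(grad: torch.Tensor, rank: int):
    """Project a full gradient onto a rank-`rank` subspace via truncated SVD,
    so the optimizer only keeps state for the small projected gradient."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]          # m x r projection basis (assumed choice)
    return P, P.T @ grad     # r x n projected gradient

def lowrank_unproject(P: torch.Tensor, grad_lr: torch.Tensor) -> torch.Tensor:
    """Map the low-rank update back to the full weight shape."""
    return P @ grad_lr

# Toy usage: one projected-gradient descent step on a normalized weight.
m, n, r = 64, 32, 8
v = torch.randn(m, n, requires_grad=True)
g = torch.ones(m, requires_grad=True)

W = weight_normalize(v, g)
loss = (W @ torch.randn(n)).pow(2).mean()
loss.backward()

P, grad_lr = lowrank_project(v.grad, rank=r)   # optimizer state is r x n, not m x n
with torch.no_grad():
    v -= 1e-2 * lowrank_unproject(P, grad_lr)  # reconstruct the full-shape update

The memory saving comes from the middle step: a stateful optimizer such as Adam would store moments for the r x n projected gradient instead of the full m x n matrix, which is consistent with the up-to-89.5% optimizer-memory reduction reported above.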
Published
2025-04-11
How to Cite
Huang, J.-H., Shen, Y., Zhu, H., Rudinac, S., & Kanoulas, E. (2025). Gradient Weight-normalized Low-rank Projection for Efficient LLM Training. Proceedings of the AAAI Conference on Artificial Intelligence, 39(23), 24123–24131. https://doi.org/10.1609/aaai.v39i23.34587
Section
AAAI Technical Track on Natural Language Processing II