RILQ: Rank-Insensitive LoRA-Based Quantization Error Compensation for Boosting 2-Bit Large Language Model Accuracy

Authors

  • Geonho Lee, Hanyang University
  • Janghwan Lee, Hanyang University
  • Sukjin Hong, KT Corporation
  • Minsoo Kim, Hanyang University
  • Euijai Ahn, KT Corporation
  • Du-Seong Chang, Sogang University
  • Jungwook Choi, Hanyang University

DOI

https://doi.org/10.1609/aaai.v39i17.33990

Abstract

Low-rank adaptation (LoRA) has become the dominant method for parameter-efficient LLM fine-tuning, and LoRA-based quantization error compensation (LQEC) has emerged as a powerful tool for recovering accuracy in compressed LLMs. However, LQEC has underperformed in sub-4-bit scenarios, and this limitation has not previously been investigated. We propose RILQ (Rank-Insensitive LoRA-based Quantization Error Compensation) to boost 2-bit LLM accuracy. Motivated by a rank analysis revealing the rank-insensitive nature of a model-wise activation discrepancy loss, RILQ employs this loss to adjust adapters cooperatively across layers, enabling robust error compensation even with low-rank adapters. Evaluations on LLaMA-2 and LLaMA-3 demonstrate RILQ's consistent improvements in 2-bit quantized inference across various state-of-the-art quantizers, as well as enhanced accuracy in task-specific fine-tuning. RILQ maintains computational efficiency comparable to existing LoRA methods and enables adapter-merged weight-quantized LLM inference with significantly enhanced accuracy, making it a promising approach for boosting 2-bit LLM performance.
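To make the problem setting concrete, the sketch below illustrates the basic LQEC idea the abstract builds on: a weight matrix is quantized, and a rank-r adapter (B, A) is fitted to the quantization error, here via truncated SVD of the error matrix. This is a minimal layer-wise illustration under assumed names (`quantize_2bit`, `lora_error_adapter`) and a naive per-tensor quantizer; it is not RILQ itself, whose contribution per the abstract is a model-wise activation discrepancy loss that tunes adapters cooperatively across layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_2bit(W):
    """Naive per-tensor symmetric 2-bit quantizer (illustrative only)."""
    scale = np.abs(W).max() / 2.0
    # Signed 2-bit grid: levels {-2, -1, 0, 1} * scale
    return np.clip(np.round(W / scale), -2, 1) * scale

def lora_error_adapter(W, W_q, rank):
    """Fit a rank-r adapter to the quantization error via truncated SVD,
    so that W_q + B @ A is the best rank-r correction in Frobenius norm."""
    U, S, Vt = np.linalg.svd(W - W_q, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # shape (d_out, r)
    A = Vt[:rank, :]             # shape (r, d_in)
    return B, A

W = rng.standard_normal((64, 64))
W_q = quantize_2bit(W)
B, A = lora_error_adapter(W, W_q, rank=8)

err_q = np.linalg.norm(W - W_q)            # error of plain 2-bit weights
err_c = np.linalg.norm(W - (W_q + B @ A))  # error after rank-8 compensation
print(err_c < err_q)  # True: the adapter strictly reduces the error
```

At inference time the adapter can be kept separate or merged into the dequantized weights, which is the "adapter-merged weight-quantized" deployment the abstract refers to; the paper's finding is that a layer-local objective like the one above is rank-sensitive at 2 bits, motivating the model-wise loss.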

Published

2025-04-11

How to Cite

Lee, G., Lee, J., Hong, S., Kim, M., Ahn, E., Chang, D.-S., & Choi, J. (2025). RILQ: Rank-Insensitive LoRA-Based Quantization Error Compensation for Boosting 2-Bit Large Language Model Accuracy. Proceedings of the AAAI Conference on Artificial Intelligence, 39(17), 18091-18100. https://doi.org/10.1609/aaai.v39i17.33990

Section

AAAI Technical Track on Machine Learning III