Treasures in Discarded Weights for LLM Quantization

Authors

  • Hao Yu, National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University; Alibaba Cloud Computing
  • Yang Zhou, Alibaba Cloud Computing
  • Bohua Chen, Alibaba Cloud Computing
  • Zelan Yang, Alibaba Cloud Computing
  • Shen Li, Alibaba Cloud Computing
  • Yong Li, Alibaba Cloud Computing
  • Jianxin Wu, National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v39i21.34376

Abstract

In recent years, large language models (LLMs) have developed rapidly and revolutionized natural language processing. However, high storage overhead and computing costs limit LLM deployment in resource-constrained environments. Quantization algorithms can effectively compress LLMs and accelerate inference, but they cause a loss in precision, especially in low-bit scenarios. In this paper, we find that the weight values discarded by quantization in fact contain treasures that can improve LLMs' accuracy. To excavate those hidden treasures, we construct search spaces around these discarded weights; the weights within a search space can be seamlessly incorporated into the original quantization weights. To determine which weights should be merged, we design a plug-and-play weight compensation framework that captures global information and keeps the weights with the highest potential benefits. Our framework can be combined with various LLM quantization algorithms to achieve higher precision without additional inference overhead. We validate the effectiveness of our approach on widely used benchmark datasets for LLMs.
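
As a rough illustration of the idea in the abstract, the NumPy sketch below quantizes a small weight matrix with round-to-nearest, computes the values that rounding discards, and then greedily moves individual weights to the neighbouring grid point when doing so lowers the layer's output error on calibration inputs. It is not the authors' framework: the abstract does not specify how the search space or the selection rule is defined, so the per-row symmetric quantizer, the one-step search space, the greedy acceptance rule, and all names here (`quantize_rtn`, `greedy_compensate`, `x`) are assumptions made for illustration only.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric per-row round-to-nearest quantization (illustrative assumption).

    Returns the dequantized weights and the per-row scale.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale, scale

def greedy_compensate(w, w_q, scale, x, bits=4):
    """Hypothetical stand-in for a weight search: for each weight, try the
    neighbouring grid point in the direction of the discarded value and keep
    it only if the squared output error on x decreases."""
    qmax = 2 ** (bits - 1) - 1
    w_new = w_q.copy()
    target = x @ w.T  # full-precision outputs, shape (n_samples, n_out)
    for i in range(w.shape[0]):
        err = x @ w_new[i] - target[:, i]  # current output residual for row i
        for j in range(w.shape[1]):
            step = np.sign(w[i, j] - w_q[i, j]) * scale[i, 0]
            cand = w_new[i, j] + step
            # Skip exact hits and candidates outside the representable range.
            if step == 0 or abs(cand) > qmax * scale[i, 0] + 1e-12:
                continue
            new_err = err + step * x[:, j]
            if new_err @ new_err < err @ err:
                w_new[i, j], err = cand, new_err
    return w_new

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16))   # toy weight matrix (out_features, in_features)
x = rng.normal(size=(32, 16))  # toy calibration inputs

w_q, scale = quantize_rtn(w, bits=4)
discarded = w - w_q            # what round-to-nearest throws away
w_c = greedy_compensate(w, w_q, scale, x, bits=4)

err_before = np.linalg.norm(x @ w_q.T - x @ w.T)
err_after = np.linalg.norm(x @ w_c.T - x @ w.T)
print(f"output error, round-to-nearest: {err_before:.4f}")
print(f"output error, after greedy search: {err_after:.4f}")
```

Because every accepted change stays on the same quantization grid, the adjusted weights have the same storage format as the round-to-nearest ones, which is consistent with the abstract's claim of no additional inference overhead. The greedy rule only guarantees that the calibration error does not increase; it makes no claim about downstream accuracy.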

Published

2025-04-11

How to Cite

Yu, H., Zhou, Y., Chen, B., Yang, Z., Li, S., Li, Y., & Wu, J. (2025). Treasures in Discarded Weights for LLM Quantization. Proceedings of the AAAI Conference on Artificial Intelligence, 39(21), 22218–22226. https://doi.org/10.1609/aaai.v39i21.34376

Issue

Vol. 39 No. 21 (2025)

Section

AAAI Technical Track on Machine Learning VII