Treasures in Discarded Weights for LLM Quantization

Hao Yu; Yang Zhou; Bohua Chen; Zelan Yang; Shen Li; Yong Li; Jianxin Wu

doi:10.1609/aaai.v39i21.34376

Authors

Hao Yu: National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University; Alibaba Cloud Computing
Yang Zhou: Alibaba Cloud Computing
Bohua Chen: Alibaba Cloud Computing
Zelan Yang: Alibaba Cloud Computing
Shen Li: Alibaba Cloud Computing
Yong Li: Alibaba Cloud Computing
Jianxin Wu: National Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v39i21.34376

Abstract

In recent years, large language models (LLMs) have developed rapidly and revolutionized natural language processing. However, high storage overhead and computing costs limit LLM deployment in resource-constrained environments. Quantization algorithms can effectively compress LLMs and accelerate inference, but they lead to a loss of precision, especially in low-bit scenarios. In this paper, we find that the weight values discarded by quantization in fact contain treasures that can improve LLMs' accuracy. To excavate those hidden treasures, we construct search spaces around these discarded weights; the weights within a search space can be seamlessly incorporated into the original quantization weights. To determine which weights should be merged, we design a plug-and-play weight compensation framework that captures global information and keeps the weights with the highest potential benefits. Our framework can be combined with various LLM quantization algorithms to achieve higher precision without additional inference overhead. We validate the effectiveness of our approach on widely used benchmark datasets for LLMs.
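To make "discarded weight values" concrete, the following is a minimal sketch of what a quantizer throws away. It assumes plain symmetric round-to-nearest quantization with one scale per weight group, which the abstract does not specify; the function names (`quantize_rtn`, `dequantize`, `discarded_residual`) are hypothetical and this is not the paper's compensation framework, only an illustration of the residual that such a framework would have to work with.

```python
def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization (an assumed baseline quantizer)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit signed codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # one scale for the whole group
    # Round each weight to the nearest integer code and clamp to the valid range.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to floating-point weights."""
    return [c * scale for c in codes]


def discarded_residual(weights, bits=4):
    """Return the part of each weight that quantization discards."""
    codes, scale = quantize_rtn(weights, bits)
    restored = dequantize(codes, scale)
    return [w - r for w, r in zip(weights, restored)]


# Toy weight group (made-up values, for illustration only).
weights = [0.7, -0.33, 0.12, 0.02]
codes, scale = quantize_rtn(weights, bits=4)
residual = discarded_residual(weights, bits=4)
print(codes, scale, residual)
```

Under round-to-nearest, each residual is bounded in magnitude by half the quantization step (`scale / 2`). The abstract's claim is that these residuals are not pure noise and that selectively merging weights found near them can recover accuracy; how the search spaces are built and which candidates are kept is defined in the paper itself, not in this sketch.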
