SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization

Authors

  • Zhixiong Zhao (Shanghai Jiao Tong University; Nanyang Technological University)
  • Fangxin Liu (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute)
  • Junjie Wang (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute)
  • Chenyang Guan (Shanghai Jiao Tong University)
  • Zongwu Wang (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute)
  • Li Jiang (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute)
  • Haibing Guan (Shanghai Jiao Tong University)

DOI:

https://doi.org/10.1609/aaai.v40i34.40112

Abstract

The emergence of accurate open large language models (LLMs) has sparked a push for advanced quantization techniques to enable efficient deployment on end-user devices. In this paper, we revisit the challenge of extreme LLM compression---targeting ultra-low-bit quantization for both activations and weights---from a Fourier frequency domain perspective. We propose SpecQuant, a two-stage framework that tackles activation outliers and cross-channel variance. In the first stage, activation outliers are smoothed and transferred into the weight matrix to simplify downstream quantization. In the second stage, we apply channel-wise low-frequency Fourier truncation to suppress high-frequency components while preserving essential signal energy, improving quantization robustness. Our method builds on the principle that most of the weight energy is concentrated in low-frequency components, which can be retained with minimal impact on model accuracy. To enable runtime adaptability, we introduce a lightweight truncation module during inference that adjusts truncation thresholds based on channel characteristics. On LLaMA-3 8B, SpecQuant achieves 4-bit quantization for both weights and activations, narrowing the zero-shot accuracy gap to only 1.5% compared to full precision, while delivering 2× faster inference and 3× lower memory usage.
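The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the SmoothQuant-style per-channel scaling in stage 1, the fixed `keep_ratio` in stage 2, and the `alpha` balance parameter are all assumptions made for illustration; the paper's runtime truncation module adapts the threshold per channel, which is not modeled here.

```python
import numpy as np

def smooth_outliers(activations, weight, alpha=0.5):
    """Stage 1 (sketch): migrate activation outliers into the weights.

    A per-input-channel scale (SmoothQuant-style, an assumption here)
    shrinks activation outliers and absorbs them into the weight matrix,
    leaving the layer output X @ W.T mathematically unchanged.
    activations: (tokens, in_features); weight: (out_features, in_features)
    """
    a_max = np.abs(activations).max(axis=0)        # per-channel activation range
    w_max = np.abs(weight).max(axis=0)             # per-channel weight range
    scale = (a_max ** alpha) / (w_max ** (1 - alpha) + 1e-8)
    scale = np.maximum(scale, 1e-8)                # guard against zero channels
    return activations / scale, weight * scale     # X @ W.T is preserved

def low_freq_truncate(weight, keep_ratio=0.5):
    """Stage 2 (sketch): channel-wise low-frequency Fourier truncation.

    Keeps only the lowest-frequency rFFT bins of each weight row, zeroing
    the rest, on the premise that most weight energy is low-frequency.
    `keep_ratio` is a fixed illustrative threshold, not the paper's
    adaptive per-channel one.
    """
    spectrum = np.fft.rfft(weight, axis=1)          # per-channel spectrum
    k = max(1, int(np.ceil(spectrum.shape[1] * keep_ratio)))
    spectrum[:, k:] = 0.0                           # drop high-frequency bins
    return np.fft.irfft(spectrum, n=weight.shape[1], axis=1)
```

Because stage 1 is an exact reparameterization, `smooth_outliers` leaves the layer's output unchanged while equalizing dynamic ranges, and by Parseval's theorem zeroing spectral bins in stage 2 can only remove energy, never add it.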

Published

2026-03-14

How to Cite

Zhao, Z., Liu, F., Wang, J., Guan, C., Wang, Z., Jiang, L., & Guan, H. (2026). SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28786–28794. https://doi.org/10.1609/aaai.v40i34.40112

Section

AAAI Technical Track on Machine Learning XI