SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization

Authors

  • Zhixiong Zhao (Shanghai Jiao Tong University; Nanyang Technological University)
  • Fangxin Liu (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute)
  • Junjie Wang (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute)
  • Chenyang Guan (Shanghai Jiao Tong University)
  • Zongwu Wang (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute)
  • Li Jiang (Shanghai Jiao Tong University; Shanghai Qi Zhi Institute)
  • Haibing Guan (Shanghai Jiao Tong University)

DOI:

https://doi.org/10.1609/aaai.v40i34.40112

Abstract

The emergence of accurate open large language models (LLMs) has sparked a push for advanced quantization techniques to enable efficient deployment on end-user devices. In this paper, we revisit the challenge of extreme LLM compression---targeting ultra-low-bit quantization for both activations and weights---from a Fourier frequency domain perspective. We propose SpecQuant, a two-stage framework that tackles activation outliers and cross-channel variance. In the first stage, activation outliers are smoothed and transferred into the weight matrix to simplify downstream quantization. In the second stage, we apply channel-wise low-frequency Fourier truncation to suppress high-frequency components while preserving essential signal energy, improving quantization robustness. Our method builds on the principle that most of the weight energy is concentrated in low-frequency components, which can be retained with minimal impact on model accuracy. To enable runtime adaptability, we introduce a lightweight truncation module during inference that adjusts truncation thresholds based on channel characteristics. On LLaMA-3 8B, SpecQuant achieves 4-bit quantization for both weights and activations, narrowing the zero-shot accuracy gap to only 1.5% compared to full precision, while delivering 2× faster inference and 3× lower memory usage.
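The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the SmoothQuant-style per-channel scaling in stage 1, the fixed `keep_ratio` in stage 2, and the `alpha` balance parameter are all assumptions made for illustration; the paper's runtime truncation module adapts the threshold per channel, which is not modeled here.

```python
import numpy as np

def smooth_outliers(activations, weight, alpha=0.5):
    """Stage 1 (sketch): migrate activation outliers into the weights.

    A per-input-channel scale (SmoothQuant-style, an assumption here)
    shrinks activation outliers and absorbs them into the weight matrix,
    leaving the layer output X @ W.T mathematically unchanged.
    activations: (tokens, in_features); weight: (out_features, in_features)
    """
    a_max = np.abs(activations).max(axis=0)        # per-channel activation range
    w_max = np.abs(weight).max(axis=0)             # per-channel weight range
    scale = (a_max ** alpha) / (w_max ** (1 - alpha) + 1e-8)
    scale = np.maximum(scale, 1e-8)                # guard against zero channels
    return activations / scale, weight * scale     # X @ W.T is preserved

def low_freq_truncate(weight, keep_ratio=0.5):
    """Stage 2 (sketch): channel-wise low-frequency Fourier truncation.

    Keeps only the lowest-frequency rFFT bins of each weight row, zeroing
    the rest, on the premise that most weight energy is low-frequency.
    `keep_ratio` is a fixed illustrative threshold, not the paper's
    adaptive per-channel one.
    """
    spectrum = np.fft.rfft(weight, axis=1)          # per-channel spectrum
    k = max(1, int(np.ceil(spectrum.shape[1] * keep_ratio)))
    spectrum[:, k:] = 0.0                           # drop high-frequency bins
    return np.fft.irfft(spectrum, n=weight.shape[1], axis=1)
```

Because stage 1 is an exact reparameterization, `smooth_outliers` leaves the layer's output unchanged while equalizing dynamic ranges, and by Parseval's theorem zeroing spectral bins in stage 2 can only remove energy, never add it.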

Published

2026-03-14

How to Cite

Zhao, Z., Liu, F., Wang, J., Guan, C., Wang, Z., Jiang, L., & Guan, H. (2026). SpecQuant: Spectral Decomposition and Adaptive Truncation for Ultra-Low-Bit LLMs Quantization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28786–28794. https://doi.org/10.1609/aaai.v40i34.40112

Section

AAAI Technical Track on Machine Learning XI