APEX-Q: Arbitrary-dimension Product-EXtension Quantization for Accelerated LLM Deployment (Student Abstract)
DOI: https://doi.org/10.1609/aaai.v40i48.42293
Abstract
We present APEX-Q, a flexible product quantization framework for compressing large language models. Unlike prior multi-codebook quantization methods with fixed partitions, APEX-Q supports arbitrary-dimensional tensor quantization, better capturing weight redundancy. It achieves performance on par with 4-bit and 8-bit baselines, enables post-training quantization without retraining, and reveals key trade-offs across subvector dimensions, codebook sizes, and hardware efficiency. APEX-Q thus provides a unified, hardware-friendly approach to scalable LLM deployment.
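To make the subvector-dimension and codebook-size trade-off concrete, here is a minimal sketch of textbook product quantization on a weight matrix. It is not the APEX-Q algorithm. Under stated assumptions, it splits each row into length-`d` subvectors, learns one `k`-entry codebook per column group with plain k-means, and stores only codeword indices. The helper names (`pq_quantize`, `pq_dequantize`, `bits_per_weight`), the requirement that `d` divide the row length, and the 16-bit codebook entries are all illustrative assumptions.

```python
import math
import random


def _sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centroids):
    # Index of the codeword closest to vec (ties go to the lowest index).
    return min(range(len(centroids)), key=lambda c: _sq_dist(vec, centroids[c]))


def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means over equal-length tuples; requires k <= len(points)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def pq_quantize(weights, d, k, seed=0):
    """Hypothetical helper: split every row into length-d subvectors and learn
    one k-entry codebook per column group. Returns (codebooks, codes)."""
    n_cols = len(weights[0])
    if n_cols % d != 0:
        raise ValueError("this sketch assumes d divides the row length")
    codebooks, codes = [], []
    for j in range(0, n_cols, d):
        subvecs = [tuple(row[j:j + d]) for row in weights]
        cb = kmeans(subvecs, k, seed=seed)
        codebooks.append(cb)
        codes.append([nearest(v, cb) for v in subvecs])  # one index per row
    return codebooks, codes


def pq_dequantize(codebooks, codes):
    # Rebuild each row by concatenating the selected codeword from every group.
    n_rows = len(codes[0])
    return [[x for g, cb in enumerate(codebooks) for x in cb[codes[g][r]]]
            for r in range(n_rows)]


def bits_per_weight(d, k, n_rows, codebook_bits=16):
    # Index cost log2(k)/d per weight, plus the per-group codebooks
    # (k * d entries of codebook_bits each) amortized over the n_rows rows.
    return math.log2(k) / d + k * codebook_bits / n_rows
```

Under these assumptions, a 4096-row matrix with `k = 256` costs `bits_per_weight(2, 256, 4096) == 5.0` bits per weight at `d = 2` but only `3.0` at `d = 4`. Larger subvectors therefore shrink the index cost, while each codeword must now approximate more weights at once. This is the kind of dimension-versus-codebook trade-off the abstract refers to.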
Published
2026-03-14
How to Cite
Wang, Y., Qiao, Y., Huang, S., & Kwon, H. (2026). APEX-Q: Arbitrary-dimension Product-EXtension Quantization for Accelerated LLM Deployment (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41424–41426. https://doi.org/10.1609/aaai.v40i48.42293
Issue
Vol. 40 No. 48
Section
AAAI Student Abstract and Poster Program