APEX-Q: Arbitrary-dimension Product-EXtension Quantization for Accelerated LLM Deployment (Student Abstract)

Authors

  • Yian Wang, University of California, Irvine
  • Ye Qiao, University of California, Irvine
  • Sitao Huang, University of California, Irvine
  • Hyoukjun Kwon, University of California, Irvine

DOI

https://doi.org/10.1609/aaai.v40i48.42293

Abstract

We present APEX-Q, a flexible product quantization framework for compressing large language models. Unlike prior multi-codebook quantization methods with fixed partitions, APEX-Q supports arbitrary-dimensional tensor quantization, better capturing weight redundancy. It achieves performance on par with 4-bit and 8-bit baselines, enables post-training quantization without retraining, and reveals key trade-offs across subvector dimensions, codebook sizes, and hardware efficiency. APEX-Q thus provides a unified, hardware-friendly approach to scalable LLM deployment.
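For readers unfamiliar with the underlying idea, the following is a minimal sketch of generic product quantization applied to a weight matrix. It is not the APEX-Q algorithm, whose details the abstract does not give. It only illustrates the two knobs the abstract mentions: the subvector dimension (here `d`) and the codebook size (here `k`). All function names are hypothetical, and the codebook is fitted with plain k-means, which is an assumption made for illustration.

```python
import math
import random


def split_subvectors(weights, d):
    """Flatten a 2-D weight matrix and cut it into length-d subvectors."""
    flat = [w for row in weights for w in row]
    if len(flat) % d != 0:
        raise ValueError("number of weights must be divisible by d")
    return [flat[i:i + d] for i in range(0, len(flat), d)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to v."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], v))


def fit_codebook(subvecs, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); assumed here, not taken from the paper."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(subvecs, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in subvecs:
            groups[nearest(codebook, v)].append(v)
        for j, g in enumerate(groups):
            if g:  # keep the old codeword if its cluster is empty
                codebook[j] = [sum(col) / len(g) for col in zip(*g)]
    return codebook


def quantize(weights, d, k):
    """Return the fitted codebook and one codeword index per subvector."""
    subvecs = split_subvectors(weights, d)
    codebook = fit_codebook(subvecs, k)
    indices = [nearest(codebook, v) for v in subvecs]
    return codebook, indices


def dequantize(codebook, indices, rows, cols):
    """Rebuild an approximate rows x cols matrix from codeword indices."""
    flat = [w for i in indices for w in codebook[i]]
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def bits_per_weight(d, k):
    """Index cost only; codebook storage is ignored."""
    return math.log2(k) / d


# Toy usage: an 8 x 16 random matrix, subvector dimension 4, 16 codewords.
rng = random.Random(1)
W = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
codebook, idx = quantize(W, d=4, k=16)
W_hat = dequantize(codebook, idx, rows=8, cols=16)
print(bits_per_weight(4, 16))  # 1.0
```

In this toy setting the index cost is log2(k)/d bits per weight, so d = 4 with k = 16 and d = 8 with k = 256 both cost 1 bit per weight, while differing in codebook storage and lookup cost. This is one simple way to see the kind of trade-off between subvector dimension and codebook size that the abstract refers to.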

Published

2026-03-14

How to Cite

Wang, Y., Qiao, Y., Huang, S., & Kwon, H. (2026). APEX-Q: Arbitrary-dimension Product-EXtension Quantization for Accelerated LLM Deployment (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41424–41426. https://doi.org/10.1609/aaai.v40i48.42293