Zheng, R.-C., Liu, W., Du, H.-P., Zhang, Q., Deng, C., Chen, Q., … Ling, Z.-H. (2026). Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 35021–35029. https://doi.org/10.1609/aaai.v40i41.40807