[1] R.-C. Zheng, “Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding”, AAAI, vol. 40, no. 41, pp. 35021–35029, Mar. 2026.