Du, Chenpeng, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, and Kai Yu. “UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding.” Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 17924–17932. Accessed May 27, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/29747.