Yu, X., Fang, Y., Wu, P., Ye, G., Zhou, W., Zhang, W., & Xiao, S. (2026). MF-Speech: Achieving Fine-Grained and Compositional Control in Speech Generation via Factor Disentanglement. Proceedings of the AAAI Conference on Artificial Intelligence, 40(21), 17966–17974. https://doi.org/10.1609/aaai.v40i21.38856