Guan, W., Li, Y., Li, T., Huang, H., Wang, F., Lin, J., Huang, L., Li, L., & Hong, Q. (2024). MM-TTS: Multi-Modal Prompt Based Style Transfer for Expressive Text-to-Speech Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18117-18125. https://doi.org/10.1609/aaai.v38i16.29769