Yang, Chih-Chun, Wan-Cyuan Fan, Cheng-Fu Yang, and Yu-Chiang Frank Wang. “Cross-Modal Mutual Learning for Audio-Visual Speech Recognition and Manipulation.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 3036–3044. Accessed August 7, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/20210.