Chen, Chen, Yuchen Hu, Qiang Zhang, Heqing Zou, Beier Zhu, and Eng Siong Chng. “Leveraging Modality-Specific Representations for Audio-Visual Speech Recognition via Reinforcement Learning.” Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 12607–12615. Accessed February 24, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/26484.