[1] C. Chen, Y. Hu, Q. Zhang, H. Zou, B. Zhu, and E. S. Chng, “Leveraging Modality-Specific Representations for Audio-Visual Speech Recognition via Reinforcement Learning”, AAAI, vol. 37, no. 11, pp. 12607–12615, Jun. 2023.