ZHU, Qiushi; ZHANG, Jie; GU, Yu; HU, Yuchen; DAI, Lirong. Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 38, n. 17, p. 19768–19776, 2024. DOI: 10.1609/aaai.v38i17.29951. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/29951. Accessed: 7 May 2026.