LI, Y.; LIU, H.; TANG, H. Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 36, n. 2, p. 1456-1463, 2022. DOI: 10.1609/aaai.v36i2.20035. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/20035. Accessed on: 24 Apr. 2024.