(1) Li, Y.; Liu, H.; Tang, H. Multi-Modal Perception Attention Network With Self-Supervised Learning for Audio-Visual Speaker Tracking. AAAI 2022, 36, 1456-1463.