Wang Q, Jiang P, Guo Z, Han Y, Zhao Z. Multi-Speaker Video Dialog with Frame-Level Temporal Localization. AAAI [Internet]. 2020 Apr. 3 [cited 2026 May 25];34(07):12200-7. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/6901