(1) Wang, Q.; Jiang, P.; Guo, Z.; Han, Y.; Zhao, Z. Multi-Speaker Video Dialog With Frame-Level Temporal Localization. AAAI 2020, 34, 12200–12207.