Wang, Q., Jiang, P., Guo, Z., Han, Y. and Zhao, Z. (2020) “Multi-Speaker Video Dialog with Frame-Level Temporal Localization”, Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), pp. 12200-12207. doi: 10.1609/aaai.v34i07.6901.