Geng, S., P. Gao, M. Chatterjee, C. Hori, J. Le Roux, Y. Zhang, H. Li, and A. Cherian. “Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 2, May 2021, pp. 1415-23, doi:10.1609/aaai.v35i2.16231.