Geng, S., P. Gao, M. Chatterjee, C. Hori, J. Le Roux, Y. Zhang, H. Li, and A. Cherian. “Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 2, May 2021, pp. 1415-23, https://ojs.aaai.org/index.php/AAAI/article/view/16231.