Geng, S., Gao, P., Chatterjee, M., Hori, C., Le Roux, J., Zhang, Y., Li, H., & Cherian, A. (2021). Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1415-1423. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/16231