Geng, S., Gao, P., Chatterjee, M., Hori, C., Le Roux, J., Zhang, Y., Li, H., & Cherian, A. (2021). Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 35(2), 1415-1423. https://doi.org/10.1609/aaai.v35i2.16231