Geng S, Gao P, Chatterjee M, Hori C, Le Roux J, Zhang Y, et al. Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers. AAAI [Internet]. 2021 May 18 [cited 2026 Jul 21];35(2):1415-23. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/16231