Geng S, Gao P, Chatterjee M, Hori C, Le Roux J, Zhang Y, Li H, Cherian A. Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers. AAAI [Internet]. 2021 May 18 [cited 2024 Apr 19];35(2):1415-23. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/16231