Cherian, Anoop, Chiori Hori, Tim K. Marks, and Jonathan Le Roux. “(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering.” Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 444–453. Accessed May 12, 2026. https://ojs.aaai.org/index.php/AAAI/article/view/19922.