Geng, Shijie, et al. “Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 2, May 2021, pp. 1415-23, doi:10.1609/aaai.v35i2.16231.