Peng, M., C. Wang, Y. Shi, and X.-D. Zhou. “Efficient End-to-End Video Question Answering With Pyramidal Multimodal Transformer”. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 2, June 2023, pp. 2038-46, doi:10.1609/aaai.v37i2.25296.