Peng, Min, Chongyang Wang, Yu Shi, and Xiang-Dong Zhou. 2023. “Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer.” Proceedings of the AAAI Conference on Artificial Intelligence 37 (2): 2038-46. https://doi.org/10.1609/aaai.v37i2.25296.