Peng, M., Wang, C., Shi, Y. and Zhou, X.-D. (2023) “Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer”, Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), pp. 2038-2046. doi: 10.1609/aaai.v37i2.25296.