[1]
Peng, M., Wang, C., Shi, Y. and Zhou, X.-D. 2023. Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer. Proceedings of the AAAI Conference on Artificial Intelligence. 37, 2 (Jun. 2023), 2038-2046. DOI:https://doi.org/10.1609/aaai.v37i2.25296.