(1) Peng, M.; Wang, C.; Shi, Y.; Zhou, X.-D. Efficient End-to-End Video Question Answering With Pyramidal Multimodal Transformer. AAAI 2023, 37, 2038-2046.