Fang, X., Fang, W., Wang, C., Qu, X., & Liu, D. (2026). Rethinking Video-Language Model from the Language Input Perspective. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 3885–3893. https://doi.org/10.1609/aaai.v40i5.37390