Fang, Xiang, et al. “Rethinking Video-Language Model from the Language Input Perspective.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 40, no. 5, Mar. 2026, pp. 3885-93, doi:10.1609/aaai.v40i5.37390.