[1] H. Nemani and K. Garimella, “Large-Scale Multimodal Content Analysis and Annotation with Vision-Language Models”, ICWSM, vol. 20, no. 1, pp. 1676–1699, May 2026.