NEMANI, Harsha; GARIMELLA, Kiran. Large-Scale Multimodal Content Analysis and Annotation with Vision-Language Models. Proceedings of the International AAAI Conference on Web and Social Media, [S. l.], v. 20, n. 1, p. 1676–1699, 2026. DOI: 10.1609/icwsm.v20i1.42718. Available at: https://ojs.aaai.org/index.php/ICWSM/article/view/42718. Accessed: 27 May 2026.