Nemani, Harsha, and Kiran Garimella. “Large-Scale Multimodal Content Analysis and Annotation With Vision-Language Models.” Proceedings of the International AAAI Conference on Web and Social Media 20, no. 1 (May 25, 2026): 1676–1699. Accessed May 27, 2026. https://ojs.aaai.org/index.php/ICWSM/article/view/42718.