Nemani, Harsha, and Kiran Garimella. “Large-Scale Multimodal Content Analysis and Annotation With Vision-Language Models”. Proceedings of the International AAAI Conference on Web and Social Media, vol. 20, no. 1, May 2026, pp. 1676-99, doi:10.1609/icwsm.v20i1.42718.