Nemani, H., & Garimella, K. (2026). Large-Scale Multimodal Content Analysis and Annotation with Vision-Language Models. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 1676–1699. https://doi.org/10.1609/icwsm.v20i1.42718