Large-Scale Multimodal Content Analysis and Annotation with Vision-Language Models