VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
DOI:
https://doi.org/10.1609/aaai.v40i15.38269
Abstract
Video captions play a crucial role in text-to-video generation tasks, as their quality directly influences the semantic coherence and visual fidelity of the generated videos. Although large vision-language models (VLMs) have demonstrated significant potential in caption generation, existing benchmarks inadequately address fine-grained evaluation, particularly in capturing the spatial-temporal details critical for video generation. To address this gap, we introduce the Fine-grained Video Caption Evaluation Benchmark (VCapsBench), the first large-scale fine-grained benchmark, comprising 5,677 (5K+) videos and 109,796 (100K+) question-answer pairs. These QA pairs are systematically annotated across 21 fine-grained dimensions (e.g., camera movement and shot type) that are empirically proven critical for text-to-video generation. We further introduce three metrics — Accuracy Rate (AR), Inconsistency Rate (IR), and Coverage Rate (CR) — and an automated evaluation pipeline that leverages a large language model (LLM) to verify caption quality via contrastive QA-pair analysis. Our benchmark can advance the development of robust text-to-video models by providing actionable insights for caption optimization.
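The abstract names three metrics over per-question verification outcomes but does not define them here. The sketch below is a plausible interpretation, assuming each QA pair is judged as `"correct"`, `"inconsistent"`, or `"unanswered"` by the LLM verifier; the label names and exact formulas are assumptions, not the paper's definitions.

```python
# Hypothetical sketch of the three metrics (AR, IR, CR) described in the
# abstract. The outcome labels and formulas are assumptions for illustration.

def accuracy_rate(results):
    """Fraction of QA pairs the caption answers correctly (AR)."""
    return sum(r == "correct" for r in results) / len(results)

def inconsistency_rate(results):
    """Fraction of QA pairs the caption contradicts (IR)."""
    return sum(r == "inconsistent" for r in results) / len(results)

def coverage_rate(results):
    """Fraction of QA pairs the caption addresses at all,
    i.e., not left unanswered (CR)."""
    return sum(r != "unanswered" for r in results) / len(results)

# Example: verifier outcomes for five QA pairs of one video caption.
results = ["correct", "correct", "inconsistent", "unanswered", "correct"]
print(accuracy_rate(results))       # 0.6
print(inconsistency_rate(results))  # 0.2
print(coverage_rate(results))       # 0.8
```

Under these assumed definitions, a strong caption maximizes AR and CR while minimizing IR; the three are complementary, since a caption can cover many dimensions (high CR) yet still contradict the video on several of them (high IR).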
Published
2026-03-14
How to Cite
Zhang, S.-X., Wang, H., Huang, D., Li, X., Zhu, X., & Yin, X.-C. (2026). VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(15), 12726–12734. https://doi.org/10.1609/aaai.v40i15.38269
Section
AAAI Technical Track on Computer Vision XII