EditBoard: Towards a Comprehensive Evaluation Benchmark for Text-Based Video Editing Models

Yupeng Chen; Penglin Chen; Xiaoyu Zhang; Yixian Huang; Qian Xie

doi:10.1609/aaai.v39i15.33754

Authors

Yupeng Chen, The Chinese University of Hong Kong, Shenzhen
Penglin Chen, Nanjing University
Xiaoyu Zhang, The Chinese University of Hong Kong, Shenzhen
Yixian Huang, The Chinese University of Hong Kong, Shenzhen
Qian Xie, University of Leeds

Abstract

The rapid development of diffusion models has significantly advanced AI-generated content (AIGC), particularly in Text-to-Image (T2I) and Text-to-Video (T2V) generation. Text-based video editing, leveraging these generative capabilities, has emerged as a promising field, enabling precise modifications to videos based on text prompts. Despite the proliferation of innovative video editing models, there is a conspicuous lack of comprehensive evaluation benchmarks that holistically assess these models’ performance across various dimensions. Existing evaluations are limited and inconsistent, typically summarizing overall performance with a single score, which obscures models’ effectiveness on individual editing tasks. To address this gap, we propose EditBoard, the first comprehensive evaluation benchmark for text-based video editing models. EditBoard encompasses nine automatic metrics across four dimensions, evaluating models on four task categories and introducing three new metrics to assess fidelity. This task-oriented benchmark facilitates objective evaluation by detailing model performance and providing insights into each model’s strengths and weaknesses. By open-sourcing EditBoard, we aim to standardize evaluation and advance the development of robust video editing models.
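The abstract's point that a single overall score can hide per-task behavior can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the model names, the placeholder task names (`task_1` to `task_4`), and the scores are invented, and the helper functions are hypothetical and not part of EditBoard.

```python
from math import isclose
from statistics import mean

# Hypothetical, made-up scores in [0, 1]; these are NOT results from the paper.
# Task names are placeholders, not EditBoard's actual task categories.
scores = {
    "model_a": {"task_1": 0.90, "task_2": 0.90, "task_3": 0.30, "task_4": 0.30},
    "model_b": {"task_1": 0.60, "task_2": 0.60, "task_3": 0.60, "task_4": 0.60},
}


def overall_score(per_task):
    """Collapse per-task scores into one number (unweighted mean)."""
    return mean(per_task.values())


def task_spread(per_task):
    """Gap between the best and worst task, which the mean alone hides."""
    return max(per_task.values()) - min(per_task.values())


def weakest_task(per_task):
    """Name of the lowest-scoring task (first one wins on ties)."""
    return min(per_task, key=per_task.get)


for name, per_task in scores.items():
    print(
        f"{name}: overall={overall_score(per_task):.2f} "
        f"spread={task_spread(per_task):.2f} "
        f"weakest={weakest_task(per_task)}"
    )
```

In this invented data both models share the same overall mean, yet one is uneven across tasks while the other is uniform. A task-level breakdown exposes that difference, whereas a single averaged score does not.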
