VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment

Authors

  • Shangkun Sun (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen 518055, China; Peng Cheng Laboratory)
  • Xiaoyu Liang (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen 518055, China)
  • Songlin Fan (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen 518055, China; Peng Cheng Laboratory)
  • Wenxu Gao (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen 518055, China; Peng Cheng Laboratory)
  • Wei Gao (Guangdong Provincial Key Laboratory of Ultra High Definition Immersive Media Technology, School of Electronic and Computer Engineering, Shenzhen Graduate School, Peking University, Shenzhen 518055, China; Peng Cheng Laboratory)

DOI:

https://doi.org/10.1609/aaai.v39i7.32763

Abstract

Text-driven video editing has recently experienced rapid development. Despite this, evaluating edited videos remains a considerable challenge. Current metrics often fail to align with human perception, and effective quantitative metrics for video editing are still notably absent. To address this, we introduce VE-Bench, a benchmark suite tailored to the assessment of text-driven video editing. This suite includes VE-Bench DB, a video quality assessment (VQA) database for video editing. VE-Bench DB encompasses a diverse set of source videos featuring various motions and subjects, along with multiple distinct editing prompts, editing results from 8 different models, and the corresponding Mean Opinion Scores (MOS) from 24 human annotators. Based on VE-Bench DB, we further propose VE-Bench QA, a quantitative human-aligned measurement for the text-driven video editing task. In addition to the aesthetic, distortion, and other visual quality indicators that traditional VQA methods emphasize, VE-Bench QA focuses on text-video alignment and on modeling the relevance between source and edited videos. It introduces a new assessment network for video editing that attains superior performance in alignment with human preferences. To the best of our knowledge, VE-Bench introduces the first quality assessment dataset for video editing and proposes an effective subjective-aligned quantitative metric for this domain. All models, data, and code will be made publicly available to the community.

Published

2025-04-11

How to Cite

Sun, S., Liang, X., Fan, S., Gao, W., & Gao, W. (2025). VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 7105–7113. https://doi.org/10.1609/aaai.v39i7.32763

Section

AAAI Technical Track on Computer Vision VI