Scaling-up Perceptual Video Quality Assessment

Authors

  • Ziheng Jia, Shanghai Jiaotong University
  • Zicheng Zhang, Shanghai Artificial Intelligence Laboratory
  • Xiaorong Zhu, Shanghai Jiaotong University
  • Chunyi Li, Shanghai Jiaotong University
  • Jinliang Han, Shanghai Jiaotong University
  • Xiaohong Liu, Shanghai Jiaotong University
  • Guangtao Zhai, Shanghai Jiaotong University, Shanghai Artificial Intelligence Laboratory
  • Xiongkuo Min, Shanghai Jiaotong University

DOI:

https://doi.org/10.1609/aaai.v40i27.39386

Abstract

Data scaling laws have significantly improved the performance of large multi-modal models (LMMs) across various downstream tasks. However, in the domain of perceptual video quality assessment (VQA), the potential of data scaling remains largely unexplored because labeled resources are scarce and existing datasets are small. To address this, we propose OmniVQA, a framework designed to efficiently build high-quality, machine-dominated synthetic multi-modal instruction databases (MIDBs) for VQA. We then scale up to create OmniVQA-Chat-400K, currently the largest dataset in the VQA field. Our focus is on the technical and aesthetic quality dimensions, with abundant in-context instruction data to provide fine-grained VQA knowledge. Additionally, we build the OmniVQA-MOS-20K dataset to enhance the model's quantitative quality rating capabilities. We then introduce a complementary training strategy that effectively leverages the knowledge from the datasets built for these different tasks. Furthermore, we propose the OmniVQA-FG (fine-grain)-Benchmark to evaluate the fine-grained performance of models. Our results demonstrate that our models achieve state-of-the-art performance in both tasks.

Published

2026-03-14

How to Cite

Jia, Z., Zhang, Z., Zhu, X., Li, C., Han, J., Liu, X., Zhai, G., & Min, X. (2026). Scaling-up Perceptual Video Quality Assessment. Proceedings of the AAAI Conference on Artificial Intelligence, 40(27), 22292-22300. https://doi.org/10.1609/aaai.v40i27.39386

Issue

Section

AAAI Technical Track on Machine Learning IV