OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding
DOI:
https://doi.org/10.1609/aaai.v40i13.38068

Abstract
In this paper, we propose OmniVDiff, a novel framework for controllable video diffusion that aims to synthesize and comprehend multiple visual modalities of video within a single diffusion model. To achieve this, OmniVDiff treats all video visual modalities in the color space to learn a joint distribution, while employing an adaptive control strategy that dynamically adjusts the role of each visual modality during the diffusion process, either as a generation modality or as a conditioning modality. Our framework supports three key capabilities: (1) text-conditioned video generation, where all modalities are jointly synthesized from a textual prompt; (2) video understanding, where structural modalities are predicted coherently from RGB inputs; and (3) X-conditioned video generation, where video synthesis is guided by fine-grained inputs such as depth, Canny edges, and segmentation. Extensive experiments demonstrate that OmniVDiff achieves state-of-the-art performance in video generation tasks and competitive results in video understanding. Its flexibility and scalability make it well suited for downstream applications such as video-to-video translation, modality adaptation for visual tasks, and scene reconstruction.
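The adaptive control idea in the abstract, where each modality is dynamically assigned the role of either generation target or conditioning signal within one diffusion process, can be sketched as follows. This is a minimal illustration under assumed names and a trivialized diffusion step, not the paper's implementation: the modality list, the `roles` dictionary, and `diffusion_step` are all hypothetical.

```python
import numpy as np

# Hypothetical modality set; the paper's abstract mentions depth, Canny
# edges, and segmentation alongside RGB.
MODALITIES = ["rgb", "depth", "canny", "segmentation"]

def diffusion_step(latents, roles, noise_level, rng):
    """One trivialized step of role-aware diffusion (illustrative only):
    modalities assigned the 'condition' role keep their clean latents and
    act as guidance, while 'generate' modalities receive noise as they
    would in an actual denoising-diffusion iteration."""
    out = {}
    for name, x in latents.items():
        if roles[name] == "condition":
            out[name] = x  # held fixed as the conditioning signal
        else:
            out[name] = x + noise_level * rng.standard_normal(x.shape)
    return out

rng = np.random.default_rng(0)
latents = {m: np.zeros((2, 2)) for m in MODALITIES}

# Depth-conditioned generation: depth is the conditioning modality,
# the remaining modalities are jointly generated.
roles = {m: ("condition" if m == "depth" else "generate")
         for m in MODALITIES}
stepped = diffusion_step(latents, roles, noise_level=1.0, rng=rng)
```

Switching the `roles` assignment (e.g. conditioning on RGB and generating the structural modalities) would correspond to the video-understanding capability described above, without changing the step function itself.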
Published
2026-03-14
How to Cite
Xi, D., Wang, J., Liang, Y., Qiu, X., Huo, Y., Wang, R., Zhang, C., & Li, X. (2026). OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 40(13), 10915-10923. https://doi.org/10.1609/aaai.v40i13.38068
Issue
Section
AAAI Technical Track on Computer Vision X