UniScene-MoTion: Unified Scene & Motion-aware Diffusion Transition Framework

Authors

  • Rui Jiang, Zhejiang University
  • Chongmian Wang, Zhejiang University
  • Xinghe Fu, Zhejiang University
  • Yehao Lu, Zhejiang University
  • Teng Li, Zhejiang University
  • Xi Li, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v40i7.37458

Abstract

Video transitions are critical for ensuring temporal coherence in edited media, yet existing methods often rely on handcrafted effects or relative-scale trajectories that fail to capture the physical structure of real-world scenes. In this work, we introduce a scale-aware video transition framework that explicitly incorporates depth-aware 3D reasoning into a diffusion-based generation pipeline. Built upon a powerful image-to-video (I2V) foundation model, our method leverages single-image depth prediction to align camera motion with metric-scale geometry, enabling physically consistent transitions. To reduce reliance on precise camera inputs, we propose a bidirectional conditional control module and a progressive training strategy with conditional dropout, enhancing generalization to loosely specified or missing camera trajectories. Extensive experiments demonstrate that our approach achieves state-of-the-art performance, delivering realistic, geometrically coherent transitions across diverse scenes and applications with minimal input guidance.
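The conditional dropout mentioned in the abstract is a standard trick in conditional diffusion training: during training, the conditioning signal (here, the camera trajectory embedding) is occasionally replaced by a learned "null" embedding, so the model also learns to generate plausible transitions when camera input is loosely specified or absent. The paper's actual module is not described on this page; the sketch below is a minimal, hypothetical illustration of that general mechanism (all names, e.g. `drop_condition` and `p_drop`, are our own):

```python
import random

def drop_condition(camera_embed, null_embed, p_drop=0.1):
    """Conditional dropout for a camera-trajectory embedding (sketch).

    With probability p_drop, the true camera conditioning is swapped for
    a learned null embedding, training the model to handle missing or
    imprecise camera input. A progressive schedule could raise p_drop
    across training stages, as the abstract's strategy suggests.
    """
    if random.random() < p_drop:
        return null_embed
    return camera_embed

def progressive_p_drop(stage, stages=(0.0, 0.1, 0.3)):
    """Hypothetical stage-wise dropout schedule: later training stages
    drop the camera condition more often."""
    return stages[min(stage, len(stages) - 1)]
```

At inference time, passing the null embedding directly corresponds to generating a transition with no camera guidance at all, which is how such models handle the "missing trajectory" case.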

Published

2026-03-14

How to Cite

Jiang, R., Wang, C., Fu, X., Lu, Y., Li, T., & Li, X. (2026). UniScene-MoTion: Unified Scene & Motion-aware Diffusion Transition Framework. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5415–5423. https://doi.org/10.1609/aaai.v40i7.37458

Section

AAAI Technical Track on Computer Vision IV