Motion-Zero: A Zero-Shot Trajectory Control Framework of Moving Object for Diffusion-Based Video Generation
DOI:
https://doi.org/10.1609/aaai.v39i2.32198
Abstract
Recent large-scale pre-trained diffusion models have demonstrated a powerful ability to generate high-quality videos from detailed text descriptions. However, controlling the motion of objects in videos generated by a video diffusion model remains a challenging problem. In this paper, we propose a novel zero-shot moving-object trajectory control framework, Motion-Zero, which enables arbitrary single-object trajectory control for text-to-video diffusion models. To this end, an initial noise prior module is designed to provide a position-based prior that improves both the stability of the moving object's appearance and the accuracy of its position. In addition, spatial constraints based on the attention map of the U-Net are applied directly to the denoising process of the diffusion model, which further ensures the positional consistency of moving objects during inference. Furthermore, temporal consistency is guaranteed by a proposed shift temporal attention mechanism. Our method can be flexibly applied to various state-of-the-art video diffusion models without any training process. Extensive experiments demonstrate that our proposed method can control the motion trajectories of arbitrary objects while preserving the original ability to generate high-quality videos.
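The abstract does not spell out how the spatial constraint is computed, so the following is only an illustrative sketch and not the authors' formulation. It assumes the target trajectory is given as one bounding box per frame (here linearly interpolated between a start and an end box) and that the constraint penalizes the share of an object's attention mass falling outside that frame's box. The names `interpolate_boxes` and `outside_box_loss`, the box convention, and the loss itself are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def interpolate_boxes(start: Box, end: Box, num_frames: int) -> List[Box]:
    """Linearly interpolate a bounding box from start to end, one per frame."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if num_frames == 1:
        return [start]
    boxes = []
    for f in range(num_frames):
        t = f / (num_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        boxes.append(tuple(round(s + t * (e - s)) for s, e in zip(start, end)))
    return boxes


def outside_box_loss(attn: List[List[float]], box: Box) -> float:
    """Fraction of attention mass outside the box (0.0 means all inside).

    attn is a 2D map indexed as attn[y][x], standing in for one frame's
    attention map for the object token (an assumption of this sketch).
    """
    x0, y0, x1, y1 = box
    total = sum(sum(row) for row in attn)
    if total == 0:
        return 1.0  # no attention anywhere: treat as fully violating
    inside = sum(attn[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return 1.0 - inside / total


# Usage: a 2x6 attention map for the middle frame of a 3-frame trajectory.
boxes = interpolate_boxes((0, 0, 2, 2), (4, 0, 6, 2), num_frames=3)
attn = [[0.0] * 6 for _ in range(2)]
attn[0][2] = 3.0  # inside the middle-frame box
attn[1][5] = 1.0  # outside the middle-frame box
loss = outside_box_loss(attn, boxes[1])
```

In a zero-shot setting, a quantity like this would typically be used to steer the latent at each denoising step rather than to train the model; how Motion-Zero actually applies its constraint is described in the paper itself.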
Published
2025-04-11
How to Cite
Chen, C., Shu, J., He, G., Wang, C., & Li, Y. (2025). Motion-Zero: A Zero-Shot Trajectory Control Framework of Moving Object for Diffusion-Based Video Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(2), 2016–2024. https://doi.org/10.1609/aaai.v39i2.32198
Section
AAAI Technical Track on Computer Vision I