MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

Authors

  • Tuna Han Salih Meral, Virginia Polytechnic Institute and State University
  • Hidir Yesiltepe, Virginia Polytechnic Institute and State University
  • Connor Dunlop, Virginia Polytechnic Institute and State University
  • Pinar Yanardag, Virginia Polytechnic Institute and State University

DOI:

https://doi.org/10.1609/aaai.v40i10.37750

Abstract

Text-to-video models have demonstrated impressive capabilities in producing diverse video content, yet often lack fine-grained control over motion. We address the problem of motion transfer: given a source video and a target text prompt, generate a new video that preserves the source motion while matching the target semantics and allowing large changes in appearance and scene layout. We introduce MotionFlow, a training-free framework that performs test-time latent optimization guided by attention-derived motion cues. MotionFlow first extracts cross-attention maps from a pre-trained video diffusion model and converts them into spatio-temporal motion masks for the source subject. During generation, it optimizes the target latents so that their evolving attention patterns align with these masks, while the target text controls appearance. This avoids direct attention-map replacement and any model-specific fine-tuning, reducing artifacts and improving flexibility. Qualitative and quantitative experiments, including a user study, show that MotionFlow outperforms existing methods in motion fidelity, temporal consistency, and versatility, even under drastic scene changes.
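The abstract's pipeline (attention maps → spatio-temporal motion masks → test-time latent optimization) can be sketched in a toy form. This is a minimal illustration, not the authors' implementation: `toy_attn` is a hypothetical, differentiable stand-in for the diffusion model's cross-attention, and the threshold, loss, and optimizer settings are assumptions chosen for the sketch.

```python
import torch

def attention_to_mask(attn, thresh=0.6):
    # Normalize each frame's attention map to [0, 1], then threshold it
    # into a binary spatio-temporal motion mask for the source subject.
    a = attn - attn.amin(dim=(-2, -1), keepdim=True)
    a = a / (a.amax(dim=(-2, -1), keepdim=True) + 1e-8)
    return (a > thresh).float()

def alignment_loss(attn, mask):
    # One plausible alignment objective (an assumption, not the paper's exact
    # loss): maximize the fraction of attention mass falling inside the mask.
    inside = (attn * mask).sum(dim=(-2, -1))
    total = attn.sum(dim=(-2, -1)) + 1e-8
    return (1.0 - inside / total).mean()

def optimize_latents(latents, mask, attn_fn, steps=50, lr=0.1):
    # Test-time optimization: update the target latents so that the attention
    # they induce aligns with the source motion mask. No attention maps are
    # copied directly, and no model weights are changed.
    z = latents.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        alignment_loss(attn_fn(z), mask).backward()
        opt.step()
    return z.detach()

def toy_attn(z):
    # Hypothetical stand-in for model attention: a softmax over the spatial
    # locations of each latent frame, so it is differentiable in z.
    T, H, W = z.shape
    return torch.softmax(z.reshape(T, -1), dim=-1).reshape(T, H, W)

torch.manual_seed(0)
source_attn = toy_attn(torch.randn(4, 8, 8))        # frames x height x width
mask = attention_to_mask(source_attn)               # spatio-temporal motion mask
init_latents = torch.randn(4, 8, 8)                 # "target" latents
init_loss = alignment_loss(toy_attn(init_latents), mask)
target = optimize_latents(init_latents, mask, toy_attn)
final_loss = alignment_loss(toy_attn(target), mask)
```

After optimization, `final_loss` is lower than `init_loss`: the target latents' attention has been steered toward the source motion mask while the latents themselves (and hence appearance, in the real model) remain free to change.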

Published

2026-03-14

How to Cite

Meral, T. H. S., Yesiltepe, H., Dunlop, C., & Yanardag, P. (2026). MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(10), 8043-8051. https://doi.org/10.1609/aaai.v40i10.37750

Section

AAAI Technical Track on Computer Vision VII