MotivDance: Fine-Grained Text-Guided Motivation Choreography with Music Synchronization

Authors

  • Chenguang Li: State Key Laboratory of Advanced Rail Autonomous Operation, Beijing, China; School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing, China
  • Yu-Hui Wen: School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing, China
  • Liping Jing: State Key Laboratory of Advanced Rail Autonomous Operation, Beijing, China; School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i8.37528

Abstract

Realistic choreography demands simultaneous attention to rhythm and motivation. Prevailing automated dance generation methods depend mainly on musical input, overlooking the motivations that drive meaningful dance creation. Inspired by motivation choreography, we aim to articulate dance motivations through textual guidance. However, the absence of high-quality datasets that jointly contain music, textual descriptions, and motion data makes accurate fine-grained textual control challenging. To address this limitation, we present MotivDance, a novel framework that integrates fine-grained textual guidance with music to synthesize semantically coherent dance sequences. Our approach first synthesizes text-guided key poses as motivations. We then introduce an Adaptive Keyframe Locator that dynamically positions these motivations within the musical context through beat-aware synchronization and cross-modal latent space alignment. Finally, a Transformer-based U-Net diffusion model performs the motion in-betweening while preserving motivational integrity. Extensive qualitative and quantitative experiments demonstrate that MotivDance effectively integrates music with fine-grained text control to generate high-fidelity dance motions.
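The three stages named in the abstract (key poses, keyframe placement, in-betweening) can be illustrated with a deliberately simplified sketch. It is not the authors' implementation. Nearest-beat snapping stands in for the Adaptive Keyframe Locator, which according to the abstract also uses cross-modal latent space alignment, and linear interpolation stands in for the Transformer-based U-Net diffusion model. The function names `snap_to_beats` and `linear_inbetween`, the frame indices, and the two-dimensional "poses" are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def snap_to_beats(candidate_frames, beat_frames):
    """Move each candidate keyframe index to the nearest beat frame.

    Assumption: a toy stand-in for beat-aware placement only.
    beat_frames must be non-empty.
    """
    beats = sorted(beat_frames)
    snapped = []
    for frame in candidate_frames:
        i = bisect_left(beats, frame)
        # Only the beats just before and just after the candidate can be nearest.
        neighbours = beats[max(i - 1, 0): i + 1]
        snapped.append(min(neighbours, key=lambda b: abs(b - frame)))
    return snapped


def linear_inbetween(key_frames, key_poses, num_frames):
    """Fill the frames between key poses by linear interpolation.

    Assumption: a placeholder for learned in-betweening, not a diffusion model.
    key_frames must be sorted in increasing order, one entry per key pose.
    """
    motion = []
    for t in range(num_frames):
        if t <= key_frames[0]:
            motion.append(list(key_poses[0]))   # hold the first pose
            continue
        if t >= key_frames[-1]:
            motion.append(list(key_poses[-1]))  # hold the last pose
            continue
        j = bisect_right(key_frames, t)         # first keyframe strictly after t
        f0, f1 = key_frames[j - 1], key_frames[j]
        w = (t - f0) / (f1 - f0)
        motion.append([(1 - w) * a + w * b
                       for a, b in zip(key_poses[j - 1], key_poses[j])])
    return motion


# Hypothetical input: three key poses with rough timings, beats every 30 frames.
key_frames = snap_to_beats([10, 47, 95], [0, 30, 60, 90, 120])
motion = linear_inbetween(key_frames, [[0, 0], [1, 2], [2, 0]], 121)
```

In this toy version every key pose lands exactly on a beat and is reproduced unchanged at its keyframe, which loosely mirrors the abstract's goal of preserving the motivations during in-betweening. A learned locator and a generative in-betweening model would replace both functions in practice.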

Published

2026-03-14

How to Cite

Li, C., Wen, Y.-H., & Jing, L. (2026). MotivDance: Fine-Grained Text-Guided Motivation Choreography with Music Synchronization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 6046–6054. https://doi.org/10.1609/aaai.v40i8.37528

Issue

Section

AAAI Technical Track on Computer Vision V