AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion

Authors

  • Beibei Jing, Huazhong University of Science and Technology
  • Youjia Zhang, Huazhong University of Science and Technology
  • Zikai Song, Huazhong University of Science and Technology
  • Junqing Yu, Huazhong University of Science and Technology
  • Wei Yang, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v38i3.28042

Keywords:

CV: Motion & Tracking, CV: Multi-modal Vision

Abstract

Generating realistic human motion sequences from text descriptions is a challenging task that requires capturing the rich expressiveness of both natural language and human motion. Recent advances in diffusion models have enabled significant progress in human motion synthesis. However, existing methods struggle to handle text inputs that describe complex or long motions. In this paper, we propose the Adaptable Motion Diffusion (AMD) model, which leverages a Large Language Model (LLM) to parse the input text into a sequence of concise and interpretable anatomical scripts that correspond to the target motion. This process exploits the LLM’s ability to provide anatomical guidance for complex motion synthesis. We then devise a two-branch fusion scheme that balances the influence of the input text and the anatomical scripts on the inverse diffusion process, adaptively preserving both the semantic fidelity and the diversity of the synthesized motion. Our method can effectively handle texts with complex or long motion descriptions, where existing methods often fail. Experiments on datasets with comparatively complex motions, such as CLCD1 and CLCD2, demonstrate that AMD significantly outperforms existing state-of-the-art models.
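
The abstract does not say how the two branches are combined. As a rough illustration only, one common way to balance two conditioning signals in a diffusion sampler is a convex combination of their noise predictions at each denoising step. The sketch below assumes that form; the function name `fuse_predictions`, the `weight` parameter, and the toy vectors are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fuse_predictions(
    eps_text: Sequence[float],
    eps_script: Sequence[float],
    weight: float,
) -> List[float]:
    """Blend two noise predictions for one denoising step.

    ASSUMPTION: a simple convex combination, used here only to illustrate
    the idea of balancing two branches. The paper's actual fusion rule may
    differ.

    eps_text   -- noise predicted by the branch conditioned on the input text
    eps_script -- noise predicted by the branch conditioned on anatomical scripts
    weight     -- influence of the text branch, in [0, 1]
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(eps_text) != len(eps_script):
        raise ValueError("both predictions must have the same length")
    return [weight * a + (1.0 - weight) * b for a, b in zip(eps_text, eps_script)]


if __name__ == "__main__":
    # Toy vectors standing in for the per-step outputs of the two branches.
    text_branch = [1.0, 2.0]
    script_branch = [3.0, 4.0]
    print(fuse_predictions(text_branch, script_branch, weight=0.5))
```

In this sketch, `weight=1.0` reduces to plain text conditioning and `weight=0.0` to script-only conditioning; an adaptive scheme would choose the weight per sample or per step rather than fixing it.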

Published

2024-03-24

How to Cite

Jing, B., Zhang, Y., Song, Z., Yu, J., & Yang, W. (2024). AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2643-2651. https://doi.org/10.1609/aaai.v38i3.28042

Section

AAAI Technical Track on Computer Vision II