AMD: Autoregressive Motion Diffusion

Authors

  • Bo Han, Zhejiang University
  • Hao Peng, Unity China
  • Minjing Dong, University of Sydney
  • Yi Ren, Zhejiang University
  • Yixuan Shen, National University of Singapore
  • Chang Xu, University of Sydney

DOI:

https://doi.org/10.1609/aaai.v38i3.27973

Keywords:

CV: Multi-modal Vision, CV: 3D Computer Vision, CV: Applications, CV: Motion & Tracking, HAI: Applications, HAI: Game Design -- Virtual Humans, NPCs and Autonomous Characters, HAI: User Experience and Usability

Abstract

Human motion generation aims to produce plausible human motion sequences according to various conditional inputs, such as text or audio. Although existing methods can generate motion from short prompts and simple motion patterns, they struggle with long prompts or complex motions. The challenges are two-fold: 1) the scarcity of motion-capture data paired with long prompts and complex motions, and 2) the high temporal diversity of human motions and the substantial divergence between the distributions of the conditional modalities, which lead to a many-to-many mapping problem when generating motion from complex and long texts. In this work, we address these gaps by 1) constructing HumanLong3D, the first dataset pairing long textual descriptions with complex 3D motions, and 2) proposing an autoregressive motion diffusion model (AMD). Specifically, AMD integrates the text prompt at the current timestep with the text prompt and action sequences from the previous timestep as conditional information to predict the current action sequences in an iterative manner. Furthermore, we present its generalization to X-to-Motion with “No Modality Left Behind”, enabling for the first time the generation of high-definition and high-fidelity human motions from user-defined modality input.
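The autoregressive conditioning scheme described above can be pictured with a short sketch. The code below is not the authors' implementation: the ToyDenoiser, segment length, feature dimensions, and the plain DDPM-style sampler are all illustrative assumptions. It only shows the rollout structure from the abstract, where each motion segment is denoised while conditioned on the current text embedding plus the previous text embedding and the previously generated segment.

# Minimal sketch (not the AMD release code): autoregressive segment-by-segment
# diffusion sampling conditioned on current text, previous text, and previous motion.
import torch
import torch.nn as nn

SEG_LEN, POSE_DIM, TEXT_DIM, STEPS = 16, 64, 32, 50  # illustrative sizes

class ToyDenoiser(nn.Module):
    """Stand-in for the denoising network: predicts noise for the current
    motion segment given the diffusion step and the conditioning vector."""
    def __init__(self):
        super().__init__()
        cond_dim = 2 * TEXT_DIM + SEG_LEN * POSE_DIM  # cur text + prev text + prev motion
        self.net = nn.Sequential(
            nn.Linear(SEG_LEN * POSE_DIM + cond_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, SEG_LEN * POSE_DIM),
        )

    def forward(self, x_t, t, cond):
        inp = torch.cat([x_t.flatten(1), cond, t.float().unsqueeze(1) / STEPS], dim=1)
        return self.net(inp).view_as(x_t)

def sample_segment(denoiser, cond, betas):
    """Plain DDPM-style reverse process for one motion segment."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, SEG_LEN, POSE_DIM)
    for t in reversed(range(STEPS)):
        t_b = torch.full((1,), t, dtype=torch.long)
        eps = denoiser(x, t_b, cond)
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x

def generate(denoiser, text_embs):
    """Autoregressive rollout: segment i is conditioned on its own text embedding,
    the previous text embedding, and the previously generated motion segment."""
    betas = torch.linspace(1e-4, 2e-2, STEPS)
    prev_text = torch.zeros(1, TEXT_DIM)
    prev_motion = torch.zeros(1, SEG_LEN, POSE_DIM)
    segments = []
    for cur_text in text_embs:  # one embedding per sentence of the long prompt
        cond = torch.cat([cur_text, prev_text, prev_motion.flatten(1)], dim=1)
        seg = sample_segment(denoiser, cond, betas)
        segments.append(seg)
        prev_text, prev_motion = cur_text, seg
    return torch.cat(segments, dim=1)  # (1, num_segments * SEG_LEN, POSE_DIM)

if __name__ == "__main__":
    texts = [torch.randn(1, TEXT_DIM) for _ in range(3)]  # placeholder text features
    motion = generate(ToyDenoiser(), texts)
    print(motion.shape)  # torch.Size([1, 48, 64])

An untrained toy network produces noise, of course; the point is the loop in generate(), which carries the previous text and motion forward so that long prompts are handled segment by segment rather than in a single pass.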

Published

2024-03-24

How to Cite

Han, B., Peng, H., Dong, M., Ren, Y., Shen, Y., & Xu, C. (2024). AMD: Autoregressive Motion Diffusion. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2022-2030. https://doi.org/10.1609/aaai.v38i3.27973

Section

AAAI Technical Track on Computer Vision II