CoMA: Compositional Human Motion Generation with Multi-modal Agents

Authors

  • Shanlin Sun (University of California, Irvine)
  • Jiaqi Xu (University of California, San Diego)
  • Gabriel de Araujo (University of California, Irvine)
  • Shenghan Zhou (Chongqing University)
  • Hanwen Zhang (Huazhong University of Science and Technology)
  • Ziheng Huang (Columbia University)
  • Chenyu You (State University of New York at Stony Brook)
  • Xiaohui Xie (University of California, Irvine)

DOI

https://doi.org/10.1609/aaai.v40i11.37878

Abstract

3D human motion generation has seen substantial advancement in recent years. While state-of-the-art approaches have improved performance significantly, they still struggle with complex and detailed motions unseen in training data, largely due to the scarcity of motion datasets and the prohibitive cost of generating new training examples. To address these challenges, we introduce CoMA, an agent-based solution for complex human motion generation, editing, and comprehension. CoMA leverages multiple collaborative agents powered by large language and vision models, alongside a mask transformer-based motion generator featuring body part-specific encoders and codebooks for fine-grained control. Our framework enables generation of both short and long motion sequences with detailed instructions, text-guided motion editing, and self-correction for improved quality. Evaluations on the HumanML3D dataset demonstrate competitive performance against state-of-the-art methods. Additionally, we create a set of context-rich, compositional, and long text prompts, where user studies show our method significantly outperforms existing approaches.

Published

2026-03-14

How to Cite

Sun, S., Xu, J., de Araujo, G., Zhou, S., Zhang, H., Huang, Z., … Xie, X. (2026). CoMA: Compositional Human Motion Generation with Multi-modal Agents. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 9206–9214. https://doi.org/10.1609/aaai.v40i11.37878

Issue

Vol. 40 No. 11 (2026)

Section

AAAI Technical Track on Computer Vision VIII