Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis

Authors

  • Zhaoxin Fan, Psyche AI Inc
  • Longbin Ji, Xi'an Jiaotong-Liverpool University
  • Pengxin Xu, Psyche AI Inc
  • Fan Shen, Psyche AI Inc
  • Kai Chen, HKUST

DOI:

https://doi.org/10.1609/aaai.v38i2.27936

Keywords:

CV: Multi-modal Vision, CV: Motion & Tracking, CV: Representation Learning for Vision

Abstract

In the dynamic field of film and game development, the emergence of human motion synthesis methods has revolutionized avatar animation. Traditional methodologies, typically reliant on single-modality inputs like text or audio, employ modality-specific model frameworks, posing challenges for unified model deployment and application. To address this, we propose Everything2Motion, a unified model framework. Everything2Motion consists of three key modules. The Input-Output Modality Modulation module tailors input and output structures to specific modalities, eliminating the need for modality-specific frameworks. The Query-aware Autoencoder, based on the transformer encoder-decoder architecture, enables efficient latent motion generation. Lastly, the Prior Motion Distillation Decoder, a pretrained module, enhances the naturalness and fluidity of the final skeleton sequence. Comprehensive experiments on several public datasets demonstrate the effectiveness of Everything2Motion, highlighting its potential for practical applications and setting a new benchmark in human motion synthesis.
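
To give a concrete sense of the Query-aware Autoencoder described above, the sketch below shows one plausible reading of a query-aware transformer encoder-decoder: learnable motion queries cross-attend to encoded conditioning tokens to produce latent motion tokens. This is a minimal illustration, not the authors' implementation; the class name, dimensions, PyTorch usage, and the assumption that modality-specific adapters have already projected text or audio features into a shared embedding space are all our own.

```python
import torch
import torch.nn as nn

class QueryAwareAutoencoder(nn.Module):
    """Hypothetical sketch of a query-aware transformer autoencoder.

    Learnable motion queries attend to encoded conditioning features
    (e.g. text or audio embeddings already projected to a shared space)
    and are decoded into latent motion tokens.
    """

    def __init__(self, d_model=256, n_heads=4, n_layers=4, n_queries=32):
        super().__init__()
        # Learnable queries that will be decoded into latent motion tokens.
        self.motion_queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )

    def forward(self, cond_tokens):
        # cond_tokens: (batch, seq_len, d_model) conditioning features from
        # any modality, assumed to be pre-projected by a modality adapter.
        memory = self.encoder(cond_tokens)
        queries = self.motion_queries.unsqueeze(0).expand(
            cond_tokens.size(0), -1, -1
        )
        # Queries cross-attend to the encoded condition to form motion latents,
        # which a downstream decoder would map to skeleton sequences.
        return self.decoder(queries, memory)


# Example usage with dummy conditioning tokens.
model = QueryAwareAutoencoder()
latents = model(torch.randn(2, 16, 256))  # -> (2, 32, 256) latent motion tokens
```

In this reading, the fixed number of queries decouples the length of the conditioning input from the size of the latent motion representation, which is one common motivation for query-based designs.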

Published

2024-03-24

How to Cite

Fan, Z., Ji, L., Xu, P., Shen, F., & Chen, K. (2024). Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1688-1697. https://doi.org/10.1609/aaai.v38i2.27936

Issue

Vol. 38 No. 2 (2024)

Section

AAAI Technical Track on Computer Vision I