Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis

Authors

  • Zhaoxin Fan, Psyche AI Inc
  • Longbin Ji, Xi'an Jiaotong-Liverpool University
  • Pengxin Xu, Psyche AI Inc
  • Fan Shen, Psyche AI Inc
  • Kai Chen, HKUST

DOI:

https://doi.org/10.1609/aaai.v38i2.27936

Keywords:

CV: Multi-modal Vision, CV: Motion & Tracking, CV: Representation Learning for Vision

Abstract

In the dynamic field of film and game development, the emergence of human motion synthesis methods has revolutionized avatar animation. Traditional methodologies, typically reliant on single-modality inputs like text or audio, employ modality-specific model frameworks, posing challenges for unified model deployment and application. To address this, we propose Everything2Motion, a unified model framework. Everything2Motion consists of three key modules. The Input-Output Modality Modulation module tailors input and output structures to specific modalities, eliminating the need for modality-specific frameworks. The Query-aware Autoencoder, based on the transformer encoder-decoder architecture, enables efficient latent motion generation. Lastly, the Prior Motion Distillation Decoder, a pretrained module, enhances the naturalness and fluidity of the final skeleton sequence. Comprehensive experiments on several public datasets demonstrate the effectiveness of Everything2Motion, highlighting its potential for practical applications and setting a new benchmark in human motion synthesis.
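
To give a concrete sense of the Query-aware Autoencoder described above, the sketch below shows one plausible reading of a query-aware transformer encoder-decoder: learnable motion queries cross-attend to encoded conditioning tokens to produce latent motion tokens. This is a minimal illustration, not the authors' implementation; the class name, dimensions, PyTorch usage, and the assumption that modality-specific adapters have already projected text or audio features into a shared embedding space are all our own.

```python
import torch
import torch.nn as nn

class QueryAwareAutoencoder(nn.Module):
    """Hypothetical sketch of a query-aware transformer autoencoder.

    Learnable motion queries attend to encoded conditioning features
    (e.g. text or audio embeddings already projected to a shared space)
    and are decoded into latent motion tokens.
    """

    def __init__(self, d_model=256, n_heads=4, n_layers=4, n_queries=32):
        super().__init__()
        # Learnable queries that will be decoded into latent motion tokens.
        self.motion_queries = nn.Parameter(torch.randn(n_queries, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True),
            num_layers=n_layers,
        )

    def forward(self, cond_tokens):
        # cond_tokens: (batch, seq_len, d_model) conditioning features from
        # any modality, assumed to be pre-projected by a modality adapter.
        memory = self.encoder(cond_tokens)
        queries = self.motion_queries.unsqueeze(0).expand(
            cond_tokens.size(0), -1, -1
        )
        # Queries cross-attend to the encoded condition to form motion latents,
        # which a downstream decoder would map to skeleton sequences.
        return self.decoder(queries, memory)


# Example usage with dummy conditioning tokens.
model = QueryAwareAutoencoder()
latents = model(torch.randn(2, 16, 256))  # -> (2, 32, 256) latent motion tokens
```

In this reading, the fixed number of queries decouples the length of the conditioning input from the size of the latent motion representation, which is one common motivation for query-based designs.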

Published

2024-03-24

How to Cite

Fan, Z., Ji, L., Xu, P., Shen, F., & Chen, K. (2024). Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1688-1697. https://doi.org/10.1609/aaai.v38i2.27936

Issue

Vol. 38 No. 2 (2024)

Section

AAAI Technical Track on Computer Vision I