Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis
DOI:
https://doi.org/10.1609/aaai.v38i2.27936
Keywords:
CV: Multi-modal Vision, CV: Motion & Tracking, CV: Representation Learning for Vision
Abstract
In the dynamic field of film and game development, the emergence of human motion synthesis methods has revolutionized avatar animation. Traditional methodologies typically rely on single-modality inputs such as text or audio and employ modality-specific model frameworks, which complicates unified model deployment and application. To address this, we propose Everything2Motion, a unified model framework. Everything2Motion consists of three key modules. The Input-Output Modality Modulation module tailors structures to specific multimodal inputs, eliminating the need for modality-specific frameworks. The Query-aware Autoencoder, based on the transformer encoder-decoder architecture, enables efficient latent motion generation. Lastly, the Prior Motion Distillation Decoder, a pretrained module, enhances the naturalness and fluidity of the final skeleton sequence. Comprehensive experiments on several public datasets demonstrate the effectiveness of Everything2Motion, highlighting its potential for practical applications and setting a new benchmark in human motion synthesis.
Published
2024-03-24
How to Cite
Fan, Z., Ji, L., Xu, P., Shen, F., & Chen, K. (2024). Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1688-1697. https://doi.org/10.1609/aaai.v38i2.27936
Issue
Section
AAAI Technical Track on Computer Vision I