MoCaNet: Motion Retargeting In-the-Wild via Canonicalization Networks

Authors

  • Wentao Zhu (School of Computer Science, Peking University; Shanghai AI Laboratory)
  • Zhuoqian Yang (SenseTime Research)
  • Ziang Di (Southeast University)
  • Wayne Wu (Shanghai AI Laboratory; SenseTime Research)
  • Yizhou Wang (School of Computer Science, Peking University)
  • Chen Change Loy (S-Lab, Nanyang Technological University)

DOI:

https://doi.org/10.1609/aaai.v36i3.20274

Keywords:

Computer Vision (CV), Domain(s) Of Application (APP), Humans And AI (HAI)

Abstract

We present a novel framework that brings the 3D motion retargeting task from controlled environments to in-the-wild scenarios. In particular, our method is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure. It is designed to leverage massive online videos for unsupervised training, without requiring 3D annotations or motion-body pairing information. The proposed method is built upon two novel canonicalization operations: structure canonicalization and view canonicalization. Trained with the canonicalization operations and the derived regularizations, our method learns to factorize a skeleton sequence into three independent semantic subspaces, i.e., motion, structure, and view angle. The disentangled representation enables motion retargeting from 2D to 3D with high precision. Our method achieves superior performance on motion transfer benchmarks with large body variations and challenging actions. Notably, the canonicalized skeleton sequence can serve as a disentangled and interpretable representation of human motion that benefits action analysis and motion retrieval.

Published

2022-06-28

How to Cite

Zhu, W., Yang, Z., Di, Z., Wu, W., Wang, Y., & Loy, C. C. (2022). MoCaNet: Motion Retargeting In-the-Wild via Canonicalization Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 3617-3625. https://doi.org/10.1609/aaai.v36i3.20274

Section

AAAI Technical Track on Computer Vision III