Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video

Authors

  • Jinseok Bae, Seoul National University
  • Hojun Jang, Seoul National University
  • Cheol-Hui Min, Seoul National University
  • Hyungun Choi, Seoul National University
  • Young Min Kim, Seoul National University

DOI:

https://doi.org/10.1609/aaai.v36i1.19882

Keywords:

Computer Vision (CV)

Abstract

We present Neural Marionette, an unsupervised approach that discovers the skeletal structure from a dynamic sequence and learns to generate diverse motions consistent with the observed motion dynamics. Given a video stream of point cloud observations of an articulated body under arbitrary motion, our approach discovers the unknown low-dimensional skeletal relationship that can effectively represent the movement. The discovered structure is then used to encode the motion priors of dynamic sequences into a latent code, which can be decoded to relative joint rotations that represent the full skeletal motion. Our approach works without any prior knowledge of the underlying motion or skeletal structure, and we demonstrate that the discovered structure is comparable even to the hand-labeled ground-truth skeleton in representing a 4D motion sequence. The skeletal structure embeds the general semantics of the possible motion space and can generate motions for diverse scenarios. We verify that the learned motion prior generalizes to multi-modal sequence generation, interpolation between two poses, and motion retargeting to a different skeletal structure.
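To make the pipeline described above concrete, the sketch below shows one plausible shape of such a two-stage system: a keypoint-discovery module that predicts joints from a voxelized frame plus a learned inter-joint affinity, and a sequence VAE that encodes keypoint trajectories into a latent motion code and decodes per-joint rotations. This is a minimal illustration, not the authors' published architecture; the module names, sizes, number of keypoints, and the 6D rotation output are all illustrative assumptions.

```python
# Hypothetical sketch of the abstract's pipeline (not the paper's code):
# (1) discover K keypoints and a soft skeleton from a voxel grid,
# (2) encode keypoint sequences to a latent code, decode joint rotations.
import torch
import torch.nn as nn

K = 8        # number of discovered keypoints (assumed)
LATENT = 64  # size of the latent motion code (assumed)

class KeypointDiscovery(nn.Module):
    """Predict K keypoints via soft-argmax over a 3D heatmap volume."""
    def __init__(self, grid=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv3d(16, K, 3, padding=1),
        )
        # Affinity logits between keypoints; a spanning tree over these
        # affinities would yield the skeleton edges.
        self.affinity = nn.Parameter(torch.zeros(K, K))
        c = torch.linspace(-1.0, 1.0, grid)
        self.register_buffer(
            "grid_xyz",
            torch.stack(torch.meshgrid(c, c, c, indexing="ij"),
                        dim=-1).reshape(-1, 3))

    def forward(self, vox):                 # vox: (B, 1, G, G, G)
        heat = self.net(vox).flatten(2)     # (B, K, G^3)
        prob = heat.softmax(dim=-1)
        kps = prob @ self.grid_xyz          # soft-argmax -> (B, K, 3)
        return kps, self.affinity.softmax(dim=-1)

class MotionPrior(nn.Module):
    """Sequence VAE: keypoint trajectories -> latent code -> 6D rotations."""
    def __init__(self):
        super().__init__()
        self.enc = nn.GRU(K * 3, 128, batch_first=True)
        self.to_mu = nn.Linear(128, LATENT)
        self.to_logvar = nn.Linear(128, LATENT)
        self.dec = nn.GRU(LATENT, 128, batch_first=True)
        self.to_rot = nn.Linear(128, K * 6)  # one 6D rotation per joint

    def forward(self, kp_seq):               # kp_seq: (B, T, K, 3)
        B, T = kp_seq.shape[:2]
        _, h = self.enc(kp_seq.reshape(B, T, -1))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        out, _ = self.dec(z.unsqueeze(1).repeat(1, T, 1))
        return self.to_rot(out).reshape(B, T, K, 6), mu, logvar

# Usage: voxelized frames in, keypoints and a rotation sequence out.
vox = torch.rand(2, 1, 16, 16, 16)
kps, adj = KeypointDiscovery()(vox)
rot6d, mu, logvar = MotionPrior()(kps.unsqueeze(1).repeat(1, 10, 1, 1))
print(kps.shape, adj.shape, rot6d.shape)  # (2,8,3) (8,8) (2,10,8,6)
```

Sampling different latent codes z from the prior would then yield the kind of multi-modal motion generation the abstract evaluates; the same decoder applied to another skeleton's joints corresponds to the retargeting setting.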

Published

2022-06-28

How to Cite

Bae, J., Jang, H., Min, C.-H., Choi, H., & Kim, Y. M. (2022). Neural Marionette: Unsupervised Learning of Motion Skeleton and Latent Dynamics from Volumetric Video. Proceedings of the AAAI Conference on Artificial Intelligence, 36(1), 86-94. https://doi.org/10.1609/aaai.v36i1.19882

Issue

Vol. 36 No. 1 (2022)

Section

AAAI Technical Track on Computer Vision I