Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation

Authors

  • Chenxin Xu, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University
  • Siheng Chen, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University
  • Maosen Li, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University
  • Ya Zhang, Cooperative Medianet Innovation Center, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v35i4.16409

Keywords:

Biometrics, Face, Gesture & Pose, Unsupervised & Self-Supervised Learning

Abstract

We propose a novel method based on a teacher-student learning framework for 3D human pose estimation without any 3D annotation or side information. To solve this unsupervised learning problem, the teacher network adopts pose-dictionary-based modeling as a regularizer to estimate physically plausible 3D poses. To handle the decomposition ambiguity in the teacher network, we propose a cycle-consistent architecture that promotes a 3D rotation-invariant property when training the teacher network. To further improve estimation accuracy, the student network adopts a novel graph convolutional network that, for flexibility, directly estimates the 3D joint coordinates. A second cycle-consistent architecture, which promotes a 3D rotation-equivariant property, exploits geometric consistency and is combined with knowledge distillation from the teacher network to improve pose estimation performance. We conduct extensive experiments on Human3.6M and MPI-INF-3DHP. Our method reduces the 3D joint prediction error by 11.4% compared with state-of-the-art unsupervised methods and also outperforms many weakly supervised methods that use side information on Human3.6M. Code will be available at https://github.com/sjtuxcx/ITES.
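The abstract describes the teacher's dictionary-based pose model and the two cycle-consistency properties only at a high level. The following is a minimal pure-Python sketch of those ingredients, under assumptions that are ours and not stated in the abstract: a canonical 3D pose is a linear combination of K dictionary basis poses, the camera is modeled as a rotation followed by orthographic projection, the teacher's invariance term is a squared error between coefficient vectors, and the student's equivariance term is a mean per-joint distance. All function names (`compose_pose`, `rotation_y`, `rotate`, `project`, `invariance_loss`, `equivariance_loss`) and the toy bases are hypothetical, and no networks are trained here.

```python
import math
from typing import List, Sequence

# Hypothetical type aliases for this sketch: a pose is a list of J joints.
Pose3D = List[List[float]]  # each joint is (x, y, z)
Pose2D = List[List[float]]  # each joint is (u, v)
Mat3 = List[List[float]]


def compose_pose(coeffs: Sequence[float], bases: Sequence[Pose3D]) -> Pose3D:
    """Canonical 3D pose as a weighted sum of K dictionary basis poses (assumed model)."""
    num_joints = len(bases[0])
    pose = [[0.0, 0.0, 0.0] for _ in range(num_joints)]
    for c, basis in zip(coeffs, bases):
        for j in range(num_joints):
            for d in range(3):
                pose[j][d] += c * basis[j][d]
    return pose


def rotation_y(theta: float) -> Mat3:
    """Rotation matrix about the vertical (y) axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rotate(pose: Pose3D, rot: Mat3) -> Pose3D:
    """Apply rot to every joint (computes rot @ p for each joint p)."""
    return [[sum(rot[r][k] * p[k] for k in range(3)) for r in range(3)] for p in pose]


def project(pose: Pose3D) -> Pose2D:
    """Orthographic projection: keep x and y, drop depth z."""
    return [[p[0], p[1]] for p in pose]


def invariance_loss(coeffs_a: Sequence[float], coeffs_b: Sequence[float]) -> float:
    """Squared L2 distance between two coefficient vectors (assumed teacher-side term)."""
    return sum((a - b) ** 2 for a, b in zip(coeffs_a, coeffs_b))


def equivariance_loss(pred_of_rotated: Pose3D, rotated_pred: Pose3D) -> float:
    """Mean per-joint Euclidean distance between f(rotated input) and R f(input)."""
    dists = [math.dist(p, q) for p, q in zip(pred_of_rotated, rotated_pred)]
    return sum(dists) / len(dists)


# Toy dictionary: K = 2 bases over J = 3 joints (made-up numbers).
bases = [
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 1.5, 0.2]],
    [[0.0, 0.0, 0.0], [0.1, 0.0, 0.3], [0.0, 0.2, -0.4]],
]
coeffs = [1.0, 0.5]

canonical = compose_pose(coeffs, bases)   # identical for every camera view
rot = rotation_y(math.pi / 4)
view_a = project(canonical)               # 2D keypoints, identity rotation
view_b = project(rotate(canonical, rot))  # 2D keypoints, rotated camera

# The 2D inputs differ between the two views, but an ideal teacher would
# recover the same coefficients from both, giving zero invariance loss.
print(view_a[1], view_b[1])
print(invariance_loss(coeffs, coeffs))
```

In a full pipeline, the teacher would predict the coefficients and a rotation from 2D keypoints, and the invariance term would compare coefficients predicted before and after a random re-rotation and re-projection of the reconstructed pose; the student's equivariance term would compare its 3D prediction on the re-projected input with the rotated version of its original prediction. How the paper actually defines, weights, and combines these terms is not specified in the abstract.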


Published

2021-05-18

How to Cite

Xu, C., Chen, S., Li, M., & Zhang, Y. (2021). Invariant Teacher and Equivariant Student for Unsupervised 3D Human Pose Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 35(4), 3013-3021. https://doi.org/10.1609/aaai.v35i4.16409

Issue

Vol. 35 No. 4

Section

AAAI Technical Track on Computer Vision III