Progressive Multi-View Human Mesh Recovery with Self-Supervision


  • Xuan Gong, University at Buffalo; United Imaging Intelligence
  • Liangchen Song, University at Buffalo; United Imaging Intelligence
  • Meng Zheng, United Imaging Intelligence
  • Benjamin Planche, United Imaging Intelligence
  • Terrence Chen, United Imaging Intelligence
  • Junsong Yuan, University at Buffalo
  • David Doermann, University at Buffalo
  • Ziyan Wu, United Imaging Intelligence



Keywords: CV: 3D Computer Vision, CV: Applications, CV: Biometrics, Face, Gesture & Pose, ML: Unsupervised & Self-Supervised Learning


To date, little attention has been given to multi-view 3D human mesh estimation, despite its real-life applicability (e.g., motion capture, sports analysis) and its robustness to single-view ambiguities. Existing solutions typically generalize poorly to new settings, largely due to the limited diversity of image/3D-mesh pairs in multi-view training data. To address this shortcoming, researchers have explored the use of synthetic images. However, beyond the usual impact of the visual gap between rendered and target data, synthetic-data-driven multi-view estimators also overfit to the camera viewpoint distribution sampled during training, which usually differs from real-world distributions. Tackling both challenges, we propose a novel simulation-based training pipeline for multi-view human mesh recovery, which (a) relies on intermediate 2D representations that are more robust to the synthetic-to-real domain gap; (b) leverages learnable calibration and triangulation to adapt to more diverse camera setups; and (c) progressively aggregates multi-view information in a canonical 3D space to remove ambiguities in 2D representations. Through extensive benchmarking, we demonstrate the superiority of the proposed solution, especially in unseen in-the-wild scenarios.




How to Cite

Gong, X., Song, L., Zheng, M., Planche, B., Chen, T., Yuan, J., Doermann, D., & Wu, Z. (2023). Progressive Multi-View Human Mesh Recovery with Self-Supervision. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 676-684.



AAAI Technical Track on Computer Vision I