Weakly Supervised 3D Multi-Person Pose Estimation for Large-Scale Scenes Based on Monocular Camera and Single LiDAR

Authors

  • Peishan Cong, ShanghaiTech University
  • Yiteng Xu, ShanghaiTech University
  • Yiming Ren, ShanghaiTech University
  • Juze Zhang, ShanghaiTech University
  • Lan Xu, ShanghaiTech University; Shanghai Engineering Research Center of Intelligent Vision and Imaging
  • Jingya Wang, ShanghaiTech University; Shanghai Engineering Research Center of Intelligent Vision and Imaging
  • Jingyi Yu, ShanghaiTech University; Shanghai Engineering Research Center of Intelligent Vision and Imaging
  • Yuexin Ma, ShanghaiTech University; Shanghai Engineering Research Center of Intelligent Vision and Imaging

DOI:

https://doi.org/10.1609/aaai.v37i1.25120

Keywords:

CV: 3D Computer Vision, CV: Biometrics, Face, Gesture & Pose, CV: Multi-modal Vision

Abstract

Depth estimation is usually ill-posed and ambiguous for monocular camera-based 3D multi-person pose estimation. Since LiDAR can capture accurate depth information in long-range scenes, it can benefit both the global localization of individuals and the 3D pose estimation by providing rich geometry features. Motivated by this, we propose a monocular camera and single LiDAR-based method for 3D multi-person pose estimation in large-scale scenes, which is easy to deploy and insensitive to light. Specifically, we design an effective fusion strategy to take advantage of multi-modal input data, including images and point clouds, and make full use of temporal information to guide the network to learn natural and coherent human motions. Without relying on any 3D pose annotations, our method exploits the inherent geometry constraints of point clouds for self-supervision and utilizes 2D keypoints on images for weak supervision. Extensive experiments on public datasets and our newly collected dataset demonstrate the superiority and generalization capability of our proposed method. The project homepage is at https://github.com/4DVLab/FusionPose.git.
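
The weak supervision from 2D keypoints mentioned in the abstract can be illustrated with a generic reprojection loss. The sketch below is not the authors' formulation: it assumes a standard pinhole camera, 3D joints already expressed in the camera frame, and a confidence-weighted mean pixel error. The names project_point and reprojection_loss and the intrinsics values are hypothetical.

```python
import math

def project_point(point_cam, fx, fy, cx, cy):
    """Pinhole projection of one camera-frame 3D point (x, y, z) to pixels.
    Assumes z > 0, i.e. the point lies in front of the camera."""
    x, y, z = point_cam
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_loss(joints_cam, keypoints_2d, confidences, intrinsics):
    """Confidence-weighted mean pixel distance between projected 3D joints
    and 2D keypoints. An illustrative weak-supervision term only; the
    paper's actual loss may differ."""
    fx, fy, cx, cy = intrinsics
    total, weight = 0.0, 0.0
    for joint, (u, v), conf in zip(joints_cam, keypoints_2d, confidences):
        pu, pv = project_point(joint, fx, fy, cx, cy)
        total += conf * math.hypot(pu - u, pv - v)
        weight += conf
    return total / weight if weight > 0 else 0.0

# Hypothetical intrinsics (fx, fy, cx, cy) and two joints 10 m from the camera.
intrinsics = (1000.0, 1000.0, 640.0, 360.0)
joints = [(0.0, 0.0, 10.0), (1.0, -0.5, 10.0)]

# Keypoints that match the projections exactly give zero loss.
exact = [project_point(j, *intrinsics) for j in joints]
loss_exact = reprojection_loss(joints, exact, [1.0, 1.0], intrinsics)

# Shifting every keypoint by (3, 4) pixels gives a 5-pixel error per joint.
shifted = [(u + 3.0, v + 4.0) for u, v in exact]
loss_shifted = reprojection_loss(joints, shifted, [1.0, 1.0], intrinsics)
```

A term of this kind constrains only the image-plane position of each joint. Depth along the viewing ray stays unconstrained, which is the ambiguity the abstract attributes to monocular input and the reason the LiDAR point cloud is used as a complementary signal.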

Published

2023-06-26

How to Cite

Cong, P., Xu, Y., Ren, Y., Zhang, J., Xu, L., Wang, J., Yu, J., & Ma, Y. (2023). Weakly Supervised 3D Multi-Person Pose Estimation for Large-Scale Scenes Based on Monocular Camera and Single LiDAR. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 461-469. https://doi.org/10.1609/aaai.v37i1.25120

Section

AAAI Technical Track on Computer Vision I