HybridCap: Inertia-Aid Monocular Capture of Challenging Human Motions

Authors

  • Han Liang, School of Information Science and Technology, ShanghaiTech University
  • Yannan He, School of Information Science and Technology, ShanghaiTech University
  • Chengfeng Zhao, School of Information Science and Technology, ShanghaiTech University
  • Mutian Li, School of Information Science and Technology, ShanghaiTech University
  • Jingya Wang, School of Information Science and Technology, ShanghaiTech University; Shanghai Frontiers Science Center of Human-centered Artificial Intelligence
  • Jingyi Yu, School of Information Science and Technology, ShanghaiTech University; Shanghai Frontiers Science Center of Human-centered Artificial Intelligence
  • Lan Xu, School of Information Science and Technology, ShanghaiTech University; Shanghai Frontiers Science Center of Human-centered Artificial Intelligence

DOI:

https://doi.org/10.1609/aaai.v37i2.25240

Keywords:

CV: Biometrics, Face, Gesture & Pose, CV: Computational Photography, Image & Video Synthesis, CV: Motion & Tracking, CV: Multi-modal Vision

Abstract

Monocular 3D motion capture (mocap) is beneficial to many applications. The use of a single camera, however, often fails to handle occlusions of different body parts and hence is limited to capturing relatively simple movements. We present a lightweight, hybrid mocap technique called HybridCap that augments the camera with only 4 Inertial Measurement Units (IMUs) in a novel learning-and-optimization framework. We first employ a weakly supervised, hierarchical motion inference module based on cooperative pure residual recurrent blocks that serve as limb, body and root trackers as well as an inverse kinematics solver. Our network effectively narrows the search space of plausible motions via coarse-to-fine pose estimation and manages to tackle challenging movements with high efficiency. We further develop a hybrid optimization scheme that combines inertial feedback and visual cues to improve tracking accuracy. Extensive experiments on various datasets demonstrate that HybridCap can robustly handle challenging movements ranging from fitness actions to Latin dance. It also achieves real-time performance of up to 60 fps with state-of-the-art accuracy.
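
The idea of combining inertial feedback with visual cues can be illustrated with a toy example. The sketch below is not the HybridCap formulation; it is a minimal illustration under stated assumptions: a planar two-bone limb, a hypothetical energy with one 2D keypoint term and one IMU orientation term, and plain gradient descent with numerical gradients. All function names, weights and step sizes are invented for illustration. It shows how an orientation reading on the distal bone can recover the pose when the wrist keypoint is occluded.

```python
import numpy as np

def forward_kinematics(theta, lengths=(1.0, 1.0)):
    """2D elbow and wrist positions of a planar two-bone limb (toy model)."""
    a1 = theta[0]                # absolute angle of the upper bone
    a2 = theta[0] + theta[1]     # absolute angle of the lower bone
    elbow = lengths[0] * np.array([np.cos(a1), np.sin(a1)])
    wrist = elbow + lengths[1] * np.array([np.cos(a2), np.sin(a2)])
    return np.stack([elbow, wrist])          # shape (2 joints, 2 coords)

def hybrid_energy(theta, keypoints_2d, visible, imu_angle, w_vis=1.0, w_imu=1.0):
    """Hypothetical energy: visible-keypoint error plus IMU orientation error."""
    residual = forward_kinematics(theta) - keypoints_2d
    e_vis = np.sum(visible[:, None] * residual ** 2)   # occluded joints get weight 0
    e_imu = (theta[0] + theta[1] - imu_angle) ** 2     # IMU mounted on the lower bone
    return w_vis * e_vis + w_imu * e_imu

def fit_pose(keypoints_2d, visible, imu_angle, theta0, steps=500, lr=0.05, eps=1e-5):
    """Minimize the toy energy by gradient descent with central differences."""
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            d = np.zeros_like(theta)
            d[i] = eps
            e_plus = hybrid_energy(theta + d, keypoints_2d, visible, imu_angle)
            e_minus = hybrid_energy(theta - d, keypoints_2d, visible, imu_angle)
            grad[i] = (e_plus - e_minus) / (2 * eps)
        theta -= lr * grad
    return theta

# Usage: the wrist keypoint is occluded, so only the elbow and the IMU constrain the pose.
true_theta = np.array([0.3, 0.5])
keypoints = forward_kinematics(true_theta)
keypoints[1] = 0.0                            # occluded wrist: detection is unusable
visible = np.array([1.0, 0.0])
imu_angle = true_theta.sum()                  # idealized, noise-free orientation reading
estimate = fit_pose(keypoints, visible, imu_angle, theta0=[0.0, 0.0])
```

In this toy setting the elbow keypoint fixes the first joint angle and the IMU reading fixes the sum of the two angles, so the pose stays determined even without the wrist detection. With real, noisy sensors the weights `w_vis` and `w_imu` would need tuning.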

Published

2023-06-26

How to Cite

Liang, H., He, Y., Zhao, C., Li, M., Wang, J., Yu, J., & Xu, L. (2023). HybridCap: Inertia-Aid Monocular Capture of Challenging Human Motions. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 1539-1548. https://doi.org/10.1609/aaai.v37i2.25240

Section

AAAI Technical Track on Computer Vision II