Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild

Authors

  • Jiayi Chen, Peking University; Beijing Institute for General AI
  • Mi Yan, Peking University
  • Jiazhao Zhang, Peking University
  • Yinzhen Xu, Peking University; Beijing Institute for General AI
  • Xiaolong Li, Virginia Tech
  • Yijia Weng, Stanford University
  • Li Yi, Tsinghua University
  • Shuran Song, Columbia University
  • He Wang, Peking University

DOI:

https://doi.org/10.1609/aaai.v37i1.25103

Keywords:

CV: 3D Computer Vision, CV: Biometrics, Face, Gesture & Pose, CV: Motion & Tracking

Abstract

In this work, we tackle the challenging task of jointly tracking hand-object poses and reconstructing their shapes from depth point cloud sequences in the wild, given the initial poses at frame 0. We propose, for the first time, a point cloud-based hand joint tracking network, HandTrackNet, to estimate the inter-frame hand joint motion. HandTrackNet features a novel hand pose canonicalization module that eases the tracking task, yielding accurate and robust hand joint tracking. Our pipeline then reconstructs the full hand by converting the predicted hand joints into a MANO hand. For object tracking, we devise a simple yet effective module that estimates the object SDF from the first frame and performs optimization-based tracking. Finally, a joint optimization step performs joint hand-object reasoning, which alleviates occlusion-induced ambiguity and further refines the hand pose. During training, the whole pipeline sees only purely synthetic data, synthesized with sufficient variation and rendered with depth simulation to ease generalization. The pipeline is robust to the synthetic-to-real generalization gap and thus directly transferable to real in-the-wild data. We evaluate our method on two real hand-object interaction datasets, i.e., HO3D and DexYCB, without any fine-tuning. Our experiments demonstrate that the proposed method significantly outperforms previous state-of-the-art depth-based hand and object pose estimation and tracking methods, while running at a frame rate of 9 FPS. We have released our code at https://github.com/PKU-EPIC/HOTrack.
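The optimization-based object tracking step mentioned in the abstract can be illustrated with a short sketch. Below is a minimal, hypothetical Python example, not the authors' released implementation (see the repository above for that): given an object SDF estimated at frame 0, the 6-DoF object pose at each subsequent frame is refined by driving the observed object points, mapped into the object's canonical frame, onto the SDF zero level set. The helper names, grid extent, and choice of solver are illustrative assumptions.

import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def make_sdf_interpolator(sdf_grid, grid_min, grid_max):
    # Wrap a dense SDF voxel grid (assumed axis-aligned in the object's
    # canonical frame) in a trilinear interpolator; points falling outside
    # the grid get a positive fill value so they are pulled back inward.
    axes = [np.linspace(grid_min[i], grid_max[i], sdf_grid.shape[i])
            for i in range(3)]
    return RegularGridInterpolator(axes, sdf_grid,
                                   bounds_error=False, fill_value=1.0)


def track_object_pose(points_t, sdf, pose_init):
    # Pose is packed as [axis-angle rotation (3), translation (3)];
    # the world-to-canonical mapping is p_obj = R^T (p_world - t).
    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()
        p_obj = (points_t - x[3:]) @ R
        return sdf(p_obj)  # zero iff every point lies on the object surface

    # Levenberg-Marquardt over the per-point SDF residuals.
    return least_squares(residuals, pose_init, method="lm").x


# Frame-by-frame usage: warm-start each frame from the previous pose.
# sdf = make_sdf_interpolator(sdf_grid, -0.1 * np.ones(3), 0.1 * np.ones(3))
# pose = pose_frame0
# for points_t in point_cloud_sequence:
#     pose = track_object_pose(points_t, sdf, pose)

In the paper's full pipeline this object term is combined with HandTrackNet's hand joint tracking and a final joint hand-object optimization that resolves occlusion-induced ambiguity; the sketch above covers only the object side.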


Published

2023-06-26

How to Cite

Chen, J., Yan, M., Zhang, J., Xu, Y., Li, X., Weng, Y., Yi, L., Song, S., & Wang, H. (2023). Tracking and Reconstructing Hand Object Interactions from Point Cloud Sequences in the Wild. Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 304-312. https://doi.org/10.1609/aaai.v37i1.25103

Issue

Vol. 37 No. 1 (2023)

Section

AAAI Technical Track on Computer Vision I