Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling

Authors

  • Yuze Hao, Zhejiang University; Beijing University of Posts and Telecommunications
  • Jianrong Zhang, Zhejiang University
  • Tao Zhuo, Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences)
  • Fuan Wen, Beijing University of Posts and Telecommunications; Beijing Key Laboratory of Network System and Network Culture
  • Hehe Fan, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v38i3.27979

Keywords:

CV: Biometrics, Face, Gesture & Pose; CV: 3D Computer Vision; CV: Motion & Tracking

Abstract

Hands are the main medium when people interact with the world. Generating proper 3D motion for hand-object interaction is vital for applications such as virtual reality and robotics. Although grasp tracking or object manipulation synthesis can produce coarse hand motion, this kind of motion is inevitably noisy and full of jitter. To address this problem, we propose a data-driven method for coarse motion refinement. First, we design a hand-centric representation to describe the dynamic spatial-temporal relation between hands and objects. Compared to the object-centric representation, our hand-centric representation is straightforward and does not require an ambiguous projection process that converts object-based prediction into hand motion. Second, to capture the dynamic clues of hand-object interaction, we propose a new architecture that models the spatial and temporal structure in a hierarchical manner. Extensive experiments demonstrate that our method outperforms previous methods by a noticeable margin.
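As an illustration of what a hand-centric description of a hand-object sequence could look like, the sketch below computes, for every frame and every hand joint, the offset to the nearest object point. This is only a minimal sketch under stated assumptions: the abstract does not define the representation, so the nearest-point offset, the function names (`nearest_offset`, `hand_centric_features`), and the toy data are all hypothetical and are not taken from the paper.

```python
import math


def nearest_offset(joint, object_points):
    """Return the vector from a hand joint to its closest object point.

    Assumption: "closest" means smallest Euclidean distance.
    """
    closest = min(object_points, key=lambda p: math.dist(joint, p))
    return tuple(c - j for j, c in zip(joint, closest))


def hand_centric_features(hand_seq, object_seq):
    """Describe each frame from the hand's point of view.

    hand_seq:   list of frames, each a list of (x, y, z) hand joints
    object_seq: list of frames, each a list of (x, y, z) object points
    Returns one offset vector per joint per frame (hypothetical feature).
    """
    if len(hand_seq) != len(object_seq):
        raise ValueError("hand and object sequences must have the same length")
    return [
        [nearest_offset(joint, obj_frame) for joint in hand_frame]
        for hand_frame, obj_frame in zip(hand_seq, object_seq)
    ]


# Toy data: two frames, two hand joints, a static two-point "object".
hand_seq = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.5), (1.0, 0.25, 0.5)],
]
object_seq = [
    [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 0.0)],
]

features = hand_centric_features(hand_seq, object_seq)
for t, frame in enumerate(features):
    print(t, frame)
```

Because the features are indexed by hand joint and by frame, a model operating on them predicts quantities attached to the hand directly, which is the property the abstract contrasts with object-centric representations that need a separate projection step back to hand motion.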

Published

2024-03-24

How to Cite

Hao, Y., Zhang, J., Zhuo, T., Wen, F., & Fan, H. (2024). Hand-Centric Motion Refinement for 3D Hand-Object Interaction via Hierarchical Spatial-Temporal Modeling. Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2076-2084. https://doi.org/10.1609/aaai.v38i3.27979

Issue

Section

AAAI Technical Track on Computer Vision II