Expressive Forecasting of 3D Whole-Body Human Motions

Authors

  • Pengxiang Ding (MiLAB, Westlake University; Zhejiang University)
  • Qiongjie Cui (Nanjing University of Science and Technology; Xiaohongshu Inc.)
  • Haofan Wang (Xiaohongshu Inc.)
  • Min Zhang (MiLAB, Westlake University; Zhejiang University)
  • Mengyuan Liu (Shenzhen Graduate School, Peking University)
  • Donglin Wang (Westlake University)

DOI:

https://doi.org/10.1609/aaai.v38i2.27919

Keywords:

CV: Motion & Tracking, CV: Biometrics, Face, Gesture & Pose, ROB: Human-Robot Interaction

Abstract

Human motion forecasting, which aims to estimate future human behavior over a period of time, is a fundamental task in many real-world applications. However, existing works typically concentrate on forecasting the major joints of the human body without considering the delicate movements of the human hands. In practical applications, hand gestures play an important role in human communication with the real world and express the primary intention of human beings. In this work, we are the first to formulate the whole-body human pose forecasting task, which jointly predicts both future body and gesture activities. Correspondingly, we propose a novel Encoding-Alignment-Interaction (EAI) framework that aims to predict both coarse (body joints) and fine-grained (gestures) activities collaboratively, enabling expressive and cross-facilitated forecasting of 3D whole-body human motions. Specifically, our model involves two key constituents: cross-context alignment (XCA) and cross-context interaction (XCI). Considering the heterogeneous information within the whole body, XCA aims to align the latent features of the various human components, while XCI focuses on effectively capturing the context interaction among these components. We conduct extensive experiments on a newly introduced large-scale benchmark and achieve state-of-the-art performance. The code is publicly available for research purposes at https://github.com/Dingpx/EAI.

Published

2024-03-24

How to Cite

Ding, P., Cui, Q., Wang, H., Zhang, M., Liu, M., & Wang, D. (2024). Expressive Forecasting of 3D Whole-Body Human Motions. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1537–1545. https://doi.org/10.1609/aaai.v38i2.27919

Issue

Vol. 38 No. 2 (2024)

Section

AAAI Technical Track on Computer Vision I