Spatial-Related Sensors Matters: 3D Human Motion Reconstruction Assisted with Textual Semantics

Xueyuan Yang; Chao Yao; Xiaojuan Ban

doi:10.1609/aaai.v38i9.28888

Authors

Xueyuan Yang: Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing 100083, China; University of Science and Technology Beijing, Beijing 100083, China.
Chao Yao: Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing 100083, China; University of Science and Technology Beijing, Beijing 100083, China.
Xiaojuan Ban: Beijing Advanced Innovation Center for Materials Genome Engineering, Beijing 100083, China; University of Science and Technology Beijing, Beijing 100083, China; Key Laboratory of Intelligent Bionic Unmanned Systems, Ministry of Education, Beijing 100083, China; Institute of Materials Intelligent Technology, Liaoning Academy of Materials, Shenyang 110004, China.

DOI:

https://doi.org/10.1609/aaai.v38i9.28888

Keywords:

HAI: Human-Computer Interaction, HAI: Other Foundations of Human Computation & AI

Abstract

Wearable devices have emerged as an economical and viable means of motion reconstruction. Some methods place sparse Inertial Measurement Units (IMUs) on the human body and use data-driven strategies to model human poses. However, reconstructing motion from sparse IMU data alone is inherently ambiguous, because identical IMU readings can correspond to different poses. In this paper, we explore the spatial importance of sparse sensors, supervised by text that describes specific actions. Specifically, we introduce uncertainty to derive weighted features for each IMU. We also design a Hierarchical Temporal Transformer (HTT) and apply contrastive learning to align sensor data with textual semantics precisely, both temporally and at the feature level. Experimental results demonstrate that our approach achieves significant improvements on multiple metrics compared to existing methods. Notably, with textual supervision, our method not only differentiates between ambiguous actions such as sitting and standing but also produces more precise and natural motion.
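
To make the idea of contrastive alignment between sensor data and text concrete, the following is a minimal sketch of a generic symmetric InfoNCE-style loss in plain Python. It is not the authors' implementation: the abstract does not specify the loss, so the function names, the temperature value of 0.07, and the toy embeddings are all assumptions chosen for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def contrastive_alignment_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each input is assumed to be a matched pair."""
    s = [l2_normalize(v) for v in sensor_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(s)
    # Cosine similarity between every sensor/text pair, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(s[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Sensor-to-text direction: the correct text for row i is column i.
    s2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-sensor direction uses the transposed similarity matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2s = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (s2t + t2s)

# Toy embeddings (hypothetical values, not taken from the paper).
sensor = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[0.9, 0.1], [0.1, 0.9]]
swapped_text = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = contrastive_alignment_loss(sensor, aligned_text)
loss_swapped = contrastive_alignment_loss(sensor, swapped_text)
```

In this toy setup the loss is lower when each sensor embedding sits closest to its own text embedding than when the pairs are swapped, which is the behavior a contrastive objective of this kind rewards.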
