KineST: A Kinematics-guided Spatiotemporal State Space Model for Human Motion Tracking from Sparse Signals

Authors

  • Shuting Zhao, College of Biomedical Engineering, Fudan University; Shanghai Key Laboratory of Medical Image Computing and Computer
  • Zeyu Xiao, College of Intelligent Robotics and Advanced Manufacturing, Fudan University
  • Xinrong Chen, College of Biomedical Engineering, Fudan University; Shanghai Key Laboratory of Medical Image Computing and Computer

DOI:

https://doi.org/10.1609/aaai.v40i16.38326

Abstract

Full-body motion tracking plays an essential role in AR/VR applications, bridging physical and virtual interactions. However, it is challenging to reconstruct realistic and diverse full-body poses based on sparse signals obtained by head-mounted displays, which are the main devices in AR/VR scenarios. Existing methods for pose reconstruction often incur high computational costs or rely on separately modeling spatial and temporal dependencies, making it difficult to balance accuracy, temporal coherence, and efficiency. To address this problem, we propose KineST, a novel kinematics-guided state space model, which effectively extracts spatiotemporal dependencies while integrating local and global pose perception. The innovation comes from two core ideas. Firstly, in order to better capture intricate joint relationships, the scanning strategy within the State Space Duality framework is reformulated into kinematics-guided bidirectional scanning, which embeds kinematic priors. Secondly, a mixed spatiotemporal representation learning approach is employed to tightly couple spatial and temporal contexts, balancing accuracy and smoothness. Additionally, a geometric angular velocity loss is introduced to impose physically meaningful constraints on rotational variations for further improving motion stability. Extensive experiments demonstrate that KineST has superior performance in both accuracy and temporal consistency within a lightweight framework.
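The abstract mentions a geometric angular velocity loss that constrains rotational variation between frames. The paper's exact formulation is not given on this page, so the following is only an illustrative sketch of one plausible form: angular velocities are recovered from consecutive unit quaternions (in (w, x, y, z) convention) and penalized with a mean squared error. All function names here (`angular_velocity`, `angular_velocity_loss`) are hypothetical, not taken from the paper.

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of two quaternions in (w, x, y, z) convention."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    """Conjugate (inverse for unit quaternions)."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def angular_velocity(quats, dt):
    """Angular velocity vectors (T-1, 3) from a (T, 4) unit-quaternion sequence."""
    vels = []
    for t in range(len(quats) - 1):
        # Relative rotation between consecutive frames: conj(q_t) * q_{t+1}.
        q_rel = quat_mul(quat_conj(quats[t]), quats[t + 1])
        if q_rel[0] < 0:  # keep the shortest-arc representation
            q_rel = -q_rel
        v = q_rel[1:]
        norm = np.linalg.norm(v)
        angle = 2.0 * np.arctan2(norm, q_rel[0])
        axis = v / norm if norm > 1e-12 else np.zeros(3)
        vels.append(axis * angle / dt)
    return np.array(vels)

def angular_velocity_loss(pred_quats, gt_quats, dt):
    """MSE between predicted and ground-truth angular velocities."""
    diff = angular_velocity(pred_quats, dt) - angular_velocity(gt_quats, dt)
    return float(np.mean(np.sum(diff**2, axis=-1)))
```

For a pose rotating about the z-axis at constant rate omega, `angular_velocity` recovers approximately (0, 0, omega) at every frame, and the loss of a sequence against itself is zero; a real implementation would apply this per joint across the kinematic tree.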

Published

2026-03-14

How to Cite

Zhao, S., Xiao, Z., & Chen, X. (2026). KineST: A Kinematics-guided Spatiotemporal State Space Model for Human Motion Tracking from Sparse Signals. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13244–13252. https://doi.org/10.1609/aaai.v40i16.38326

Section

AAAI Technical Track on Computer Vision XIII