RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation

Authors

  • Xuetao Li, Wuhan University
  • Wenke Huang, Wuhan University
  • Nengyuan Pan, Hubei University
  • Kaiyan Zhao, Wuhan University
  • Songhua Yang, Wuhan University
  • Yiming Wang, University of Macau
  • Mengde Li, Wuhan University
  • Mang Ye, Wuhan University
  • Jifeng Xuan, Wuhan University
  • Miao Li, Wuhan University

DOI:

https://doi.org/10.1609/aaai.v40i18.38539

Abstract

Humanoid robots exhibit significant potential for executing diverse human-level skills. However, current research predominantly relies on data-driven approaches that require extensive training datasets to achieve robust multimodal decision-making and generalizable visuomotor control. These methods are problematic because they neglect geometric reasoning in unseen scenarios and model robot-target relationships inefficiently within the training data, wasting substantial training resources. To address these limitations, we present the Recurrent Geometric-prior Multimodal Policy (RGMP), an end-to-end framework that unifies geometric-semantic skill reasoning with data-efficient visuomotor control. For perception, we propose the Geometric-prior Skill Selector, which infuses geometric inductive biases into a vision-language model, producing adaptive skill sequences for unseen scenes with minimal spatial common-sense tuning. To achieve data-efficient robotic motion synthesis, we introduce the Adaptive Recursive Gaussian Network, which parameterizes robot-object interactions as a compact hierarchy of Gaussian processes that recursively encode multi-scale spatial relationships, yielding dexterous, data-efficient motion synthesis even from sparse demonstrations. Evaluated on both our humanoid robot and a desktop robot, the RGMP framework achieves 87% task success in generalization tests and exhibits 5× greater data efficiency than the state-of-the-art model. This performance underscores its superior cross-domain generalization, paving the way for more versatile and data-efficient robotic systems.

Published

2026-03-14

How to Cite

Li, X., Huang, W., Pan, N., Zhao, K., Yang, S., Wang, Y., … Li, M. (2026). RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(18), 15153–15161. https://doi.org/10.1609/aaai.v40i18.38539

Section

AAAI Technical Track on Data Mining & Knowledge Management II