Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots

Authors

  • Pranay Dugar, Oregon State University
  • Aayam Shrestha, Oregon State University
  • Fangzhou Yu, Oregon State University
  • Bart van Marum, Oregon State University
  • Alan Fern, Oregon State University

DOI:

https://doi.org/10.1609/aaaiss.v7i1.36946

Abstract

A major challenge in humanoid robotics is designing a unified interface for commanding diverse whole-body behaviors, from precise footstep sequences to partial-body mimicry and joystick teleoperation. We introduce the Masked Humanoid Controller (MHC), a learned whole-body controller that exposes a simple yet expressive interface: the specification of masked target trajectories over selected subsets of the robot’s state variables. This unified abstraction allows high-level systems to issue commands in a flexible format that accommodates multi-modal inputs such as optimized trajectories, motion capture clips, re-targeted video, and real-time joystick signals. The MHC is trained in simulation using a curriculum that spans this full range of modalities, enabling robust execution of partially specified behaviors while maintaining balance and disturbance rejection. We demonstrate the MHC both in simulation and on the real-world Digit V3 humanoid, showing that a single learned controller is capable of executing such diverse whole-body commands in the real world through a common representational interface.
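
As a rough illustration of the interface described in the abstract, the sketch below represents a command as a target trajectory plus a boolean mask marking which state variables are specified at each timestep. It is a minimal sketch under stated assumptions, not the paper's implementation: the state layout, the dimensions, and the names `masked_command` and `masked_tracking_error` are hypothetical and do not come from the source.

```python
import numpy as np


def masked_command(horizon, state_dim, targets):
    """Build (trajectory, mask) arrays from {state_index: values over the horizon}.

    Hypothetical helper: unspecified state variables stay masked out (False),
    leaving the controller free to choose them.
    """
    traj = np.zeros((horizon, state_dim))
    mask = np.zeros((horizon, state_dim), dtype=bool)
    for idx, values in targets.items():
        traj[:, idx] = values   # target values for this state variable
        mask[:, idx] = True     # mark the variable as commanded
    return traj, mask


def masked_tracking_error(state_traj, traj, mask):
    """Mean squared error computed only over the commanded entries."""
    diff = (state_traj - traj)[mask]
    return float(np.mean(diff ** 2)) if diff.size else 0.0


# Assumed toy layout: 4 state variables, 3 timesteps; only variables 0 and 2 are commanded.
traj, mask = masked_command(3, 4, {0: [1.0, 1.0, 1.0], 2: [0.5, 0.5, 0.5]})
error = masked_tracking_error(np.zeros((3, 4)), traj, mask)
```

Under this assumed encoding, a joystick command, a footstep plan, or a partial motion capture clip would differ only in which mask entries are set, which is one way to read the "common representational interface" the abstract refers to.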

Published

2025-11-23

How to Cite

Dugar, P., Shrestha, A., Yu, F., van Marum, B., & Fern, A. (2025). Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots. Proceedings of the AAAI Symposium Series, 7(1), 650-657. https://doi.org/10.1609/aaaiss.v7i1.36946

Section

Unifying Representations for Robot Application Development