Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning

Authors

  • Yingnan Zhao, College of Computer Science and Technology, Harbin Engineering University; National Engineering Laboratory for Modeling and Emulation in E-Government, Harbin Engineering University
  • Xinmiao Wang, College of Computer Science and Technology, Harbin Engineering University; Institute of Artificial Intelligence (TeleAI), China Telecom
  • Dewei Wang, Institute of Artificial Intelligence (TeleAI), China Telecom; School of Information Science and Technology, University of Science and Technology of China
  • Xinzhe Liu, Institute of Artificial Intelligence (TeleAI), China Telecom; School of Information Science and Technology, ShanghaiTech University
  • Dan Lu, College of Computer Science and Technology, Harbin Engineering University; National Engineering Laboratory for Modeling and Emulation in E-Government, Harbin Engineering University
  • Qilong Han, College of Computer Science and Technology, Harbin Engineering University; National Engineering Laboratory for Modeling and Emulation in E-Government, Harbin Engineering University
  • Peng Liu, College of Computer Science and Technology, Harbin Institute of Technology
  • Chenjia Bai, Institute of Artificial Intelligence (TeleAI), China Telecom; Shenzhen Research Institute of Northwestern Polytechnical University

DOI:

https://doi.org/10.1609/aaai.v40i22.38951

Abstract

Humanoid robots hold promise for learning a diverse set of human-like locomotion behaviors, including standing up, walking, running, and jumping. However, existing methods predominantly train an independent policy for each skill, yielding behavior-specific controllers with limited generalization and brittle performance when deployed on irregular terrains and in diverse situations. To address this challenge, we propose Adaptive Humanoid Control (AHC), a two-stage framework that learns an adaptive humanoid locomotion controller across different skills and terrains. Specifically, we first train several primary locomotion policies and perform a multi-behavior distillation process to obtain a basic multi-behavior controller, enabling adaptive behavior switching based on the environment. We then perform reinforced fine-tuning by collecting online feedback while executing adaptive behaviors on more diverse terrains, enhancing the controller's terrain adaptability. We conduct experiments both in simulation and in the real world on the Unitree G1 robot. The results show that our method exhibits strong adaptability across various situations and terrains.

Published

2026-03-14

How to Cite

Zhao, Y., Wang, X., Wang, D., Liu, X., Lu, D., Han, Q., … Bai, C. (2026). Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(22), 18818–18826. https://doi.org/10.1609/aaai.v40i22.38951

Section

AAAI Technical Track on Intelligent Robotics