Differentiable Information Enhanced Model-Based Reinforcement Learning

Authors

  • Xiaoyuan Zhang: Institute for Artificial Intelligence, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University, Beijing, China; State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
  • Xinyan Cai: Institute of Automation, Chinese Academy of Sciences
  • Bo Liu: Institute for Artificial Intelligence, Peking University
  • Weidong Huang: State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
  • Song-Chun Zhu: State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China; Institute for Artificial Intelligence, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University, Beijing, China
  • Siyuan Qi: State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China
  • Yaodong Yang: Institute for Artificial Intelligence, Peking University; State Key Laboratory of General Artificial Intelligence, Peking University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v39i21.34419

Abstract

Differentiable environments have opened new possibilities for learning control policies by offering rich differentiable information that facilitates gradient-based methods. Compared to prevailing model-free reinforcement learning approaches, model-based reinforcement learning (MBRL) methods have the potential to effectively harness this differentiable information to recover the underlying physical dynamics. However, this presents two primary challenges: effectively utilizing differentiable information to 1) construct models with more accurate dynamics prediction and 2) enhance the stability of policy training. In this paper, we propose a Differentiable Information Enhanced MBRL method, MB-MIX, to address both challenges. First, we adopt a Sobolev model training approach that penalizes incorrect model gradient outputs, enhancing prediction accuracy and yielding more precise models that faithfully capture system dynamics. Second, we mix lengths of truncated learning windows to reduce the variance in policy gradient estimation, resulting in improved stability during policy learning. To validate the effectiveness of our approach in differentiable environments, we provide theoretical analysis and empirical results. Notably, our approach outperforms previous model-based and model-free methods in multiple challenging tasks, including motion control of controllable rigid robots such as humanoids and deformable object manipulation.
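The Sobolev model training mentioned in the abstract can be illustrated with a toy sketch: the dynamics model is fit to match not only the value outputs of the true system but also its input gradients, as would be supplied by a differentiable simulator. The following is a minimal, hedged illustration (the linear dynamics, the weight `lam`, and all names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

# Toy "true" dynamics: f(x) = 3*x, so df/dx = 3 everywhere.
# A differentiable simulator would supply these gradients directly.
def true_f(x):
    return 3.0 * x

def true_grad(x):
    return np.full_like(x, 3.0)

rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=32)  # training inputs

theta = 0.0   # linear model f_theta(x) = theta * x, so df_theta/dx = theta
lam = 1.0     # weight on the gradient-matching (Sobolev) term; illustrative
lr = 0.05

for _ in range(200):
    # Value-matching residual: f_theta(x) - f(x)
    value_err = theta * xs - true_f(xs)
    # Gradient-matching residual: df_theta/dx - df/dx
    grad_err = theta - true_grad(xs)
    # Gradient of the combined Sobolev loss
    #   L = mean(value_err^2) + lam * mean(grad_err^2)
    # with respect to theta:
    g = 2.0 * np.mean(value_err * xs) + 2.0 * lam * np.mean(grad_err)
    theta -= lr * g

print(round(theta, 3))  # converges toward the true slope 3.0
```

The gradient-matching term penalizes incorrect model gradients even where value errors are small, which is the mechanism the abstract credits for more faithful dynamics models.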

Published

2025-04-11

How to Cite

Zhang, X., Cai, X., Liu, B., Huang, W., Zhu, S.-C., Qi, S., & Yang, Y. (2025). Differentiable Information Enhanced Model-Based Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(21), 22605–22613. https://doi.org/10.1609/aaai.v39i21.34419

Section

AAAI Technical Track on Machine Learning VII