Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation

Authors

  • Hengrui Zhang, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China
  • Youfang Lin, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China
  • Shuo Shen, Cooperation Product Department, Interactive Entertainment Group, Tencent
  • Sheng Han, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China
  • Kai Lv, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v38i19.30177

Keywords:

General

Abstract

In the domain of real-world agents, applying Reinforcement Learning (RL) remains challenging due to the necessity of safety constraints. Constrained Reinforcement Learning (CRL) has previously focused predominantly on on-policy algorithms. Although these algorithms exhibit a degree of efficacy, their interaction efficiency in real-world settings is suboptimal, highlighting the demand for more efficient off-policy methods. However, off-policy CRL algorithms struggle to estimate the C-function precisely, particularly because of fluctuations in the Lagrange multiplier associated with the constraint. Addressing this gap, our study examines the nuances of C-value estimation in off-policy CRL and introduces the Adaptive Ensemble C-learning (AEC) approach to reduce these inaccuracies. Building on state-of-the-art off-policy algorithms, we propose AEC-based CRL algorithms designed for enhanced task optimization. Extensive experiments on nine constrained robotics tasks reveal the superior interaction efficiency and performance of our algorithms in comparison to preceding methods.
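The core idea described above, aggregating an ensemble of cost-value (C) critics to stabilize estimation, can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the authors' actual AEC algorithm: the function name `ensemble_c_estimate`, the random-subset aggregation rule, and the `num_use` knob (which an adaptive scheme might tune as the Lagrange multiplier fluctuates) are all hypothetical.

```python
import numpy as np

def ensemble_c_estimate(c_preds, num_use, rng=None):
    """Illustrative ensemble C-value estimate (NOT the paper's exact rule).

    c_preds: per-critic C-value predictions for a single (state, action) pair.
    num_use: number of ensemble members to aggregate; an adaptive scheme
             could adjust this online to control estimation bias/variance.
    """
    rng = rng if rng is not None else np.random.default_rng()
    c_preds = np.asarray(c_preds, dtype=float)
    # Sample a random subset of critics without replacement.
    idx = rng.choice(len(c_preds), size=num_use, replace=False)
    # Averaging a random subset interpolates between a high-variance
    # single-critic estimate (num_use=1) and the full-ensemble mean
    # (num_use=len(c_preds)), one common way to temper estimation error.
    return float(c_preds[idx].mean())
```

For example, `ensemble_c_estimate([1.0, 2.0, 3.0, 4.0], 2)` returns the mean of two randomly chosen critics' predictions, which always lies between the ensemble's minimum and maximum prediction.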

Published

2024-03-24

How to Cite

Zhang, H., Lin, Y., Shen, S., Han, S., & Lv, K. (2024). Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21770-21778. https://doi.org/10.1609/aaai.v38i19.30177

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track