Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation

Authors

  • Hengrui Zhang, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China
  • Youfang Lin, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China
  • Shuo Shen, Cooperation Product Department, Interactive Entertainment Group, Tencent
  • Sheng Han, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China
  • Kai Lv, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v38i19.30177

Keywords:

General

Abstract

In the domain of real-world agents, applying Reinforcement Learning (RL) remains challenging due to the necessity of safety constraints. Constrained Reinforcement Learning (CRL) has previously focused predominantly on on-policy algorithms. Although these algorithms exhibit a degree of efficacy, their interaction efficiency in real-world settings is suboptimal, highlighting the demand for more efficient off-policy methods. However, off-policy CRL algorithms struggle to estimate the C-function precisely, particularly because of fluctuations in the Lagrange multiplier associated with the constraint. Addressing this gap, our study examines the nuances of C-value estimation in off-policy CRL and introduces the Adaptive Ensemble C-learning (AEC) approach to reduce these inaccuracies. Building on state-of-the-art off-policy algorithms, we propose AEC-based CRL algorithms designed for enhanced task optimization. Extensive experiments on nine constrained robotics tasks reveal the superior interaction efficiency and performance of our algorithms in comparison to preceding methods.
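The core idea described above, aggregating an ensemble of cost-value (C) critics to stabilize estimation, can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the authors' actual AEC algorithm: the function name `ensemble_c_estimate`, the random-subset aggregation rule, and the `num_use` knob (which an adaptive scheme might tune as the Lagrange multiplier fluctuates) are all hypothetical.

```python
import numpy as np

def ensemble_c_estimate(c_preds, num_use, rng=None):
    """Illustrative ensemble C-value estimate (NOT the paper's exact rule).

    c_preds: per-critic C-value predictions for a single (state, action) pair.
    num_use: number of ensemble members to aggregate; an adaptive scheme
             could adjust this online to control estimation bias/variance.
    """
    rng = rng if rng is not None else np.random.default_rng()
    c_preds = np.asarray(c_preds, dtype=float)
    # Sample a random subset of critics without replacement.
    idx = rng.choice(len(c_preds), size=num_use, replace=False)
    # Averaging a random subset interpolates between a high-variance
    # single-critic estimate (num_use=1) and the full-ensemble mean
    # (num_use=len(c_preds)), one common way to temper estimation error.
    return float(c_preds[idx].mean())
```

For example, `ensemble_c_estimate([1.0, 2.0, 3.0, 4.0], 2)` returns the mean of two randomly chosen critics' predictions, which always lies between the ensemble's minimum and maximum prediction.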

Published

2024-03-24

How to Cite

Zhang, H., Lin, Y., Shen, S., Han, S., & Lv, K. (2024). Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(19), 21770-21778. https://doi.org/10.1609/aaai.v38i19.30177

Section

AAAI Technical Track on Safe, Robust and Responsible AI Track