User-Oriented Robust Reinforcement Learning

Authors

  • Haoyi You, Shanghai Jiao Tong University
  • Beichen Yu, Shanghai Jiao Tong University
  • Haiming Jin, Shanghai Jiao Tong University
  • Zhaoxing Yang, Shanghai Jiao Tong University
  • Jiahui Sun, Shanghai Jiao Tong University

DOI:

https://doi.org/10.1609/aaai.v37i12.26781

Keywords:

General

Abstract

Recently, improving the robustness of policies across different environments has attracted increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve max-min robustness by optimizing the policy's performance in the worst-case environment. In practice, however, a user of an RL policy may have different preferences over its performance across environments, and the aforementioned max-min robustness is oftentimes too conservative to satisfy those preferences. Therefore, in this paper, we integrate user preference into policy learning in robust RL and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two UOR-RL training algorithms, one for the scenario with an a priori known environment distribution and one for the scenario without. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or no knowledge of the environment distribution. Furthermore, we carry out extensive experimental evaluations on 6 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to state-of-the-art baselines under the average-case and worst-case performance metrics and, more importantly, establishes new state-of-the-art performance under the UOR metric.
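The abstract does not spell out the exact form of the UOR metric, so the sketch below is only a hypothetical illustration of the weighting idea, not the paper's definition: the function name uor_objective, the fixed per-environment weights, and the example returns are all assumptions made for exposition. It shows how preference weights trade off a policy's performance across environments, and how concentrating all weight on the worst environment recovers the classical max-min criterion that the UOR metric is said to generalize.

```python
import numpy as np

def uor_objective(returns, weights):
    """Hypothetical preference-weighted robustness score (illustrative only).

    returns[i] is the policy's expected return in environment i, and
    weights[i] is the user's preference weight for that environment
    (non-negative, summing to 1). A larger weight makes that
    environment's performance count more toward the overall score.
    """
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return float(np.dot(weights, returns))

# Example: a policy's returns in three environments.
returns = [120.0, 80.0, 40.0]

# A user who cares mostly about the typical environments.
print(uor_objective(returns, [0.5, 0.3, 0.2]))  # 92.0

# Placing all weight on the worst environment recovers
# the max-min (worst-case) robustness criterion.
w = np.zeros(len(returns))
w[np.argmin(returns)] = 1.0
print(uor_objective(returns, w))                # 40.0
```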

Published

2023-06-26

How to Cite

You, H., Yu, B., Jin, H., Yang, Z., & Sun, J. (2023). User-Oriented Robust Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(12), 15269-15277. https://doi.org/10.1609/aaai.v37i12.26781

Section

AAAI Special Track on Safe and Robust AI