Weighted Policy Constraints for Offline Reinforcement Learning

Authors

  • Zhiyong Peng, National University of Defense Technology
  • Changlin Han, National University of Defense Technology
  • Yadong Liu, National University of Defense Technology
  • Zongtan Zhou, National University of Defense Technology

DOI:

https://doi.org/10.1609/aaai.v37i8.26130

Keywords:

ML: Reinforcement Learning Algorithms, ML: Imitation Learning & Inverse Reinforcement Learning, ML: Optimization, ML: Reinforcement Learning Theory, PRS: Planning With Markov Models (MDPs, POMDPs)

Abstract

Offline reinforcement learning (RL) aims to learn a policy from a passively collected offline dataset. Naively applying existing RL methods to such a static dataset causes distribution shift, which makes these unconstrained RL methods fail. To cope with the distribution shift problem, a common practice in offline RL is to constrain the learned policy, explicitly or implicitly, to stay close to the behavior policy. However, the available dataset usually contains sub-optimal or inferior actions; constraining the policy near all of these actions inevitably makes it learn inferior behaviors, limiting the performance of the algorithm. Based on this observation, we propose a weighted policy constraints (wPC) method that constrains the learned policy only toward desirable behaviors, leaving room for policy improvement elsewhere. Our algorithm outperforms existing state-of-the-art offline RL algorithms on the D4RL offline Gym datasets. Moreover, the proposed algorithm is simple to implement with few hyper-parameters, making wPC a robust offline RL method with low computational complexity.
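The abstract's core idea, down-weighting the behavior-cloning constraint on inferior dataset actions, can be sketched as below. This is only an illustrative NumPy sketch under assumed names (`advantage_weights`, `weighted_bc_loss`) and an assumed indicator-style weight on the estimated advantage; the paper defines the precise weighting scheme and full objective.

```python
import numpy as np

def advantage_weights(q_values, v_values):
    """Assumed indicator-style weight: keep a dataset action only if its
    estimated advantage Q(s, a) - V(s) is positive, i.e. it is 'desirable'.
    Inferior actions receive weight 0 and are not imitated."""
    advantages = q_values - v_values
    return (advantages > 0).astype(np.float64)

def weighted_bc_loss(log_probs, weights):
    """Weighted behavior-cloning term: maximize the policy's log-likelihood
    of dataset actions, scaled per-action by the weights above, so the
    constraint only pulls the policy toward desirable behaviors."""
    return -np.mean(weights * log_probs)
```

For example, with estimated `q_values = [1.0, 3.0]` and `v_values = [2.0, 2.0]`, only the second action has positive advantage, so only its log-probability contributes to the constraint term.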

Published

2023-06-26

How to Cite

Peng, Z., Han, C., Liu, Y., & Zhou, Z. (2023). Weighted Policy Constraints for Offline Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9435-9443. https://doi.org/10.1609/aaai.v37i8.26130

Section

AAAI Technical Track on Machine Learning III