Weighted Policy Constraints for Offline Reinforcement Learning
DOI: https://doi.org/10.1609/aaai.v37i8.26130
Keywords: ML: Reinforcement Learning Algorithms, ML: Imitation Learning & Inverse Reinforcement Learning, ML: Optimization, ML: Reinforcement Learning Theory, PRS: Planning With Markov Models (MDPs, POMDPs)
Abstract
Offline reinforcement learning (RL) aims to learn a policy from a passively collected, static dataset. Applying existing RL methods directly to such a dataset induces distribution shift, which causes these unconstrained methods to fail. A common remedy in offline RL is to constrain the learned policy, explicitly or implicitly, to stay close to the behavior policy. However, the available dataset usually contains sub-optimal or inferior actions; constraining the policy near all of these actions forces it to imitate inferior behaviors and limits the performance of the algorithm. Based on this observation, we propose a weighted policy constraints (wPC) method that constrains the learned policy only toward desirable behaviors, leaving room for policy improvement on other parts. Our algorithm outperforms existing state-of-the-art offline RL algorithms on the D4RL offline gym datasets. Moreover, the proposed algorithm is simple to implement with few hyper-parameters, making wPC a robust offline RL method with low computational complexity.
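The core idea in the abstract — down-weighting the policy constraint on inferior dataset actions so that only desirable behaviors anchor the policy — can be sketched as a weighted behavior-cloning regularizer. The sketch below is illustrative only: the advantage-based weighting rule, the function names, and the hyper-parameters are our assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def constraint_weights(q_values, values, temperature=1.0):
    """Hypothetical weighting: keep the constraint only on dataset actions
    whose estimated advantage Q(s, a) - V(s) is positive (desirable
    behaviors); inferior actions receive zero weight."""
    advantages = q_values - values
    soft = np.exp(np.clip(advantages / temperature, None, 5.0))  # clip for stability
    return np.where(advantages > 0, soft, 0.0)

def wpc_objective(log_pi_dataset_actions, policy_improvement_term, weights, alpha=1.0):
    """Total loss to minimize: an (unspecified) policy-improvement term plus a
    weighted behavior-cloning penalty that pulls the policy toward
    high-quality dataset actions only."""
    weighted_bc = -(weights * log_pi_dataset_actions).mean()
    return policy_improvement_term + alpha * weighted_bc
```

In this reading, actions with non-positive advantage contribute nothing to the constraint, so the policy is free to improve on those states rather than imitate them.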
Published
2023-06-26
How to Cite
Peng, Z., Han, C., Liu, Y., & Zhou, Z. (2023). Weighted Policy Constraints for Offline Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9435-9443. https://doi.org/10.1609/aaai.v37i8.26130
Section
AAAI Technical Track on Machine Learning III