Off-Policy Proximal Policy Optimization

Authors

  • Wenjia Meng — School of Software, Shandong University, Jinan, China
  • Qian Zheng — The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  • Gang Pan — The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  • Yilong Yin — School of Software, Shandong University, Jinan, China

DOI:

https://doi.org/10.1609/aaai.v37i8.26099

Keywords:

ML: Reinforcement Learning Algorithms, PRS: Control of High-Dimensional Systems, RU: Sequential Decision Making

Abstract

Proximal Policy Optimization (PPO) is an important reinforcement learning method, which has achieved great success in sequential decision-making problems. However, PPO suffers from sample inefficiency, which arises because PPO cannot make use of off-policy data. In this paper, we propose an Off-Policy Proximal Policy Optimization method (Off-Policy PPO) that improves the sample efficiency of PPO by utilizing off-policy data. Specifically, we first propose a clipped surrogate objective function that can utilize off-policy data and avoid excessively large policy updates. Next, we theoretically establish the stability of optimizing the proposed surrogate objective by demonstrating that the policy update distance it induces is consistent with that of PPO. We then describe the implementation details of the proposed Off-Policy PPO, which iteratively updates policies by optimizing the proposed clipped surrogate objective. Finally, experimental results on representative continuous control tasks validate that our method outperforms state-of-the-art methods on most tasks.
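To make the idea concrete, below is a minimal sketch of how a PPO-style clipped surrogate can be extended with an off-policy correction. This is an illustrative reconstruction, not the paper's exact objective: the function name, the behavior-policy weighting term `pi_old / pi_behavior`, and all variable names are assumptions; the paper's precise weighting and clipping scheme may differ.

```python
import numpy as np

def off_policy_clipped_surrogate(pi_new, pi_old, pi_behavior,
                                 advantages, epsilon=0.2):
    """PPO-style clipped surrogate with a hypothetical off-policy correction.

    pi_new:      action probabilities under the policy being optimized
    pi_old:      action probabilities under the previous (target) policy
    pi_behavior: action probabilities under the behavior policy that
                 collected the data (may differ from pi_old off-policy)
    advantages:  advantage estimates for the sampled actions
    """
    # Standard PPO importance ratio between new and previous policy
    ratio = pi_new / pi_old
    # Hypothetical correction from behavior policy to previous policy,
    # allowing reuse of off-policy samples
    correction = pi_old / pi_behavior
    unclipped = ratio * advantages
    # Clipping the ratio bounds the policy update, as in standard PPO
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    return np.mean(correction * np.minimum(unclipped, clipped))
```

When the data is on-policy (`pi_behavior == pi_old`), the correction term is 1 and the objective reduces to the familiar PPO clipped surrogate, which matches the abstract's claim that the update distance stays consistent with PPO.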

Published

2023-06-26

How to Cite

Meng, W., Zheng, Q., Pan, G., & Yin, Y. (2023). Off-Policy Proximal Policy Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9162-9170. https://doi.org/10.1609/aaai.v37i8.26099

Section

AAAI Technical Track on Machine Learning III