Model-Based Offline Weighted Policy Optimization (Student Abstract)

Renzhe Zhou; Zongzhang Zhang; Yang Yu

doi:10.1609/aaai.v37i13.27056

Authors

Renzhe Zhou, Nanjing University
Zongzhang Zhang, Nanjing University
Yang Yu, Nanjing University

DOI:

https://doi.org/10.1609/aaai.v37i13.27056

Keywords:

Offline Reinforcement Learning, Model-Based Reinforcement Learning, Weighted Bellman Update

Abstract

Learning from offline datasets is a promising direction for applying reinforcement learning to the real world. Offline reinforcement learning aims to learn policies from pre-collected datasets without online interaction with the environment. Because no further interaction is available, offline reinforcement learning suffers from severe extrapolation error, which can cause policy learning to fail. In this paper, we investigate the weighted Bellman update in model-based offline reinforcement learning. We explore uncertainty estimation in ensemble dynamics models, use a variational autoencoder to fit the behavioral prior, and propose an algorithm called Model-Based Offline Weighted Policy Optimization (MOWPO), which combines model confidence and the behavioral prior into weights that reduce the impact of inaccurate samples on policy optimization. Experimental results show that MOWPO outperforms state-of-the-art algorithms and that both the model confidence weight and the behavioral prior weight contribute to offline policy optimization.
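
The abstract does not state the exact form of the weights, so the following is only an illustrative sketch of what a weighted Bellman (TD) loss might look like, not the authors' implementation. Everything in it is an assumption: the function names are hypothetical, ensemble disagreement (standard deviation across members) stands in for model uncertainty, a per-sample log-probability array stands in for the score a trained variational autoencoder would give, and the two weights are combined by a simple product.

```python
import numpy as np


def model_confidence(ensemble_next_states, temperature=1.0):
    """Map ensemble disagreement to a confidence in (0, 1].

    ensemble_next_states: array of shape (E, B, D) holding the next-state
    predictions of E ensemble members for B samples of dimension D.
    Assumption: disagreement is the largest per-dimension standard
    deviation across members; the paper may use a different estimator.
    """
    disagreement = ensemble_next_states.std(axis=0).max(axis=-1)  # (B,)
    return np.exp(-disagreement / temperature)


def behavior_prior_weight(log_probs):
    """Turn behavior log-probabilities into weights in (0, 1].

    log_probs: array of shape (B,). Assumption: these come from some
    behavioral model (for example a VAE-based score); here they are
    simply given as inputs.
    """
    return np.exp(log_probs - log_probs.max())


def weighted_td_loss(q_pred, rewards, q_next, weights, gamma=0.99):
    """Squared TD error averaged with normalized per-sample weights."""
    targets = rewards + gamma * q_next
    w = weights / (weights.sum() + 1e-8)
    return float(np.sum(w * (q_pred - targets) ** 2))


# Two model-generated samples: the ensemble agrees on the first one and
# disagrees on the second, which is also less likely under the behavior model.
ensemble = np.zeros((3, 2, 4))
ensemble[0, 1, :] = 3.0
confidence = model_confidence(ensemble)
prior = behavior_prior_weight(np.array([-1.0, -3.0]))
weights = confidence * prior  # assumed combination rule: product

loss = weighted_td_loss(
    q_pred=np.array([0.0, 10.0]),
    rewards=np.zeros(2),
    q_next=np.zeros(2),
    weights=weights,
)
print(weights, loss)
```

In this toy setup the second sample has a large TD error but a small weight, so it contributes far less to the loss than it would under uniform weighting, which is the effect the abstract attributes to the weighted update.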
