Offline Evaluation of Online Reinforcement Learning Algorithms

Authors

  • Travis Mandel, University of Washington
  • Yun-En Liu, Enlearn
  • Emma Brunskill, Carnegie Mellon University
  • Zoran Popović, University of Washington

DOI:

https://doi.org/10.1609/aaai.v30i1.10312

Keywords:

Offline Evaluation, Nonstationary Policy Evaluation, Unbiased Estimator, Replayer, Exploration and Exploitation

Abstract

In many real-world reinforcement learning problems, we have access to an existing dataset and would like to use it to evaluate various learning approaches. Typically, one would prefer not to deploy a fixed policy, but rather an algorithm that learns to improve its behavior as it gains more experience. Therefore, we seek to evaluate how a proposed algorithm learns in our environment, meaning we need to evaluate how an algorithm would have gathered experience if it were run online. In this work, we develop three new evaluation approaches which guarantee that, given some history, algorithms are fed samples from the distribution that they would have encountered if they were run online. Additionally, we are the first to propose an approach that is provably unbiased given finite data, eliminating bias due to the length of the evaluation. Finally, we compare the sample-efficiency of these approaches on multiple datasets, including one from a real-world deployment of an educational game.
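The paper's three estimators are not reproduced here, but the core idea the abstract describes (feeding a learning algorithm only samples drawn from the distribution it would have encountered online) can be illustrated with a minimal replay-style sketch, in the spirit of the "Replayer" keyword. The sketch below assumes a bandit setting with data logged by a uniformly random policy; the names EpsilonGreedy, replay_evaluate, select_action, and update are hypothetical interfaces for illustration only, not the authors' API.

```python
import random


class EpsilonGreedy:
    """Toy candidate online learner with a select/update interface (hypothetical)."""

    def __init__(self, n_actions, epsilon=0.1):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.counts = [0] * n_actions
        self.values = [0.0] * n_actions

    def select_action(self):
        # Explore with probability epsilon, otherwise exploit the current estimates.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.values[a])

    def update(self, action, reward):
        # Incremental mean update of the chosen arm's value estimate.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def replay_evaluate(algo, logged_events):
    """Replay-style offline evaluation of an online learner.

    logged_events: list of (logged_action, reward) pairs collected under a
    uniformly random logging policy. At each event the candidate algorithm
    proposes an action; only when it matches the logged action is the event
    accepted and fed back, so the learner sees the same (action, reward)
    distribution it would have generated if it were run online.
    """
    rewards = []
    for logged_action, reward in logged_events:
        proposed = algo.select_action()
        if proposed == logged_action:      # accept: the sample matches what the
            algo.update(proposed, reward)  # learner would have collected online
            rewards.append(reward)
        # otherwise reject the logged event and move on
    return rewards


if __name__ == "__main__":
    # Simulate a uniform-random log over 3 arms with different reward rates.
    true_means = [0.2, 0.5, 0.8]
    log = []
    for _ in range(10000):
        a = random.randrange(3)
        r = 1.0 if random.random() < true_means[a] else 0.0
        log.append((a, r))

    accepted = replay_evaluate(EpsilonGreedy(n_actions=3), log)
    print(f"accepted {len(accepted)} events, "
          f"mean replayed reward = {sum(accepted) / len(accepted):.3f}")
```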

Published

2016-02-21

How to Cite

Mandel, T., Liu, Y.-E., Brunskill, E., & Popović, Z. (2016). Offline Evaluation of Online Reinforcement Learning Algorithms. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10312

Issue

Vol. 30 No. 1 (2016)
Section

Technical Papers: Machine Learning Methods