Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

Authors

  • Josiah Hanna, The University of Texas at Austin
  • Peter Stone, The University of Texas at Austin
  • Scott Niekum, The University of Texas at Austin

DOI:

https://doi.org/10.1609/aaai.v31i1.11123

Keywords:

high confidence off-policy evaluation, model-based reinforcement learning, bootstrapping

Abstract

In many reinforcement learning applications, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance with limited data. We empirically evaluate the proposed methods on standard policy evaluation tasks.
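To make the bootstrapping idea concrete, the sketch below shows a generic percentile-bootstrap lower confidence bound wrapped around a model-based off-policy estimator. It is a minimal illustration of the general technique the abstract alludes to, not the paper's exact algorithm: the trajectory format and the `estimate_value` callable (which would fit a transition model to the data and simulate the evaluation policy in it) are hypothetical placeholders.

```python
import numpy as np

def bootstrap_lower_bound(trajectories, estimate_value, delta=0.05,
                          n_boot=2000, rng=None):
    """Percentile-bootstrap lower confidence bound on an off-policy value estimate.

    trajectories   : trajectories collected under the behavior policy
                     (format is whatever `estimate_value` expects -- hypothetical here).
    estimate_value : callable mapping a list of trajectories to a scalar estimate of
                     the evaluation policy's expected return, e.g. by fitting an MDP
                     transition model to the data and simulating the policy in it.
    delta          : 1 - delta is the desired confidence level.
    n_boot         : number of bootstrap resamples.
    """
    rng = np.random.default_rng(rng)
    n = len(trajectories)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample trajectories with replacement and re-run the estimator.
        idx = rng.integers(0, n, size=n)
        resample = [trajectories[i] for i in idx]
        estimates[b] = estimate_value(resample)
    # The delta-quantile of the bootstrap distribution is an approximate
    # (1 - delta)-confidence lower bound on the true policy value.
    return np.quantile(estimates, delta)
```

In this form, the bootstrap captures uncertainty due to the limited data used to learn the model; tighter or looser bounds follow from the choice of estimator and the quantile taken.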

Published

2017-02-12

How to Cite

Hanna, J., Stone, P., & Niekum, S. (2017). Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11123