Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation
DOI:
https://doi.org/10.1609/aaai.v31i1.11123
Keywords:
high confidence off-policy evaluation, model-based reinforcement learning, bootstrapping
Abstract
In many reinforcement learning applications, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing that policy. In this context, we propose two bootstrapping off-policy evaluation methods that use learned MDP transition models to estimate lower confidence bounds on policy performance from limited data. We empirically evaluate the proposed methods on a standard policy evaluation task.
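To make the abstract's idea concrete, the sketch below shows one way model-based bootstrapping can produce a lower confidence bound: resample logged trajectories with replacement, fit a tabular MDP model to each resample, estimate the evaluation policy's return by Monte Carlo rollouts in each learned model, and take a low quantile of the resulting estimates. This is a minimal illustration under simplifying assumptions (tabular states/actions, rollouts from state 0, a percentile-bootstrap bound); the function name and parameters are illustrative and not the authors' implementation.

```python
import numpy as np

def bootstrap_lower_bound(trajectories, eval_policy, n_states, n_actions,
                          gamma=0.95, n_bootstrap=200, n_rollouts=50,
                          horizon=50, delta=0.05, seed=None):
    """Illustrative model-based bootstrap lower bound on a policy's expected return.

    trajectories: list of trajectories, each a list of (s, a, r, s_next) tuples
    collected by some behavior policy. eval_policy(s, rng) returns an action index.
    Assumes episodes start in state 0 (an assumption for this sketch).
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_bootstrap):
        # Resample whole trajectories with replacement (preserves within-episode correlation).
        idx = rng.integers(len(trajectories), size=len(trajectories))
        sample = [trajectories[i] for i in idx]

        # Fit a Laplace-smoothed tabular transition model and mean-reward model.
        counts = np.ones((n_states, n_actions, n_states))
        reward_sum = np.zeros((n_states, n_actions))
        reward_cnt = np.zeros((n_states, n_actions))
        for traj in sample:
            for s, a, r, s2 in traj:
                counts[s, a, s2] += 1
                reward_sum[s, a] += r
                reward_cnt[s, a] += 1
        P = counts / counts.sum(axis=2, keepdims=True)
        R = reward_sum / np.maximum(reward_cnt, 1)

        # Monte Carlo evaluation of eval_policy inside the learned model.
        returns = []
        for _ in range(n_rollouts):
            s, g, discount = 0, 0.0, 1.0
            for _ in range(horizon):
                a = eval_policy(s, rng)
                g += discount * R[s, a]
                discount *= gamma
                s = rng.choice(n_states, p=P[s, a])
            returns.append(g)
        estimates.append(np.mean(returns))

    # Empirical delta-quantile of the bootstrap estimates as the lower bound.
    return np.quantile(estimates, delta)
```

Note that resampling entire trajectories (rather than individual transitions) keeps temporal correlations intact, and the percentile bootstrap yields an approximate bound rather than a guaranteed high-confidence one; the paper's evaluation concerns how reliable such bounds are in practice with limited data.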
Published
2017-02-12
How to Cite
Hanna, J., Stone, P., & Niekum, S. (2017). Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation. Proceedings of the AAAI Conference on Artificial Intelligence, 31(1). https://doi.org/10.1609/aaai.v31i1.11123
Issue
Vol. 31 No. 1 (2017)
Section
Student Abstract Track