High-Confidence Off-Policy Evaluation

Philip Thomas; Georgios Theocharous; Mohammad Ghavamzadeh

doi:10.1609/aaai.v29i1.9541

Authors

Philip Thomas University of Massachusetts, Amherst
Georgios Theocharous Adobe Research
Mohammad Ghavamzadeh Adobe Research

DOI:

https://doi.org/10.1609/aaai.v29i1.9541

Keywords:

policy evaluation, high-confidence, concentration inequality

Abstract

Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
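The idea of a lower confidence bound computed from off-policy data can be illustrated with a minimal sketch. This is not the method proposed in the paper. It assumes ordinary per-trajectory importance sampling combined with Hoeffding's inequality, returns normalized to be nonnegative, and known action probabilities for both policies. The names `importance_weighted_return`, `hoeffding_lower_bound`, `pi_e`, `pi_b`, and `upper` are hypothetical and chosen only for illustration.

```python
import math


def importance_weighted_return(trajectory, pi_e, pi_b, gamma=1.0):
    """Importance-weighted discounted return of one trajectory.

    trajectory: list of (state, action, reward) tuples collected under pi_b.
    pi_e, pi_b: callables (action, state) -> probability (assumed known).
    """
    weight = 1.0
    ret = 0.0
    for t, (state, action, reward) in enumerate(trajectory):
        # Ratio of evaluation-policy to behavior-policy action probabilities.
        weight *= pi_e(action, state) / pi_b(action, state)
        ret += (gamma ** t) * reward
    return weight * ret


def hoeffding_lower_bound(samples, upper, delta):
    """Lower bound on the mean that holds with probability at least 1 - delta.

    Assumes samples are i.i.d. and nonnegative. Values are clipped at `upper`;
    clipping a nonnegative variable can only lower its mean, so the bound on
    the clipped mean is also a valid (possibly looser) bound on the true mean.
    """
    n = len(samples)
    clipped = [min(x, upper) for x in samples]
    mean = sum(clipped) / n
    # Hoeffding's inequality for variables bounded in [0, upper].
    return mean - upper * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
```

Applying `hoeffding_lower_bound` to the importance-weighted returns of a batch of trajectories gives a value that the expected return of the evaluation policy exceeds with probability at least `1 - delta`, under the assumptions stated above. Hoeffding's inequality is used here only because it is simple; it tends to be loose when importance weights have a heavy upper tail.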

Published

2015-02-21

How to Cite

Thomas, P., Theocharous, G., & Ghavamzadeh, M. (2015). High-Confidence Off-Policy Evaluation. Proceedings of the AAAI Conference on Artificial Intelligence, 29(1). https://doi.org/10.1609/aaai.v29i1.9541

Issue

Vol. 29 No. 1 (2015): Twenty-Ninth AAAI Conference on Artificial Intelligence

Section

Main Track: Novel Machine Learning Algorithms
