TY - JOUR
AU - Thomas, Philip
AU - Theocharous, Georgios
AU - Ghavamzadeh, Mohammad
PY - 2015/02/21
Y2 - 2024/03/29
TI - High-Confidence Off-Policy Evaluation
JF - Proceedings of the AAAI Conference on Artificial Intelligence
JA - AAAI
VL - 29
IS - 1
SE - Main Track: Novel Machine Learning Algorithms
DO - 10.1609/aaai.v29i1.9541
UR - https://ojs.aaai.org/index.php/AAAI/article/view/9541
SP -
AB - Many reinforcement learning algorithms use trajectories collected from the execution of one or more policies to propose a new policy. Because execution of a bad policy can be costly or dangerous, techniques for evaluating the performance of the new policy without requiring its execution have been of recent interest in industry. Such off-policy evaluation methods, which estimate the performance of a policy using trajectories collected from the execution of other policies, heretofore have not provided confidences regarding the accuracy of their estimates. In this paper we propose an off-policy method for computing a lower confidence bound on the expected return of a policy.
ER -