Off-Policy Evaluation in Partially Observable Environments

Guy Tennenholtz; Uri Shalit; Shie Mannor

doi:10.1609/aaai.v34i06.6590

Authors

Guy Tennenholtz, Technion
Uri Shalit, Technion
Shie Mannor, Technion

Abstract

This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with a risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.
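The abstract's central claim, that importance sampling can be badly biased when the behavior policy reacts to variables missing from the logged data, can be seen in a one-step toy problem. The sketch below is a hypothetical illustration under stated assumptions, not the paper's Decoupled POMDP estimator: a hidden state `u` (assumed uniform over {0, 1}) drives both the behavior policy `pi_b` and the reward, while the evaluation policy `pi_e` and the propensity estimate see only an uninformative observation. All names and numbers (`P_U`, `pi_b`, `pi_e`, the 0.9 probability) are illustrative assumptions. Expectations are computed exactly rather than by sampling.

```python
# Hypothetical one-step example (not the paper's method): a hidden state u
# drives both the behavior policy and the reward, but is absent from the logs.
P_U = {0: 0.5, 1: 0.5}  # assumed prior over the unobserved state u
ACTIONS = (0, 1)


def reward(u: int, a: int) -> float:
    # Reward 1 when the action matches the hidden state, else 0.
    return 1.0 if a == u else 0.0


def pi_b(a: int, u: int) -> float:
    # Behavior policy observes u and picks a == u with probability 0.9.
    return 0.9 if a == u else 0.1


def pi_e(a: int) -> float:
    # Evaluation policy sees only the uninformative observation
    # and always chooses action 1.
    return 1.0 if a == 1 else 0.0


def true_value() -> float:
    # Exact value of pi_e: E_u E_{a ~ pi_e}[r(u, a)].
    return sum(P_U[u] * pi_e(a) * reward(u, a) for u in P_U for a in ACTIONS)


def observed_propensity(a: int) -> float:
    # Best propensity available without u: the marginal
    # P(a) = sum_u P(u) * pi_b(a | u).
    return sum(P_U[u] * pi_b(a, u) for u in P_U)


def expected_is_estimate(use_hidden_state: bool) -> float:
    # Exact expectation, under data logged by pi_b, of the one-step
    # importance sampling estimate w * r with w = pi_e(a) / propensity.
    total = 0.0
    for u in P_U:
        for a in ACTIONS:
            propensity = pi_b(a, u) if use_hidden_state else observed_propensity(a)
            weight = pi_e(a) / propensity
            total += P_U[u] * pi_b(a, u) * weight * reward(u, a)
    return total


if __name__ == "__main__":
    print("true value:", true_value())
    print("IS, observation-only propensities:", expected_is_estimate(False))
    print("IS, hidden-state propensities:", expected_is_estimate(True))
```

Under these assumptions `true_value()` is 0.5, while `expected_is_estimate(False)`, which uses the only propensities recoverable from observations, has expectation 0.9; weighting with the hidden-state propensities (`expected_is_estimate(True)`) recovers 0.5 up to floating-point rounding. The gap does not shrink with more data, because it comes from confounding by the unobserved state rather than from variance.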

Published

2020-04-03

How to Cite

Tennenholtz, G., Shalit, U., & Mannor, S. (2020). Off-Policy Evaluation in Partially Observable Environments. Proceedings of the AAAI Conference on Artificial Intelligence, 34(06), 10276–10283. https://doi.org/10.1609/aaai.v34i06.6590

Issue

Vol. 34 No. 06: AAAI-20 Technical Tracks 6

Section

AAAI Technical Track: Reasoning under Uncertainty