Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

Authors

  • Assaf Hallak Technion - Israel Institute of Technology
  • Aviv Tamar University of California, Berkeley
  • Rémi Munos Google DeepMind
  • Shie Mannor Technion - Israel Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v30i1.10227

Keywords:

Off-policy Evaluation, Emphatic Temporal Differences

Abstract

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm, which encompasses the original ETD(λ), as well as several other off-policy evaluation algorithms, as special cases. We call this framework ETD(λ, β), where the introduced parameter β controls the decay rate of an importance-sampling term. We study conditions under which the projected fixed-point equation underlying ETD(λ, β) involves a contraction operator, allowing us to present the first asymptotic error bounds (bias) for ETD(λ, β). Our results show that the original ETD algorithm always involves a contraction operator and that its bias is bounded. Moreover, by controlling β, our proposed generalization allows trading off bias for variance reduction, thereby achieving a lower total error.
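
To make the recursions concrete, below is a minimal Python sketch of one ETD(λ, β) update for linear function approximation. It is an illustration assembled from the abstract and the standard ETD(λ) recursions (with unit interest), not code from the paper; the trace variables (F, M, e) and their exact update rules are assumptions. Setting beta = gamma recovers the original ETD(λ) update, while smaller beta shortens the memory of the follow-on trace.

    import numpy as np

    def etd_step(theta, e, F, phi_t, phi_next, reward,
                 rho_t, rho_prev, alpha, gamma, lam, beta):
        """One ETD(lambda, beta) update (assumed form; beta == gamma gives ETD(lambda)).

        Initialize with e = np.zeros_like(theta), F = 0.0, rho_prev = 1.0.
        """
        # Follow-on trace: beta controls the decay of the accumulated
        # importance-sampling ratios (rho = pi(a|s) / mu(a|s)).
        F = beta * rho_prev * F + 1.0
        # Emphasis, assuming unit interest in every state.
        M = lam + (1.0 - lam) * F
        # Emphatically weighted eligibility trace.
        e = rho_t * (gamma * lam * e + M * phi_t)
        # Standard TD error under linear value-function approximation.
        delta = reward + gamma * theta @ phi_next - theta @ phi_t
        theta = theta + alpha * delta * e
        return theta, e, F

Because F accumulates products of importance-sampling ratios, its variance can grow quickly when beta is large; choosing beta < gamma damps these products (lower variance) at the price of perturbing the fixed point (higher bias), which is the trade-off analyzed in the paper.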

Published

2016-02-21

How to Cite

Hallak, A., Tamar, A., Munos, R., & Mannor, S. (2016). Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10227

Issue

Vol. 30 No. 1 (2016)

Section

Technical Papers: Machine Learning Methods