Differential Eligibility Vectors for Advantage Updating and Gradient Methods

Authors

  • Francisco Melo, Instituto Superior Técnico/INESC-ID

Abstract

In this paper we propose differential eligibility vectors (DEV) for temporal-difference (TD) learning, a new class of eligibility vectors designed to bring out the contribution of each action to the TD error at each state. Specifically, we use DEV in TD-Q(lambda) to more accurately learn the relative value of the actions, rather than their absolute value. We identify conditions that ensure convergence w.p.1 of TD-Q(lambda) with DEV and show that this algorithm can also be used to directly approximate the advantage function associated with a given policy, without the need to compute an auxiliary function, something that, to the best of our knowledge, was not previously known to be possible. Finally, we discuss the integration of DEV in LSTDQ and actor-critic algorithms.

Published

2011-08-04

How to Cite

Melo, F. (2011). Differential Eligibility Vectors for Advantage Updating and Gradient Methods. Proceedings of the AAAI Conference on Artificial Intelligence, 25(1), 441-446. Retrieved from https://ojs.aaai.org/index.php/AAAI/article/view/7938

Section

AAAI Technical Track: Machine Learning