Adaptive Step-Size for Online Temporal Difference Learning

Authors

  • William Dabney, University of Massachusetts Amherst
  • Andrew Barto, University of Massachusetts Amherst

DOI:

https://doi.org/10.1609/aaai.v26i1.8313

Keywords:

reinforcement learning, adaptive step-size, temporal difference learning, step-size, learning rate

Abstract

The step-size, often denoted α, is a key parameter for most incremental learning algorithms. Its importance is especially pronounced when performing online temporal difference (TD) learning with function approximation. Several methods have been developed to adapt the step-size online, ranging from straightforward back-off strategies to adaptive algorithms based on gradient descent. We derive an adaptive upper bound on the step-size parameter that guarantees online TD learning with linear function approximation will not diverge. We then empirically evaluate algorithms that use this upper bound as a heuristic for adapting the step-size parameter online, comparing their performance with related work including HL(λ) and Autostep. Our results show that this adaptive upper-bound heuristic outperforms all existing methods without requiring any meta-parameters, effectively eliminating the need to tune the learning rate of temporal difference learning with linear function approximation.
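To make the idea concrete, below is a minimal sketch of TD(λ) with linear function approximation whose step-size is capped by an adaptive upper bound. The specific cap used here, α ← min(α, 1 / |e⊤(φ − γφ′)|), is one reading of the abstract's "adaptive upper bound"; the 5-state chain MRP, function names, and all parameter values are illustrative assumptions, not the authors' exact experimental setup.

```python
import numpy as np

def td_lambda_adaptive(episodes, gamma=0.9, lam=0.8, seed=0):
    """TD(lambda) with linear function approximation on a small 5-state
    chain MRP, using an adaptive upper bound on the step-size.

    The bound alpha <- min(alpha, 1 / |e^T (phi - gamma * phi')|) is an
    illustrative instantiation of the paper's idea, not its exact algorithm.
    """
    rng = np.random.default_rng(seed)
    n_states = 5
    phi = np.eye(n_states)      # tabular (one-hot) features for simplicity
    w = np.zeros(n_states)      # linear value-function weights
    alpha = 1.0                 # deliberately large initial step-size
    for _ in range(episodes):
        s = 0
        e = np.zeros(n_states)  # eligibility trace
        while s < n_states - 1:
            # Drift right with probability 0.5; reward 1 on reaching the end.
            s_next = s + 1 if rng.random() < 0.5 else s
            r = 1.0 if s_next == n_states - 1 else 0.0
            done = s_next == n_states - 1
            phi_s = phi[s]
            phi_next = np.zeros(n_states) if done else phi[s_next]
            e = gamma * lam * e + phi_s
            # Adaptive cap: shrink alpha whenever the current update
            # direction could overshoot (denominator > 0).
            denom = abs(e @ (phi_s - gamma * phi_next))
            if denom > 0:
                alpha = min(alpha, 1.0 / denom)
            delta = r + gamma * (w @ phi_next) - (w @ phi_s)
            w += alpha * delta * e
            s = s_next
    return w, alpha
```

Starting from α = 1 and only ever shrinking it keeps the updates bounded without any meta-parameter tuning, which is the practical appeal described in the abstract: states nearer the rewarding terminal state end up with higher estimated values even though the initial step-size would be far too large for fixed-α TD.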

Published

2021-09-20

How to Cite

Dabney, W., & Barto, A. (2021). Adaptive Step-Size for Online Temporal Difference Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 26(1), 872-878. https://doi.org/10.1609/aaai.v26i1.8313

Section

AAAI Technical Track: Machine Learning