Planning and Learning with Adaptive Lookahead

Authors

  • Aviv Rosenberg, Amazon Science
  • Assaf Hallak, Nvidia Research
  • Shie Mannor, Nvidia Research, Technion
  • Gal Chechik, Nvidia Research, Bar-Ilan University
  • Gal Dalal, Nvidia Research

DOI:

https://doi.org/10.1609/aaai.v37i8.26149

Keywords:

ML: Reinforcement Learning Theory, ML: Reinforcement Learning Algorithms

Abstract

Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.
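The sketch below is a minimal, hypothetical illustration of the general idea described in the abstract: choosing a per-state lookahead depth from a state-dependent value estimate and then acting greedily with respect to a depth-limited search. It is not the paper's algorithm; the depth thresholds, the toy chain MDP, and the helper names (select_depth, lookahead_action) are assumptions made purely for illustration.

```python
# Hypothetical sketch of adaptive lookahead selection (not the authors' method).
GAMMA = 0.95

def select_depth(v_estimate, thresholds=(0.25, 0.5, 0.75)):
    """Map a state-dependent value estimate to a lookahead depth.
    Here, lower estimates get deeper planning (an illustrative choice)."""
    depth = 1
    for t in thresholds:
        if v_estimate < t:
            depth += 1
    return depth

def lookahead_value(state, depth, step, value_fn, actions):
    """Exhaustive depth-limited lookahead; bootstrap with value_fn at the leaves."""
    if depth == 0:
        return value_fn(state)
    best = float("-inf")
    for a in actions:
        nxt, reward = step(state, a)
        best = max(best, reward + GAMMA * lookahead_value(nxt, depth - 1, step, value_fn, actions))
    return best

def lookahead_action(state, depth, step, value_fn, actions):
    """Greedy action with respect to the depth-limited lookahead values."""
    scored = []
    for a in actions:
        nxt, reward = step(state, a)
        scored.append((reward + GAMMA * lookahead_value(nxt, depth - 1, step, value_fn, actions), a))
    return max(scored)[1]

# Toy deterministic chain MDP: states 0..9, reward 1 only for reaching state 9.
def step(state, action):  # action in {-1, +1}
    nxt = min(9, max(0, state + action))
    return nxt, float(nxt == 9)

value_table = {s: s / 9.0 for s in range(10)}   # crude per-state value estimates
state = 3
d = select_depth(value_table[state])            # adaptive horizon for this state
a = lookahead_action(state, d, step, lambda s: value_table[s], actions=(-1, +1))
print(f"state={state} depth={d} action={a}")
```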

Published

2023-06-26

How to Cite

Rosenberg, A., Hallak, A., Mannor, S., Chechik, G., & Dalal, G. (2023). Planning and Learning with Adaptive Lookahead. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9606-9613. https://doi.org/10.1609/aaai.v37i8.26149

Issue

Vol. 37 No. 8 (2023)

Section

AAAI Technical Track on Machine Learning III