Exploration via Epistemic Value Estimation

Authors

  • Simon Schmitt, DeepMind & University College London
  • John Shawe-Taylor, University College London
  • Hado van Hasselt, DeepMind

DOI:

https://doi.org/10.1609/aaai.v37i8.26164

Keywords:

ML: Reinforcement Learning Algorithms, ML: Reinforcement Learning Theory

Abstract

How to explore efficiently in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions, for instance to compute an exploration bonus or an upper confidence bound. Unfortunately, the required uncertainty is difficult to estimate in general with function approximation. We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators. It equips agents with a tractable posterior over all their parameters, from which epistemic value uncertainty can be computed efficiently. We use the recipe to derive an epistemic Q-Learning agent and observe competitive performance on a series of benchmarks. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
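The abstract describes the mechanism only at a high level: a tractable posterior over the agent's parameters, from which epistemic value uncertainty drives exploration. The sketch below illustrates that general idea in the simplest possible setting, a linear Q-function with a factorised Gaussian posterior and Thompson-sampling action selection. The class name, update rule, and posterior form are assumptions chosen for illustration; this is not the paper's EVE recipe, which derives its posterior differently.

```python
import numpy as np

class EpistemicLinearQ:
    """Illustrative only: a linear Q-function with a diagonal Gaussian
    posterior over its parameters. Epistemic uncertainty shrinks as
    features are visited, and acting on posterior samples (Thompson
    sampling) turns that uncertainty into directed exploration."""

    def __init__(self, num_features, num_actions, prior_precision=1.0, lr=0.1):
        self.mean = np.zeros((num_actions, num_features))       # posterior mean
        self.precision = np.full((num_actions, num_features),   # diagonal precision
                                 prior_precision)
        self.lr = lr

    def q_values(self, phi, params=None):
        # Q(s, .) = W phi, using the posterior mean unless a sample is given.
        w = self.mean if params is None else params
        return w @ phi

    def sample_params(self):
        # One draw from the factorised Gaussian posterior over parameters.
        std = 1.0 / np.sqrt(self.precision)
        return self.mean + std * np.random.randn(*self.mean.shape)

    def act(self, phi):
        # Greedy w.r.t. a posterior sample: epistemic uncertainty,
        # not random noise, decides which actions get tried.
        return int(np.argmax(self.q_values(phi, self.sample_params())))

    def update(self, phi, action, td_target):
        # SGD step on the squared TD error for the taken action, plus a
        # crude precision update: visited features become less uncertain.
        td_error = td_target - self.mean[action] @ phi
        self.mean[action] += self.lr * td_error * phi
        self.precision[action] += phi ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    agent = EpistemicLinearQ(num_features=4, num_actions=2)
    phi = rng.standard_normal(4)
    a = agent.act(phi)
    agent.update(phi, a, td_target=1.0)
    print("action:", a, "Q:", agent.q_values(phi))
```

The same posterior could instead feed an upper-confidence-bound rule (mean plus a multiple of the per-action standard deviation); Thompson sampling is used here only because it keeps the sketch short.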

Published

2023-06-26

How to Cite

Schmitt, S., Shawe-Taylor, J., & van Hasselt, H. (2023). Exploration via Epistemic Value Estimation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(8), 9742-9751. https://doi.org/10.1609/aaai.v37i8.26164

Section

AAAI Technical Track on Machine Learning III