Optimistic Initialization for Exploration in Continuous Control

Authors

  • Sam Lobel, Brown University
  • Omer Gottesman, Brown University
  • Cameron Allen, Brown University
  • Akhil Bagaria, Brown University
  • George Konidaris, Brown University

DOI:

https://doi.org/10.1609/aaai.v36i7.20727

Keywords:

Machine Learning (ML)

Abstract

Optimistic initialization underpins many theoretically sound exploration schemes in tabular domains; however, in the deep function approximation setting, optimism can quickly disappear if initialized naively. We propose a framework for more effectively incorporating optimistic initialization into reinforcement learning for continuous control. Our approach uses metric information about the state-action space to estimate which transitions are still unexplored, and explicitly maintains the initial Q-value optimism for the corresponding state-action pairs. We also develop methods for efficiently approximating these training objectives, and for incorporating domain knowledge into the optimistic envelope to improve sample efficiency. We empirically evaluate these approaches on a variety of hard exploration problems in continuous control, where our method outperforms existing exploration techniques.
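
To make the abstract's central idea concrete, below is a minimal illustrative sketch of how metric information over the state-action space could be used to decide which pairs still look unexplored, and to hold their training targets at an optimistic value. All names and thresholds (Q_MAX, EPSILON, nearest_visited_distance, target_q) are assumptions for illustration only, not the authors' implementation, which the paper describes in full.

```python
import numpy as np

# Assumed constants for the sketch; the paper's actual values and objectives differ.
Q_MAX = 100.0    # optimistic upper bound on returns used as the initial Q-value
EPSILON = 0.5    # distance threshold below which a pair counts as "explored"

def nearest_visited_distance(state_action, visited_pairs):
    """Distance from a (state, action) vector to the closest previously visited pair."""
    if len(visited_pairs) == 0:
        return np.inf
    diffs = np.asarray(visited_pairs) - np.asarray(state_action)
    return float(np.min(np.linalg.norm(diffs, axis=1)))

def target_q(state_action, td_target, visited_pairs):
    """Use the ordinary TD target for explored pairs; for pairs far from all
    visited data, keep the target at the optimistic initial value."""
    if nearest_visited_distance(state_action, visited_pairs) > EPSILON:
        return Q_MAX      # explicitly maintain the initial optimism
    return td_target      # standard bootstrapped target
```

The design choice this sketch highlights is that optimism is preserved explicitly, via the regression target, rather than relying on the network's initialization to persist through training.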

Published

2022-06-28

How to Cite

Lobel, S., Gottesman, O., Allen, C., Bagaria, A., & Konidaris, G. (2022). Optimistic Initialization for Exploration in Continuous Control. Proceedings of the AAAI Conference on Artificial Intelligence, 36(7), 7612-7619. https://doi.org/10.1609/aaai.v36i7.20727

Section

AAAI Technical Track on Machine Learning II