Q-functionals for Value-Based Continuous Control

Authors

  • Samuel Lobel, Brown University
  • Sreehari Rammohan, Brown University
  • Bowen He, Brown University
  • Shangqun Yu, University of Massachusetts, Amherst
  • George Konidaris, Brown University

DOI:

https://doi.org/10.1609/aaai.v37i7.26073

Keywords:

ML: Reinforcement Learning Algorithms

Abstract

We present Q-functionals, an alternative architecture for continuous control deep reinforcement learning. Instead of returning a single value for a state-action pair, our network transforms a state into a function that can be rapidly evaluated in parallel for many actions, allowing us to efficiently choose high-value actions through sampling. This contrasts with the typical architecture of off-policy continuous control, where a policy network is trained for the sole purpose of selecting actions from the Q-function. We represent our action-dependent Q-function as a weighted sum of basis functions (Fourier, polynomial, etc.) over the action space, where the weights are state-dependent and output by the Q-functional network. Fast sampling makes practical a variety of techniques that require Monte-Carlo integration over Q-functions, and enables action-selection strategies besides simple value-maximization. We characterize our framework, describe various implementations of Q-functionals, and demonstrate strong performance on a suite of continuous control tasks.
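The following is a minimal PyTorch sketch of the idea described in the abstract: a network that maps a state to coefficients of a basis expansion over the action space, so Q(s, a) = Σ_i w_i(s) φ_i(a) can be evaluated for many candidate actions in parallel. The class name, layer sizes, and the simplified separable polynomial basis (no cross-terms between action dimensions) are illustrative assumptions, not the authors' released implementation.

    # Illustrative sketch only; hyperparameters and basis choice are assumptions.
    import torch
    import torch.nn as nn

    class QFunctional(nn.Module):
        """Maps a state to coefficients over a polynomial basis of the action space,
        so Q(s, a) can be evaluated for a batch of candidate actions at once."""
        def __init__(self, state_dim, action_dim, degree=2, hidden=256):
            super().__init__()
            self.degree = degree
            # One coefficient per (action dimension, power) plus a constant term.
            n_coeffs = action_dim * degree + 1
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_coeffs),
            )

        def basis(self, actions):
            # actions: (batch, n_actions, action_dim) -> (batch, n_actions, n_coeffs)
            feats = [torch.ones(*actions.shape[:-1], 1, device=actions.device)]
            for d in range(1, self.degree + 1):
                feats.append(actions ** d)
            return torch.cat(feats, dim=-1)

        def forward(self, states, actions):
            # states: (batch, state_dim); actions: (batch, n_actions, action_dim)
            coeffs = self.net(states)                   # (batch, n_coeffs)
            phi = self.basis(actions)                   # (batch, n_actions, n_coeffs)
            return (phi * coeffs.unsqueeze(1)).sum(-1)  # (batch, n_actions) Q-values

    # Action selection by sampling: score many candidate actions in parallel
    # and take the argmax, with no separate policy network.
    qf = QFunctional(state_dim=8, action_dim=2)
    states = torch.randn(4, 8)
    candidates = torch.rand(4, 128, 2) * 2 - 1          # 128 actions drawn from [-1, 1]
    q_values = qf(states, candidates)                   # (4, 128)
    best = candidates[torch.arange(4), q_values.argmax(dim=1)]

Because the state-dependent network runs once per state and only the cheap basis evaluation depends on the action, scoring hundreds of sampled actions costs little more than scoring one, which is what makes sampling-based maximization and Monte-Carlo integration over actions practical.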

Published

2023-06-26

How to Cite

Lobel, S., Rammohan, S., He, B., Yu, S., & Konidaris, G. (2023). Q-functionals for Value-Based Continuous Control. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 8932-8939. https://doi.org/10.1609/aaai.v37i7.26073

Section

AAAI Technical Track on Machine Learning II