Goal-Conditioned Generators of Deep Policies

Francesco Faccio; Vincent Herrmann; Aditya Ramesh; Louis Kirsch; Jürgen Schmidhuber

doi:10.1609/aaai.v37i6.25912

Authors

Francesco Faccio (The Swiss AI Lab IDSIA; AI Initiative, KAUST)
Vincent Herrmann (The Swiss AI Lab IDSIA)
Aditya Ramesh (The Swiss AI Lab IDSIA)
Louis Kirsch (The Swiss AI Lab IDSIA)
Jürgen Schmidhuber (The Swiss AI Lab IDSIA; AI Initiative, KAUST; NNAISENSE, Lugano, Switzerland)

DOI:

https://doi.org/10.1609/aaai.v37i6.25912

Keywords:

ML: Reinforcement Learning Algorithms, ROB: Behavior Learning & Control, ROB: Learning & Optimization for ROB

Abstract

Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance. Our code is public.
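
The core idea of the abstract, a network that maps a return command to the weight matrix of a policy, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `make_generator`, `generate_policy`, and `act`, the layer sizes, and the random initialization are all assumptions made for illustration, and the sketch covers only the forward pass. Training, weight-sharing HyperNetworks, and policy embeddings are omitted.

```python
import math
import random


def make_generator(n_policy_params, hidden=8, seed=0):
    """Build a tiny, untrained command-conditioned generator (hypothetical layout)."""
    rng = random.Random(seed)
    return {
        # First layer: scalar command -> hidden features.
        "w1": [rng.gauss(0.0, 0.5) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        # Second layer: hidden features -> flat vector of policy weights.
        "w2": [[rng.gauss(0.0, 0.5) for _ in range(hidden)]
               for _ in range(n_policy_params)],
        "b2": [0.0] * n_policy_params,
    }


def generate_policy(gen, desired_return, obs_dim, act_dim):
    """Map a return command to an act_dim x obs_dim policy weight matrix."""
    h = [math.tanh(w * desired_return + b)
         for w, b in zip(gen["w1"], gen["b1"])]
    flat = [sum(wi * hi for wi, hi in zip(row, h)) + b
            for row, b in zip(gen["w2"], gen["b2"])]
    # Reshape the flat output into one row of weights per action dimension.
    return [flat[i * obs_dim:(i + 1) * obs_dim] for i in range(act_dim)]


def act(policy_weights, obs):
    """Run the generated linear policy with a tanh squashing on each action."""
    return [math.tanh(sum(w * o for w, o in zip(row, obs)))
            for row in policy_weights]


obs_dim, act_dim = 3, 2
gen = make_generator(n_policy_params=obs_dim * act_dim)
policy = generate_policy(gen, desired_return=1.0, obs_dim=obs_dim, act_dim=act_dim)
action = act(policy, [0.1, -0.2, 0.3])
```

In this sketch the command is a single scalar and the generated policy is a single linear layer. Different commands yield different weight matrices from the same generator, which is the property the abstract relies on when it describes one generator producing policies for any return seen during training.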
