Goal-Conditioned Generators of Deep Policies


  • Francesco Faccio (The Swiss AI Lab IDSIA; AI Initiative, KAUST)
  • Vincent Herrmann (The Swiss AI Lab IDSIA)
  • Aditya Ramesh (The Swiss AI Lab IDSIA)
  • Louis Kirsch (The Swiss AI Lab IDSIA)
  • Jürgen Schmidhuber (The Swiss AI Lab IDSIA; AI Initiative, KAUST; NNAISENSE, Lugano, Switzerland)




Keywords: ML: Reinforcement Learning Algorithms, ROB: Behavior Learning & Control, ROB: Learning & Optimization for ROB


Goal-conditioned Reinforcement Learning (RL) aims to learn optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return", our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks, where it exhibits competitive performance. Our code is public.
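
As a rough illustration of the core idea in the abstract, the Python sketch below shows a generator that maps a return command to the weight matrix of a small policy. It is not the authors' implementation: the linear generator, the linear policy, the function names (make_generator, generate_policy, act), and the dimensions are all hypothetical choices made for brevity. Training the generator, as well as the HyperNetwork weight sharing and the policy embeddings mentioned above, is omitted.

```python
import random


def make_generator(n_params, seed=0):
    """Hypothetical generator: one (scale, bias) pair per policy parameter.

    A real generator would be a trained neural net; here the parameters
    are random and fixed, purely to show the data flow.
    """
    rng = random.Random(seed)
    scale = [rng.gauss(0.0, 0.1) for _ in range(n_params)]
    bias = [rng.gauss(0.0, 0.1) for _ in range(n_params)]
    return scale, bias


def generate_policy(generator, desired_return, obs_dim, act_dim):
    """Map a command (desired expected return) to a policy weight matrix."""
    scale, bias = generator
    # Each policy weight is a function of the command.
    flat = [s * desired_return + b for s, b in zip(scale, bias)]
    # Reshape the flat vector into an act_dim x obs_dim matrix.
    return [flat[i * obs_dim:(i + 1) * obs_dim] for i in range(act_dim)]


def act(policy, obs):
    """Linear policy (an assumption of this sketch): action = W @ obs."""
    return [sum(w * o for w, o in zip(row, obs)) for row in policy]


obs_dim, act_dim = 3, 2
generator = make_generator(obs_dim * act_dim, seed=0)

# Command: "generate a policy that achieves a desired expected return of 100".
policy = generate_policy(generator, 100.0, obs_dim, act_dim)
action = act(policy, [0.1, -0.2, 0.3])
```

In this sketch, changing the command changes every generated weight, so a single generator represents a whole family of policies indexed by the requested return.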




How to Cite

Faccio, F., Herrmann, V., Ramesh, A., Kirsch, L., & Schmidhuber, J. (2023). Goal-Conditioned Generators of Deep Policies. Proceedings of the AAAI Conference on Artificial Intelligence, 37(6), 7503-7511. https://doi.org/10.1609/aaai.v37i6.25912



AAAI Technical Track on Machine Learning I