Planning with a Learned Policy Basis to Optimally Solve Complex Tasks

Authors

  • David Kuric, University of Amsterdam
  • Guillermo Infante, Universitat Pompeu Fabra
  • Vicenç Gómez, Universitat Pompeu Fabra
  • Anders Jonsson, Universitat Pompeu Fabra
  • Herke van Hoof, University of Amsterdam

DOI:

https://doi.org/10.1609/icaps.v34i1.31492

Abstract

Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a set of local policies, each of which solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these local policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine local policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
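The reuse of local policies described in the abstract rests on the standard successor-feature identity: if a policy's successor features ψ^π(s,a) = E[Σ_t γ^t φ(s_t)] are known, then for any task with a linear reward r(s) = φ(s)·w its action values are Q^π(s,a) = ψ^π(s,a)·w, with no further learning. The sketch below illustrates only that building block together with generalized policy improvement over a small policy basis; it is a minimal illustration under assumed shapes and randomly filled successor features, not the authors' implementation of the FSA-level planning in the paper.

```python
"""Hedged sketch: successor-feature evaluation plus generalized policy
improvement (GPI) over a small policy basis. Sizes, the weight vector w,
and the random psi tables are illustrative assumptions."""
import numpy as np

N_STATES, N_ACTIONS, D = 10, 4, 3          # illustrative problem sizes
rng = np.random.default_rng(0)

# Successor features psi_i(s, a) of two hypothetical local policies
# (in the actual method these would be learned, not random).
psi = [rng.random((N_STATES, N_ACTIONS, D)) for _ in range(2)]

# Reward weights describing one subproblem (e.g., one FSA edge).
w = np.array([1.0, 0.0, -0.5])

def gpi_action(state: int) -> int:
    """Act greedily w.r.t. the best local policy's value in this state:
    Q_i(s, a) = psi_i(s, a) . w, then max over policies and actions."""
    q = np.stack([p[state] @ w for p in psi])   # shape: (n_policies, N_ACTIONS)
    return int(q.max(axis=0).argmax())

print(gpi_action(3))
```

Because the same ψ tables can be re-weighted with a different w for each subproblem, evaluating the policy basis on a new FSA edge is a matter of matrix-vector products rather than renewed training, which is the property the abstract's "without additional learning" claim relies on.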


Published

2024-05-30

How to Cite

Kuric, D., Infante, G., Gómez, V., Jonsson, A., & van Hoof, H. (2024). Planning with a Learned Policy Basis to Optimally Solve Complex Tasks. Proceedings of the International Conference on Automated Planning and Scheduling, 34(1), 333-341. https://doi.org/10.1609/icaps.v34i1.31492