TY  - JOUR
AU  - Luna Gutierrez, Ricardo
AU  - Leonetti, Matteo
PY  - 2021/05/17
Y2  - 2024/03/28
TI  - Meta Reinforcement Learning for Heuristic Planing
JF  - Proceedings of the International Conference on Automated Planning and Scheduling
JA  - ICAPS
VL  - 31
IS  - 1
SE  - Special Track on Planning and Learning
DO  - 10.1609/icaps.v31i1.16003
UR  - https://ojs.aaai.org/index.php/ICAPS/article/view/16003
SP  - 551-559
AB  - Heuristic planning has a central role in classical planning applications and competitions. Thanks to this success, there has been increasing interest in using Deep Learning to create high-quality heuristics in a supervised fashion, learning from optimal solutions of previously solved planning problems. Meta-reinforcement learning is a fast-growing research area concerned with learning, from many tasks, behaviours that can quickly generalize to new tasks drawn from the same distribution as the training tasks. We make a connection between meta-reinforcement learning and heuristic planning, showing that heuristic functions meta-learned from planning problems in a given domain can outperform both popular domain-independent heuristics and heuristics learned by supervised learning. Furthermore, while most supervised learning algorithms rely on ad-hoc encodings of the state representation, our method takes a general PDDL 3.1 description as input. We evaluated our heuristic with an A* planner on six domains from the International Planning Competition and the FF Domain Collection, showing that the meta-learned heuristic leads, on average, to the expansion of fewer states than three popular heuristics used by the Fast Downward planner and a supervised-learned heuristic.
ER  -