Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators

Authors

  • Clement Gehring MIT
  • Masataro Asai MIT-IBM Watson AI Lab
  • Rohan Chitnis MIT
  • Tom Silver MIT
  • Leslie Kaelbling MIT
  • Shirin Sohrabi IBM Research
  • Michael Katz IBM Research

Keywords:

Heuristic Learning, Reinforcement Learning, Reward Shaping

Abstract

Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue and enable our RL agent to learn domain-specific value functions as residuals on these heuristics, making learning easier. Correct application of this technique requires consolidating the discounted metric used in RL and the non-discounted metric used in heuristics. We implement the value functions using Neural Logic Machines, a neural network architecture designed for grounded first-order logic inputs. We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL. We further show that our learned value functions generalize to novel problem instances in the same domain. The source code and the appendix are available at github.com/ibm/pddlrl and arxiv.org/abs/2109.14830.

Downloads

Published

2022-06-13

How to Cite

Gehring, C., Asai, M., Chitnis, R., Silver, T., Kaelbling, L., Sohrabi, S., & Katz, M. (2022). Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators. Proceedings of the International Conference on Automated Planning and Scheduling, 32(1), 588-596. Retrieved from https://ojs.aaai.org/index.php/ICAPS/article/view/19846