Generalized Planning in PDDL Domains with Pretrained Large Language Models

Authors

  • Tom Silver, MIT Computer Science and Artificial Intelligence Laboratory
  • Soham Dan, IBM Research
  • Kavitha Srinivas, IBM Research
  • Joshua B. Tenenbaum, MIT Computer Science and Artificial Intelligence Laboratory
  • Leslie Kaelbling, MIT Computer Science and Artificial Intelligence Laboratory
  • Michael Katz, IBM Research

DOI:

https://doi.org/10.1609/aaai.v38i18.30006

Keywords:

PRS: Planning with Language Models

Abstract

Recent work has considered whether large language models (LLMs) can function as planners: given a task, generate a plan. We investigate whether LLMs can serve as generalized planners: given a domain and training tasks, generate a program that efficiently produces plans for other tasks in the domain. In particular, we consider PDDL domains and use GPT-4 to synthesize Python programs. We also consider (1) Chain-of-Thought (CoT) summarization, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program; and (2) automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner. We also conclude that automated debugging is very important, that CoT summarization has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalization.
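The synthesize, validate, and re-prompt loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The counter domain, the names `validate`, `check_program`, `synthesize`, and `make_stub_llm`, and the prompt wording are all hypothetical, and a scripted stub stands in for GPT-4. The two kinds of feedback shown (a raised exception and an invalid plan) are chosen for illustration only; the abstract does not name its four feedback types, and this sketch does not enforce a time limit on the generated program.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy domain: a task is (start, goal) on an integer counter,
# and a plan is a list of "inc" actions, each adding 1.
Task = Tuple[int, int]
Plan = List[str]


def validate(plan: Plan, task: Task) -> Optional[str]:
    """Return an error message if the plan does not solve the task, else None."""
    start, goal = task
    if not isinstance(plan, list) or any(action != "inc" for action in plan):
        return "plan contains unknown actions"
    if start + len(plan) != goal:
        return f"plan ends at {start + len(plan)}, but the goal is {goal}"
    return None


def check_program(source: str, tasks: List[Task]) -> Optional[str]:
    """Run a candidate program on every training task and collect feedback."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # define get_plan from the generated source
        for task in tasks:
            plan = namespace["get_plan"](*task)
            error = validate(plan, task)
            if error is not None:
                return f"task {task}: {error}"
    except Exception as exc:  # feedback from a crash in the generated code
        return f"{type(exc).__name__}: {exc}"
    return None


def synthesize(query_llm: Callable[[str], str], train_tasks: List[Task],
               max_attempts: int = 4) -> Optional[str]:
    """Ask for a program, validate it, and re-prompt with feedback on failure."""
    prompt = "Write get_plan(start, goal) for the counter domain."
    for _ in range(max_attempts):
        source = query_llm(prompt)
        feedback = check_program(source, train_tasks)
        if feedback is None:
            return source
        prompt = f"Your program failed: {feedback}. Please fix it."
    return None


def make_stub_llm() -> Callable[[str], str]:
    """Scripted stand-in for an LLM: a buggy program first, then a fixed one."""
    responses = iter([
        "def get_plan(start, goal):\n    return ['inc'] * goal\n",
        "def get_plan(start, goal):\n    return ['inc'] * (goal - start)\n",
    ])
    return lambda prompt: next(responses)


if __name__ == "__main__":
    program = synthesize(make_stub_llm(), [(0, 2), (1, 3)])
    scope: dict = {}
    exec(program, scope)
    print(scope["get_plan"](5, 9))  # plan for a task outside the training set
```

In this sketch the first candidate passes the task (0, 2) but fails (1, 3), so the loop re-prompts once and accepts the second candidate, which then produces a plan for an unseen task. Validating against more than one training task is what exposes the bug here.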

Published

2024-03-24

How to Cite

Silver, T., Dan, S., Srinivas, K., Tenenbaum, J. B., Kaelbling, L., & Katz, M. (2024). Generalized Planning in PDDL Domains with Pretrained Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 38(18), 20256-20264. https://doi.org/10.1609/aaai.v38i18.30006

Issue

Vol. 38 No. 18

Section

AAAI Technical Track on Planning, Routing, and Scheduling