Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation

Authors

  • Sukai Huang The University of Melbourne
  • Trevor Cohn The University of Melbourne
  • Nir Lipovetzky The University of Melbourne

DOI:

https://doi.org/10.1609/icaps.v35i1.36119

Abstract

The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs' reasoning skills are ineffective in planning tasks, while others report strong outcomes merely from training models on a planning corpus. This paper revisits these claims by developing an end-to-end LLM-based planner and evaluating a range of reasoning-enhancement strategies, including fine-tuning, Chain-of-Thought (CoT) prompting, and reinforcement learning (RL), across multiple dimensions of plan quality, among them validity, executability, and goal satisfiability. Our findings reveal that fine-tuning alone is insufficient, especially on out-of-distribution tasks. Strategies like CoT prompting primarily enhance local coherence, yielding higher executability rates (a prerequisite for validity), but they provide only incremental gains and struggle to ensure global plan validity. Notably, RL guided by a novel Longest Contiguous Common Subsequence reward significantly enhances both executability and validity, particularly on longer-horizon problems. Overall, our research addresses key misconceptions in the LLM-planning literature and underscores reward-driven RL optimization as a promising direction for advancing robust LLM-based planning by jointly improving executability and validity.
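
To make the idea of a Longest Contiguous Common Subsequence reward more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a plan is a list of action strings, that the reward compares a generated plan against a single reference plan, and that the score is normalized by the reference length. The function names `longest_contiguous_common_run` and `lccs_reward`, the normalization, and the toy action strings are all hypothetical choices made for illustration; the abstract does not specify the exact formulation.

```python
from typing import Sequence


def longest_contiguous_common_run(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest run of consecutive actions that appears in both a and b."""
    best = 0
    # prev[j] = length of the common run ending at the previous action of a and b[j-1]
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lccs_reward(generated: Sequence[str], reference: Sequence[str]) -> float:
    """Hypothetical reward in [0, 1]: shared contiguous run length over reference length.

    Assumption: normalizing by the reference length is an illustrative choice,
    not something stated in the abstract.
    """
    if not reference:
        return 0.0
    return longest_contiguous_common_run(generated, reference) / len(reference)


# Toy example with made-up action strings.
reference_plan = ["pick a", "stack a b", "pick c", "stack c a"]
generated_plan = ["pick a", "stack a b", "pick d", "stack c a"]
reward = lccs_reward(generated_plan, reference_plan)
```

In this toy case the two plans share a contiguous run of two actions out of four, so the sketch yields a partial reward rather than the zero that an all-or-nothing validity check would give. That kind of graded signal is one plausible reading of why such a reward could help on longer-horizon problems.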

Published

2025-09-16

How to Cite

Huang, S., Cohn, T., & Lipovetzky, N. (2025). Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation. Proceedings of the International Conference on Automated Planning and Scheduling, 35(1), 204–212. https://doi.org/10.1609/icaps.v35i1.36119