Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation
DOI:
https://doi.org/10.1609/icaps.v35i1.36119
Abstract
The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs' reasoning skills are ineffective in planning tasks, while others report strong outcomes merely from training models on a planning corpus. This paper revisits these claims by developing an end-to-end LLM-based planner and evaluating a range of reasoning-enhancement strategies, including fine-tuning, Chain-of-Thought (CoT) prompting, and reinforcement learning (RL), across multiple dimensions of plan quality: validity, executability, goal satisfiability, and more. Our findings reveal that fine-tuning alone is insufficient, especially on out-of-distribution tasks. Strategies like CoT prompting primarily enhance local coherence, yielding higher executability rates (a necessary prerequisite for validity), but provide only incremental gains and struggle to ensure global plan validity. Notably, RL guided by a novel Longest Contiguous Common Subsequence reward significantly enhances both executability and validity, particularly on longer-horizon problems. Overall, our research addresses key misconceptions in the LLM-planning literature and underscores reward-driven RL optimization as a promising direction for advancing robust LLM-based planning by jointly improving executability and validity.
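The abstract names a Longest Contiguous Common Subsequence reward but does not define it. As a rough illustration only, the sketch below assumes one plausible reading: the reward is the length of the longest unbroken run of actions shared by a generated plan and a reference plan, divided by the reference plan's length. The function names, the string encoding of actions, and the normalization are all assumptions made for this example and may differ from the paper's actual formulation.

```python
from typing import Sequence


def longest_contiguous_common_run(generated: Sequence[str],
                                  reference: Sequence[str]) -> int:
    """Length of the longest contiguous run of actions found in both plans."""
    best = 0
    # prev[j] holds the length of the common run that ends at the previous
    # generated action and at reference[j - 1].
    prev = [0] * (len(reference) + 1)
    for gen_action in generated:
        curr = [0] * (len(reference) + 1)
        for j, ref_action in enumerate(reference, start=1):
            if gen_action == ref_action:
                # Extend the run that ended one step earlier in both plans.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lccs_reward(generated: Sequence[str], reference: Sequence[str]) -> float:
    """Hypothetical reward in [0, 1]: shared run length over reference length."""
    if not reference:
        return 0.0  # assumption: an empty reference plan yields no reward
    return longest_contiguous_common_run(generated, reference) / len(reference)


# Illustrative Blocksworld-style plans (made up for this example).
reference_plan = ["pick a", "stack a b", "pick d", "pick c", "stack c a"]
generated_plan = ["pick a", "stack a b", "pick c", "stack c a"]
reward = lccs_reward(generated_plan, reference_plan)
```

Under this reading, a plan earns partial credit for any correct contiguous stretch, which matches the abstract's emphasis on rewarding progress rather than demanding a fully valid plan.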
Published
2025-09-16
How to Cite
Huang, S., Cohn, T., & Lipovetzky, N. (2025). Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation. Proceedings of the International Conference on Automated Planning and Scheduling, 35(1), 204–212. https://doi.org/10.1609/icaps.v35i1.36119
Section
Algorithmic papers