Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation

Sukai Huang; Trevor Cohn; Nir Lipovetzky

doi:10.1609/icaps.v35i1.36119

Authors

Sukai Huang, The University of Melbourne
Trevor Cohn, The University of Melbourne
Nir Lipovetzky, The University of Melbourne

Abstract

The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs' reasoning skills are ineffective in planning tasks, while others report strong outcomes merely from training models on a planning corpus. This paper revisits these claims by developing an end-to-end LLM-based planner and evaluating a range of reasoning-enhancement strategies, including fine-tuning, Chain-of-Thought (CoT) prompting, and reinforcement learning (RL), across multiple dimensions of plan quality: validity, executability, goal satisfiability, and more. Our findings reveal that fine-tuning alone is insufficient, especially on out-of-distribution tasks. Strategies like CoT prompting primarily enhance local coherence, yielding higher executability rates (a prerequisite for validity), but they provide only incremental gains and struggle to ensure global plan validity. Notably, RL guided by a novel Longest Contiguous Common Subsequence reward significantly enhances both executability and validity, particularly on longer-horizon problems. Overall, our research addresses key misconceptions in the LLM-planning literature and underscores reward-driven RL optimization as a promising direction for advancing robust LLM-based planning by jointly improving executability and validity.
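
The abstract names a Longest Contiguous Common Subsequence reward but does not define it. As a rough illustration only, the sketch below computes the longest run of consecutive actions shared by a generated plan and a reference plan, then divides by the reference length. The function names, the string encoding of actions, and the normalization are all assumptions made for this example and may differ from the reward used in the paper.

```python
from typing import Sequence


def longest_contiguous_common_run(generated: Sequence[str],
                                  reference: Sequence[str]) -> int:
    """Length of the longest run of consecutive actions found in both plans."""
    best = 0
    # prev[j] holds the length of the common run that ends at the previous
    # generated action and at reference[j - 1].
    prev = [0] * (len(reference) + 1)
    for g_action in generated:
        curr = [0] * (len(reference) + 1)
        for j, r_action in enumerate(reference, start=1):
            if g_action == r_action:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lccs_reward(generated: Sequence[str], reference: Sequence[str]) -> float:
    """Hypothetical reward in [0, 1]: shared run length over reference length.

    The normalization is an assumption of this sketch, not taken from the paper.
    """
    if not reference:
        return 0.0
    return longest_contiguous_common_run(generated, reference) / len(reference)


# Illustrative plans; the action strings are made up for this example.
reference_plan = ["pick a", "stack a b", "pick c", "stack c a"]
generated_plan = ["pick a", "stack a b", "pick d", "stack c a"]
reward = lccs_reward(generated_plan, reference_plan)
```

A reward of this shape gives partial credit for a correct contiguous stretch of a plan rather than scoring only fully valid plans, which is one way to read the "progress, not perfection" framing in the title.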
