When Instinct Guides and Insight Grounds: Staged RL Training for LLM Agents
DOI:
https://doi.org/10.1609/aaai.v40i41.40794
Abstract
Large Language Model (LLM) agents have demonstrated strong potential in complex, interactive decision-making tasks. However, when training LLM agents end-to-end with reinforcement learning (RL), efficiently optimizing agent policies in dynamic environments remains a significant challenge. Existing RL-based LLM agent paradigms commonly organize interactions in a cycle where reasoning is followed by action. In our work, we observe a phenomenon we call Exploration Contraction, where the explicit introduction of a reasoning stage reduces the diversity of actions (quantified by lower action entropy), which in turn limits exploration and leads to premature policy convergence. To address this limitation, we propose Act-before-Reasoning (ActRe), a two-stage RL training framework. In the first stage, we reverse the typical rollout order, prompting the agent to generate actions prior to reasoning, which encourages exploration driven by model intuition. In the second stage, we restore the standard reasoning-then-action order for training and evaluation, ensuring robust and interpretable decision-making. Experiments on the ALFWorld and WebShop benchmarks show that ActRe effectively mitigates exploration contraction, yielding consistently higher task success rates and improved training robustness compared to strong RL baselines. Our analysis underscores the importance of action entropy in the exploration-exploitation trade-off during LLM agent training and provides a practical approach to maintain the benefits of explicit reasoning while promoting sufficient exploration.
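As an illustration of the action-entropy notion mentioned in the abstract, the following minimal sketch computes the Shannon entropy of the empirical distribution over actions sampled at a single decision step. It is not taken from the paper: the function name `action_entropy`, the sample action strings, and the choice of an action-level (rather than token-level) estimator are all assumptions made for illustration, and the authors' actual measurement may differ.

```python
import math
from collections import Counter


def action_entropy(actions):
    """Shannon entropy (in nats) of the empirical distribution over sampled actions.

    Assumption: entropy is estimated from discrete action strings sampled
    at one state; this is an illustrative estimator, not the paper's.
    """
    if not actions:
        return 0.0
    counts = Counter(actions)
    total = len(actions)
    # H = -sum_a p(a) * log p(a), with p(a) the empirical frequency of action a
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical samples: four rollouts from the same state under two policies.
diverse = ["go to desk 1", "open drawer 2", "take mug 1", "go to shelf 1"]
contracted = ["go to desk 1", "go to desk 1", "go to desk 1", "open drawer 2"]

print(action_entropy(diverse))
print(action_entropy(contracted))
```

Under this estimator, the policy whose samples concentrate on a single action yields the lower entropy, which is the kind of signal the abstract associates with reduced exploration.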
Published
2026-03-14
How to Cite
Zhang, Z., & Zhang, B. (2026). When Instinct Guides and Insight Grounds: Staged RL Training for LLM Agents. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34906–34914. https://doi.org/10.1609/aaai.v40i41.40794
Section
AAAI Technical Track on Natural Language Processing VI