Zhang, Z., & Zhang, B. (2026). When Instinct Guides and Insight Grounds: Staged RL Training for LLM Agents. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34906–34914. https://doi.org/10.1609/aaai.v40i41.40794