CP-Search: A Chain Progressive Search Training Framework Incentivizing the Cognitive Behaviors for Searching in LLMs
DOI:
https://doi.org/10.1609/aaai.v40i40.40666
Abstract
Retrieval-Augmented Generation (RAG) has been shown to mitigate the knowledge recency issue in Large Language Models (LLMs) while significantly reducing hallucinations. However, existing RAG methods exhibit insufficient capability in modeling reasoning paths for complex multi-hop reasoning tasks. While Reinforcement Learning (RL) has demonstrated success in enhancing model reasoning ability, token-level RL frameworks exhibit inherent limitations in maintaining coherent reasoning trajectories. Such frameworks remain susceptible to the compounding accumulation of contextual errors during the retrieval process, ultimately producing erroneous outputs. To address this challenge, we propose Chain Progressive Search (CP-Search), a novel two-stage training framework designed to enhance the model's retrieval capability in complex scenarios. This framework models the entire retrieval process as a Retrieval-level Markov Decision Process, systematically optimizing the model's retrieval behavior at each step of the chained retrieval. Specifically, CP-Search first constructs a retrieval-cognitive behavioral dataset and employs Supervised Fine-Tuning (SFT) to endow the model with cognitive behaviors for searching. More importantly, by introducing a dense progressive procedural reward in reinforcement learning training, CP-Search significantly improves the model's reasoning consistency and feedback correction ability in chained retrieval. Experiments conducted on multiple multi-hop datasets demonstrate that CP-Search significantly outperforms existing RAG methods in complex multi-hop reasoning tasks.
Published
2026-03-14
How to Cite
Wang, Z., Li, S., & Tang, B. (2026). CP-Search: A Chain Progressive Search Training Framework Incentivizing the Cognitive Behaviors for Searching in LLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 33755–33763. https://doi.org/10.1609/aaai.v40i40.40666
Issue
Section
AAAI Technical Track on Natural Language Processing V