Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG

Authors

  • Bo Li, State Key Laboratory of Intelligent Power Distribution Equipment and System, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, China; National Engineering Research Center for Software Engineering, Peking University
  • Tian Tian, State Key Laboratory of Intelligent Power Distribution Equipment and System, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, China
  • Zhenghua Xu, State Key Laboratory of Intelligent Power Distribution Equipment and System, School of Health Sciences and Biomedical Engineering, Hebei University of Technology, China
  • Hao Cheng, School of Artificial Intelligence, Hebei University of Technology
  • Shikun Zhang, National Engineering Research Center for Software Engineering, Peking University
  • Wei Ye, National Engineering Research Center for Software Engineering, Peking University

DOI:

https://doi.org/10.1609/aaai.v40i37.40418

Abstract

Dynamic retrieval-augmented generation (RAG) allows large language models (LLMs) to fetch external knowledge on demand, offering greater adaptability than static RAG. A central challenge in this setting lies in determining the optimal timing for retrieval. Existing methods often trigger retrieval based on low token-level confidence, which may lead to delayed intervention after errors have already propagated. We introduce Entropy-Trend Constraint (ETC), a training-free method that determines optimal retrieval timing by modeling the dynamics of token-level uncertainty. Specifically, ETC utilizes first- and second-order differences of the entropy sequence to detect emerging uncertainty trends, enabling earlier and more precise retrieval. Experiments on six QA benchmarks with three LLM backbones demonstrate that ETC consistently outperforms strong baselines while reducing retrieval frequency. ETC is particularly effective in domain-specific scenarios, exhibiting robust generalization capabilities. Ablation studies and qualitative analyses further confirm that trend-aware uncertainty modeling yields more effective retrieval timing. The method is plug-and-play, model-agnostic, and readily integrable into existing decoding pipelines. Implementation code is included in the supplementary materials.
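
The abstract describes ETC as using first- and second-order differences of the token-level entropy sequence to decide when to retrieve, but it does not state the exact trigger rule. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the function names, the `threshold` parameter, and the rule "retrieve when entropy is rising and its second-order difference exceeds a threshold" are all hypothetical choices made for illustration.

```python
import math
from typing import Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def first_retrieval_step(entropies: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first token index at which the hypothetical trend rule fires.

    Assumed rule (not taken from the paper): trigger at step t when the
    first-order difference is positive (entropy is rising) and the
    second-order difference exceeds `threshold` (the rise is accelerating).
    Returns None if the rule never fires.
    """
    for t in range(2, len(entropies)):
        # First-order difference: change in entropy from step t-1 to t.
        first_diff = entropies[t] - entropies[t - 1]
        # Second-order difference: change in the first-order difference.
        second_diff = entropies[t] - 2.0 * entropies[t - 1] + entropies[t - 2]
        if first_diff > 0.0 and second_diff > threshold:
            return t
    return None


if __name__ == "__main__":
    # Illustrative entropy values, not measurements from any model.
    example = [0.2, 0.25, 0.3, 0.9, 1.8]
    print(first_retrieval_step(example, threshold=0.5))
```

In a decoding loop, `token_entropy` would be applied to the model's next-token distribution at each step and the resulting sequence passed to `first_retrieval_step`. The contrast with a plain confidence threshold is that the rule reacts to how quickly uncertainty is growing rather than to its absolute level, which is the motivation the abstract gives for earlier intervention.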

Published

2026-03-14

How to Cite

Li, B., Tian, T., Xu, Z., Cheng, H., Zhang, S., & Ye, W. (2026). Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37), 31527–31535. https://doi.org/10.1609/aaai.v40i37.40418

Issue

Vol. 40 No. 37 (2026)

Section

AAAI Technical Track on Natural Language Processing II