Global-Local Confidence Fusion for Hallucination Detection in Mathematical Reasoning Task

Authors

  • Bo Zhang PLA Rocket Force University of Engineering; Center of Information Research, PLA Academy of Military Science
  • Cong Gao College of Cryptology and Cyber Science, Nankai University
  • Linkang Yang School of Electronics and Information, Xi’an Jiaotong University
  • Bingxu Han School of Mathematics, Shandong University
  • Minghao Hu Center of Information Research, PLA Academy of Military Science
  • Zhunchen Luo Center of Information Research, PLA Academy of Military Science
  • Guotong Geng Center of Information Research, PLA Academy of Military Science
  • Xiaoying Bai Center of Information Research, PLA Academy of Military Science
  • Jun Zhang Center of Information Research, PLA Academy of Military Science; Defense Innovation Institute, PLA Academy of Military Science
  • Wen Yao Defense Innovation Institute, PLA Academy of Military Science
  • Zhong Wang PLA Rocket Force University of Engineering

DOI:

https://doi.org/10.1609/aaai.v40i41.40762

Abstract

Large Reasoning Models (LRMs) achieve promising results on complex reasoning tasks but remain susceptible to hallucinations. Existing hallucination detection methods based on Large Language Models (LLMs) often focus solely on final answers, overlooking inconsistencies between the answer and reasoning process. This limitation reduces their ability to detect hallucinations during inference. Moreover, training-free approaches lack mechanisms for confidence estimation, resulting in an unquantified detection output. In contrast, training-based methods can provide fine-grained assessments but often neglect the self-correction capability of LRMs, where earlier errors may be corrected in subsequent steps, leading to inaccurate hallucination detection. To address these challenges, we propose ConfFuse, a unified framework that fuses global and local confidence scores for hallucination detection. A Global Hallucination Detection Model (GHDM) is trained using Direct Preference Optimization (DPO) to assess hallucinations at the level of entire reasoning chains, yielding global confidence estimates. Simultaneously, a Process Reward Model (PRM) estimates step-wise confidence scores to capture local logical flaws. A weighted fusion strategy combines the global confidence score with the minimum local score to jointly reflect overall reasoning consistency and local soundness. Experimental evaluations demonstrate that ConfFuse surpasses Qwen3-1.7B and Qwen3-8B by up to 11.86% and 5.46% in F1 score on in-distribution datasets, and achieves average improvements of 4.65% and 2.80% on out-of-distribution datasets. These results verify the effectiveness and generalizability of the proposed framework.
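The weighted fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight `alpha`, the function names, and the decision threshold are all assumptions; the abstract specifies only that the global score is combined with the minimum step-wise score.

```python
def fuse_confidence(global_score: float, step_scores: list[float],
                    alpha: float = 0.5) -> float:
    """Combine a chain-level (global) confidence with the weakest
    step-level (local) confidence, per the abstract's description.

    alpha is a hypothetical fusion weight; the paper's actual
    weighting scheme is not given in the abstract."""
    local_score = min(step_scores)  # weakest reasoning step dominates locally
    return alpha * global_score + (1 - alpha) * local_score


def is_hallucination(global_score: float, step_scores: list[float],
                     threshold: float = 0.5) -> bool:
    """Flag a reasoning chain when the fused confidence falls below
    an (assumed) threshold."""
    return fuse_confidence(global_score, step_scores) < threshold
```

For example, a chain whose global confidence is high (0.9) but which contains one very weak step (0.2) would receive a fused score of 0.55 with `alpha = 0.5`, reflecting the abstract's goal of jointly capturing overall consistency and local soundness.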

Published

2026-03-14

How to Cite

Zhang, B., Gao, C., Yang, L., Han, B., Hu, M., Luo, Z., … Wang, Z. (2026). Global-Local Confidence Fusion for Hallucination Detection in Mathematical Reasoning Task. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34620–34628. https://doi.org/10.1609/aaai.v40i41.40762

Section

AAAI Technical Track on Natural Language Processing VI