Global-Local Confidence Fusion for Hallucination Detection in Mathematical Reasoning Task

Authors

  • Bo Zhang PLA Rocket Force University of Engineering; Center of Information Research, PLA Academy of Military Science
  • Cong Gao College of Cryptology and Cyber Science, Nankai University
  • Linkang Yang School of Electronics and Information, Xi’an Jiaotong University
  • Bingxu Han School of Mathematics, Shandong University
  • Minghao Hu Center of Information Research, PLA Academy of Military Science
  • Zhunchen Luo Center of Information Research, PLA Academy of Military Science
  • Guotong Geng Center of Information Research, PLA Academy of Military Science
  • Xiaoying Bai Center of Information Research, PLA Academy of Military Science
  • Jun Zhang Center of Information Research, PLA Academy of Military Science; Defense Innovation Institute, PLA Academy of Military Science
  • Wen Yao Defense Innovation Institute, PLA Academy of Military Science
  • Zhong Wang PLA Rocket Force University of Engineering

DOI:

https://doi.org/10.1609/aaai.v40i41.40762

Abstract

Large Reasoning Models (LRMs) achieve promising results on complex reasoning tasks but remain susceptible to hallucinations. Existing hallucination detection methods based on Large Language Models (LLMs) often focus solely on final answers, overlooking inconsistencies between the answer and reasoning process. This limitation reduces their ability to detect hallucinations during inference. Moreover, training-free approaches lack mechanisms for confidence estimation, resulting in an unquantified detection output. In contrast, training-based methods can provide fine-grained assessments but often neglect the self-correction capability of LRMs, where earlier errors may be corrected in subsequent steps, leading to inaccurate hallucination detection. To address these challenges, we propose ConfFuse, a unified framework that fuses global and local confidence scores for hallucination detection. A Global Hallucination Detection Model (GHDM) is trained using Direct Preference Optimization (DPO) to assess hallucinations at the level of entire reasoning chains, yielding global confidence estimates. Simultaneously, a Process Reward Model (PRM) estimates step-wise confidence scores to capture local logical flaws. A weighted fusion strategy combines the global confidence score with the minimum local score to jointly reflect overall reasoning consistency and local soundness. Experimental evaluations demonstrate that ConfFuse surpasses Qwen3-1.7B and Qwen3-8B by up to 11.86% and 5.46% in F1 score on in-distribution datasets, and achieves average improvements of 4.65% and 2.80% on out-of-distribution datasets. These results verify the effectiveness and generalizability of the proposed framework.
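The weighted fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight `alpha`, the function names, and the decision threshold are all assumptions; the abstract specifies only that the global score is combined with the minimum step-wise score.

```python
def fuse_confidence(global_score: float, step_scores: list[float],
                    alpha: float = 0.5) -> float:
    """Combine a chain-level (global) confidence with the weakest
    step-level (local) confidence, per the abstract's description.

    alpha is a hypothetical fusion weight; the paper's actual
    weighting scheme is not given in the abstract."""
    local_score = min(step_scores)  # weakest reasoning step dominates locally
    return alpha * global_score + (1 - alpha) * local_score


def is_hallucination(global_score: float, step_scores: list[float],
                     threshold: float = 0.5) -> bool:
    """Flag a reasoning chain when the fused confidence falls below
    an (assumed) threshold."""
    return fuse_confidence(global_score, step_scores) < threshold
```

For example, a chain whose global confidence is high (0.9) but which contains one very weak step (0.2) would receive a fused score of 0.55 with `alpha = 0.5`, reflecting the abstract's goal of jointly capturing overall consistency and local soundness.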

Published

2026-03-14

How to Cite

Zhang, B., Gao, C., Yang, L., Han, B., Hu, M., Luo, Z., … Wang, Z. (2026). Global-Local Confidence Fusion for Hallucination Detection in Mathematical Reasoning Task. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34620–34628. https://doi.org/10.1609/aaai.v40i41.40762

Section

AAAI Technical Track on Natural Language Processing VI