Confidence Calibration in Large Language Models for Uncertainty Quantification: Affecting Calibration with Conditional Weight Updates
DOI:
https://doi.org/10.1609/aaaiss.v7i1.36937
Abstract
In medical applications of Large Language Models (LLMs), accurate uncertainty quantification is critical, as is control over the model's over- and under-confidence. Current fine-tuning (FT) methods lack this control, partly because they fail to account for the fact that repeated exposure to a fact does not make it more correct. We propose a revised FT method that updates model weights only when the model does not sufficiently "know" an answer. We fine-tuned Meta's Llama-3.2 1B-parameter model on the MMLU multiple-choice dataset, using traditional FT for a Control Model and Conditional Update FT for an Experimental Model. The tuned models showed different results: compared to the Base Model, the Control Model showed greater overconfidence and the Experimental Model greater under-confidence. Additionally, the Experimental Model showed a more even distribution of confidence scores, which is advantageous for post-hoc calibration. This method for affecting confidence calibration while fine-tuning LLMs may help with the broader challenge of creating reliable and trustworthy LLMs.
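The conditional-update idea described in the abstract can be sketched in a few lines. The sketch below assumes a simple probability-threshold test for whether the model "knows" the correct multiple-choice answer; the threshold value, prompt format, learning rate, and optimizer are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of conditional-update fine-tuning: apply a gradient
# step only when the model's confidence in the gold answer is low.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.2-1B"  # model family named in the abstract
KNOW_THRESHOLD = 0.5  # assumed confidence cutoff; the paper's criterion may differ

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def conditional_update_step(prompt: str, answer_letter: str) -> None:
    """One training step: update weights only when the model's probability
    on the correct choice letter falls below the threshold."""
    inputs = tokenizer(prompt, return_tensors="pt")
    answer_id = tokenizer(answer_letter, add_special_tokens=False).input_ids[0]

    logits = model(**inputs).logits[0, -1]             # next-token logits
    p_correct = torch.softmax(logits, dim=-1)[answer_id]

    if p_correct.item() >= KNOW_THRESHOLD:
        return  # model already "knows" this answer; skip the weight update

    loss = torch.nn.functional.cross_entropy(
        logits.unsqueeze(0), torch.tensor([answer_id])
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The gate before the update is what distinguishes this from traditional FT, where every training example triggers a gradient step regardless of whether the model already answers it confidently.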
Published
2025-11-23
How to Cite
Somers, S., & Kim, E. (2025). Confidence Calibration in Large Language Models for Uncertainty Quantification: Affecting Calibration with Conditional Weight Updates. Proceedings of the AAAI Symposium Series, 7(1), 590-593. https://doi.org/10.1609/aaaiss.v7i1.36937
Section
Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health (SECURE-AI4H)