Calibrating Large Language Models with Sample Consistency

Authors

  • Qing Lyu University of Pennsylvania
  • Kumar Shridhar Swiss Federal Institute of Technology
  • Chaitanya Malaviya University of Pennsylvania
  • Li Zhang Drexel University
  • Yanai Elazar Allen Institute for Artificial Intelligence
  • Niket Tandon Microsoft Research
  • Marianna Apidianaki University of Pennsylvania
  • Mrinmaya Sachan Swiss Federal Institute of Technology
  • Chris Callison-Burch University of Pennsylvania

DOI:

https://doi.org/10.1609/aaai.v39i18.34120

Abstract

Accurately gauging the confidence level of Large Language Models' (LLMs) predictions is pivotal for their reliable application. However, LLMs are often inherently uncalibrated, and conventional calibration techniques are hard to apply to them because of their proprietary nature and massive scale. In this work, we derive model confidence from the distribution of multiple randomly sampled generations, using three measures of consistency. We extensively evaluate eleven open- and closed-source models on nine reasoning datasets. Results show that consistency-based calibration methods outperform existing post-hoc approaches in terms of calibration error. We also find that intermediate explanations, model scaling, and larger sample sizes improve calibration, while instruction tuning makes calibration more difficult. Moreover, confidence scores obtained from consistency can potentially enhance model performance. Finally, we offer guidance on choosing suitable consistency metrics for calibration, tailored to model characteristics such as exposure to instruction tuning and RLHF.
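As an illustration of deriving confidence from sampled generations, the following is a minimal sketch, not the authors' implementation. The abstract does not name the three consistency measures, so the sketch assumes three plausible ones: agreement with the majority answer, normalized entropy of the answer distribution, and the gap between the two most frequent answers. The function names and the example answers are hypothetical.

```python
from collections import Counter
import math


def agreement(answers):
    """Fraction of samples that match the most frequent answer."""
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)


def entropy_consistency(answers):
    """1 minus the entropy of the answer distribution, normalized by the
    log of the number of distinct answers (an assumed normalization)."""
    counts = Counter(answers)
    n = len(answers)
    k = len(counts)
    if k == 1:
        return 1.0  # every sample agrees, so there is no uncertainty
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - h / math.log(k)


def first_second_distance(answers):
    """Gap between the shares of the top two answers."""
    ranked = Counter(answers).most_common(2)
    top = ranked[0][1]
    second = ranked[1][1] if len(ranked) > 1 else 0
    return (top - second) / len(answers)


# Hypothetical final answers extracted from five sampled generations
# for a single question.
samples = ["12", "12", "12", "15", "9"]
scores = {
    "agreement": agreement(samples),
    "entropy": entropy_consistency(samples),
    "fsd": first_second_distance(samples),
}
print(scores)
```

Each score lies in [0, 1], with higher values meaning the samples agree more. In this sketch the majority answer is taken as the prediction and one of the scores as its confidence; which score suits a given model is the question the paper's guidance addresses.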

Published

2025-04-11

How to Cite

Lyu, Q., Shridhar, K., Malaviya, C., Zhang, L., Elazar, Y., Tandon, N., … Callison-Burch, C. (2025). Calibrating Large Language Models with Sample Consistency. Proceedings of the AAAI Conference on Artificial Intelligence, 39(18), 19260–19268. https://doi.org/10.1609/aaai.v39i18.34120

Section

AAAI Technical Track on Machine Learning IV