Too Sure for Our Own Good: A User Study on AI Confidence and Human Reliance

Authors

  • Caterina Fregosi, University of Milano-Bicocca
  • Lucia Vicente, University of Milano-Bicocca
  • Andrea Campagner, University of Milano-Bicocca; IRCCS Ospedale Galeazzi–Sant’Ambrogio
  • Federico Cabitza, University of Milano-Bicocca; IRCCS Ospedale Galeazzi–Sant’Ambrogio

DOI:

https://doi.org/10.1609/aaai.v40i21.38798

Abstract

Achieving appropriate human reliance on Artificial Intelligence (AI) systems remains a central challenge in Human-Computer Interaction. Confidence scores—indicators of an AI system’s certainty in its recommendations—have been proposed as a means to help users calibrate their trust and reliance on AI Decision Support Systems (DSS). However, limited research has explored how well-calibrated versus miscalibrated confidence scores affect human decision-making. We report a study examining the effects of confidence calibration on user reliance, decision accuracy, and perceived utility of an AI DSS. In a within-subjects experiment involving 184 participants solving logic puzzles, we found that well-calibrated confidence scores significantly improved decision accuracy (+20%, 95% CI: [0.18, 0.23]), whereas miscalibrated scores yielded minimal accuracy gains (+2%, 95% CI: [-0.00, 0.04]) and increased vulnerability to automation bias and conservatism bias. Participants were more likely to accept AI recommendations when high confidence was expressed, even when those recommendations were incorrect, resulting in errors. Conversely, miscalibrated and low-confidence recommendations increased conservatism bias, leading users to reject even accurate AI suggestions. Perceived utility of the AI system was higher when confidence levels were high (p < 0.001) and when confidence was well-calibrated (p = 0.002). These findings underscore the importance of designing AI systems with properly calibrated confidence cues to improve human-AI collaboration and mitigate reliance-related biases.
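The abstract contrasts well-calibrated and miscalibrated confidence and describes two reliance errors: accepting incorrect AI recommendations (automation bias) and rejecting correct ones (conservatism bias). The sketch below shows one common way to operationalize these ideas from per-trial logs. It assumes expected calibration error (ECE), a standard calibration measure, as the calibration metric; the abstract does not say whether the authors used it. The `Trial` record, the function names, and the toy data are all hypothetical and are not the study's schema, method, or results.

```python
from collections import defaultdict
from dataclasses import dataclass
import math


@dataclass
class Trial:
    """One AI-assisted decision (hypothetical record format, not the study's schema)."""
    ai_confidence: float  # confidence shown with the recommendation, in [0, 1]
    ai_correct: bool      # whether the AI recommendation was correct
    accepted: bool        # whether the participant followed the recommendation


def expected_calibration_error(trials, n_bins=5):
    """Bin trials by stated confidence; average |accuracy - confidence| weighted by bin size."""
    bins = defaultdict(list)
    for t in trials:
        # Equal-width bins; a confidence of exactly 1.0 falls into the top bin.
        bins[min(int(t.ai_confidence * n_bins), n_bins - 1)].append(t)
    ece = 0.0
    for members in bins.values():
        accuracy = sum(t.ai_correct for t in members) / len(members)
        confidence = sum(t.ai_confidence for t in members) / len(members)
        ece += len(members) / len(trials) * abs(accuracy - confidence)
    return ece


def _rate(selected):
    """Share of True values; NaN when there are no trials to score."""
    return sum(selected) / len(selected) if selected else math.nan


def automation_bias_rate(trials):
    """Share of incorrect AI recommendations that the participant accepted."""
    return _rate([t.accepted for t in trials if not t.ai_correct])


def conservatism_rate(trials):
    """Share of correct AI recommendations that the participant rejected."""
    return _rate([not t.accepted for t in trials if t.ai_correct])


if __name__ == "__main__":
    # Toy, made-up trials for illustration only (not data from the study).
    demo = [
        Trial(0.9, True, True),
        Trial(0.9, False, True),   # accepted a wrong, high-confidence recommendation
        Trial(0.3, True, False),   # rejected a correct, low-confidence recommendation
        Trial(0.3, False, False),
    ]
    print(f"ECE: {expected_calibration_error(demo):.2f}")
    print(f"automation bias rate: {automation_bias_rate(demo):.2f}")
    print(f"conservatism rate: {conservatism_rate(demo):.2f}")
```

On the toy data, each confidence level is right half the time, so the 0.9 bin is overconfident by 0.4 and the 0.3 bin underconfident by 0.2, giving an ECE of 0.30. The bin count is a tunable assumption: with few trials per bin the estimate becomes noisy.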

Published

2026-03-14

How to Cite

Fregosi, C., Vicente, L., Campagner, A., & Cabitza, F. (2026). Too Sure for Our Own Good: A User Study on AI Confidence and Human Reliance. Proceedings of the AAAI Conference on Artificial Intelligence, 40(21), 17445–17453. https://doi.org/10.1609/aaai.v40i21.38798

Issue

Vol. 40 No. 21

Section

AAAI Technical Track on Humans and AI