Too Sure for Our Own Good: A User Study on AI Confidence and Human Reliance

Caterina Fregosi; Lucia Vicente; Andrea Campagner; Federico Cabitza

doi:10.1609/aaai.v40i21.38798

Authors

Caterina Fregosi, University of Milano-Bicocca
Lucia Vicente, University of Milano-Bicocca
Andrea Campagner, University of Milano-Bicocca; IRCCS Ospedale Galeazzi–Sant’Ambrogio
Federico Cabitza, University of Milano-Bicocca; IRCCS Ospedale Galeazzi–Sant’Ambrogio

Abstract

Achieving appropriate human reliance on Artificial Intelligence (AI) systems remains a central challenge in Human-Computer Interaction. Confidence scores—indicators of an AI system’s certainty in its recommendations—have been proposed as a means to help users calibrate their trust and reliance on AI Decision Support Systems (DSS). However, limited research has explored how well-calibrated versus miscalibrated confidence scores affect human decision-making. We report a study examining the effects of confidence calibration on user reliance, decision accuracy, and perceived utility of an AI DSS. In a within-subjects experiment involving 184 participants solving logic puzzles, we found that well-calibrated confidence scores significantly improved decision accuracy (+20%, 95% CI: [0.18, 0.23]), whereas miscalibrated scores yielded minimal accuracy gains (+2%, 95% CI: [-0.00, 0.04]) and increased vulnerability to automation bias and conservatism bias. Participants were more likely to accept AI recommendations when high confidence was expressed, even when those recommendations were incorrect, resulting in errors. Conversely, miscalibrated and low-confidence recommendations increased conservatism bias, leading users to reject even accurate AI suggestions. Perceived utility of the AI system was higher when confidence levels were high (p < 0.001) and when confidence was well-calibrated (p = 0.002). These findings underscore the importance of designing AI systems with properly calibrated confidence cues to improve human-AI collaboration and mitigate reliance-related biases.
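
To make the distinction between well-calibrated and miscalibrated confidence scores concrete, the following minimal sketch compares each stated confidence level with the accuracy actually observed at that level. It is an illustration only and not the procedure used in the study: the function name `calibration_gaps` and all data values are hypothetical, chosen so that the arithmetic is easy to follow.

```python
from collections import defaultdict


def calibration_gaps(confidences, correct):
    """Map each stated confidence level to (observed accuracy, absolute gap).

    A score is well calibrated when the recommendations issued at a given
    confidence level are correct about that fraction of the time.
    """
    buckets = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        buckets[conf].append(1 if ok else 0)

    report = {}
    for conf, outcomes in sorted(buckets.items()):
        accuracy = sum(outcomes) / len(outcomes)
        report[conf] = (accuracy, abs(conf - accuracy))
    return report


# Hypothetical data (not taken from the study): eight recommendations,
# four issued at confidence 0.75 and four at confidence 0.5.
stated = [0.75, 0.75, 0.75, 0.75, 0.5, 0.5, 0.5, 0.5]

# Well calibrated: 3 of 4 correct at 0.75, and 2 of 4 correct at 0.5.
well_outcomes = [True, True, True, False, True, False, True, False]

# Miscalibrated: only 1 of 4 correct at 0.75, but 4 of 4 correct at 0.5.
mis_outcomes = [True, False, False, False, True, True, True, True]

well_gaps = calibration_gaps(stated, well_outcomes)
mis_gaps = calibration_gaps(stated, mis_outcomes)

print("well calibrated:", well_gaps)
print("miscalibrated:  ", mis_gaps)
```

In this toy data the well-calibrated scores have a gap of 0 at both confidence levels, while the miscalibrated scores are off by 0.5 at each level: the high-confidence recommendations are mostly wrong and the low-confidence ones are all right. That is the kind of mismatch the abstract associates with automation bias and conservatism bias.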
