Towards Robust Human–AI Decision-Making via Learning-to-Defer

Authors

  • Yannis Montreuil, National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v40i48.42160

Abstract

AI systems often fail on challenging or out-of-distribution inputs, a critical limitation in domains such as healthcare, finance, and autonomous driving. Learning-to-Defer (L2D) addresses this by training models not only to predict but also to decide when to defer to external experts. This thesis develops a unified and robust framework for L2D that advances its theoretical foundations, reliability, and applicability. It characterizes Bayes-optimal routing policies, establishes surrogate-consistency guarantees, and introduces a unified adversarial framework for attacking and defending L2D with Bayes-optimal robustness. It further proposes the first top-k deferral methods in both two-stage and one-stage settings. Empirical studies validate these ideas in multi-task learning and extractive question answering with large language models. Ongoing work explores token-level routing in LLMs, online adaptation with dynamic experts, and partial deferral.
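
As a rough illustration of the routing idea summarized above, the following sketch ranks a model and a set of experts by expected utility under 0-1 loss and returns the best k of them. It is a simplified textbook-style rule, not the formulation from the thesis: the function names `agent_scores` and `route_top_k`, the per-expert `expert_correct_probs`, and the consultation costs `expert_costs` are all hypothetical and assumed to be known rather than learned.

```python
from typing import List, Sequence


def agent_scores(
    class_probs: Sequence[float],
    expert_correct_probs: Sequence[float],
    expert_costs: Sequence[float],
) -> List[float]:
    """Expected utility of each agent on one input (illustrative only).

    Index 0 is the model: under 0-1 loss its best achievable expected
    accuracy is the largest class posterior. Indices 1.. are experts:
    assumed probability of being correct minus an assumed consultation cost.
    """
    scores = [max(class_probs)]
    for p_correct, cost in zip(expert_correct_probs, expert_costs):
        scores.append(p_correct - cost)
    return scores


def route_top_k(
    class_probs: Sequence[float],
    expert_correct_probs: Sequence[float],
    expert_costs: Sequence[float],
    k: int = 1,
) -> List[int]:
    """Return the indices of the k highest-scoring agents.

    With k=1 this reduces to a single decision: predict (index 0) or
    defer to one expert (index >= 1). Ties favor the lower index because
    Python's sort is stable.
    """
    scores = agent_scores(class_probs, expert_correct_probs, expert_costs)
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return order[:k]


# Uncertain model, one strong expert: the input is deferred to expert 1.
uncertain = route_top_k([0.6, 0.3, 0.1], [0.9, 0.7], [0.1, 0.2], k=1)

# Confident model: the input is kept by the model (index 0).
confident = route_top_k([0.95, 0.05], [0.9, 0.7], [0.1, 0.2], k=1)

# Top-2 routing on the uncertain input returns expert 1, then the model.
top_two = route_top_k([0.6, 0.3, 0.1], [0.9, 0.7], [0.1, 0.2], k=2)
```

In practice the quantities passed in here would have to be estimated, which is where surrogate losses and consistency guarantees of the kind mentioned in the abstract come in; the sketch only shows the decision rule applied once those estimates exist.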

Published

2026-03-14

How to Cite

Montreuil, Y. (2026). Towards Robust Human–AI Decision-Making via Learning-to-Defer. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41068–41069. https://doi.org/10.1609/aaai.v40i48.42160