Towards Robust Human–AI Decision-Making via Learning-to-Defer
DOI:
https://doi.org/10.1609/aaai.v40i48.42160
Abstract
AI systems often fail on challenging or out-of-distribution inputs, a critical limitation in domains such as healthcare, finance, and autonomous driving. Learning to Defer (L2D) addresses this by training models not only to predict but also to decide when to defer to external experts. This thesis develops a unified and robust framework for L2D that advances its theoretical foundations, reliability, and applicability. It characterizes Bayes-optimal routing policies, establishes surrogate-consistency guarantees, and introduces a unified adversarial framework for attacking and defending L2D with Bayes-optimal robustness. It further proposes the first top-k deferral methods in both two-stage and one-stage settings. Empirical studies validate these ideas in multi-task learning and extractive question answering with large language models. Ongoing work explores token-level routing in LLMs, online adaptation with dynamic experts, and partial deferral.
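To make the routing idea in the abstract concrete, the following minimal sketch shows a plug-in deferral rule under 0-1 loss: each agent (the model or an expert) gets an expected cost, and the input goes to the cheapest one, or to the k cheapest in a top-k variant. It is an illustration under stated assumptions, not the thesis's method: the function names, the `expert_fee` parameter, and the assumption that posterior and expert-accuracy estimates are already available are all hypothetical.

```python
def agent_costs(model_probs, expert_acc, expert_fee=0.0):
    """Expected 0-1 cost of each agent for a single input.

    model_probs: estimated class posterior for the model (assumed given)
    expert_acc:  estimated P(expert j is correct | x), one entry per expert
    expert_fee:  hypothetical consultation fee added to every expert's cost
    Index 0 is the model; index j >= 1 is expert j.
    """
    model_cost = 1.0 - max(model_probs)  # error of the model's best guess
    return [model_cost] + [1.0 - a + expert_fee for a in expert_acc]


def route(costs):
    """Index of the cheapest agent; ties go to the lowest index (the model)."""
    return min(range(len(costs)), key=costs.__getitem__)


def route_top_k(costs, k):
    """Indices of the k cheapest agents, cheapest first."""
    return sorted(range(len(costs)), key=costs.__getitem__)[:k]


# A confident model keeps the input; an unsure one defers to expert 1.
print(route(agent_costs([0.9, 0.1], [0.8])))                 # 0
print(route(agent_costs([0.55, 0.45], [0.8, 0.7])))          # 1
print(route_top_k(agent_costs([0.55, 0.45], [0.8, 0.7]), 2)) # [1, 2]
```

In practice the posteriors and expert accuracies are unknown and must be learned, which is where the surrogate losses and consistency guarantees mentioned in the abstract come in.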
Published
2026-03-14
How to Cite
Montreuil, Y. (2026). Towards Robust Human–AI Decision-Making via Learning-to-Defer. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41068–41069. https://doi.org/10.1609/aaai.v40i48.42160
Section
AAAI Doctoral Consortium Track