Contractual AI: Toward More Aligned, Transparent, and Robust Dialogue Agents


  • Christopher J. Bates, Harvard University, Dept. of Psychology
  • Ritwik Bose, Institute for Human and Machine Cognition; Knox College
  • Reagan G. Keeney, Institute for Human and Machine Cognition; Knox College
  • Vera A. Kazakova, Institute for Human and Machine Cognition; Knox College



Keywords: Safety, Large Language Models, Ethical Decision Making, Chatbots


We present a new framework for AI alignment called Contractual AI, and apply it to the setting of dialogue agents chatting with humans. This framework incorporates and builds on previous approaches to alignment, such as Constitutional AI. We propose that fully aligned systems may need both "think fast" and "think slow" systems for approximating complex human judgements. Fast thinking (System 1) is computationally cheap but rigid and brittle in novel situations, while slow thinking (System 2) is more expensive but more flexible and robust. System 1 makes judgements by asking whether a rule or principle is violated. System 2 performs the deliberate reasoning that produces those rules, explicitly tallying costs and benefits for all stakeholders. Rule-based systems like Constitutional AI correspond roughly to System 1. Here, we implement a prototype of System 2, and lay out a roadmap for enabling the system to make more thorough and accurate considerations for all stakeholder groups, including those underrepresented in the training data (e.g., racial minorities). For initial testing, we guided the decision process through the steps of: 1) identifying all stakeholders, 2) listing their individual concerns, 3) soliciting the projected opinions of various experts, and 4) combining the expert opinions into a final moral judgement. The resulting text was less generic, more aware of complex stakeholder needs, and ultimately more actionable.
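The four-step deliberation process described above can be sketched as a simple prompt-chaining pipeline. This is a minimal illustrative sketch, not the paper's implementation: `query_llm` is a hypothetical stand-in for any dialogue-model API, stubbed here so the structure is runnable.

```python
# Illustrative sketch of the four-step "System 2" deliberation pipeline.
# All function names and prompt wordings are assumptions for illustration.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a dialogue model; a real system would
    send the prompt to a language model and return its completion."""
    return f"[model response to: {prompt[:40]}...]"

def contractual_judgement(situation: str) -> dict:
    # Step 1: identify all stakeholders affected by the situation.
    stakeholders = query_llm(f"List all stakeholders in: {situation}")
    # Step 2: list each stakeholder's individual concerns.
    concerns = query_llm(
        f"For these stakeholders ({stakeholders}), list each one's concerns.")
    # Step 3: solicit projected opinions from various experts.
    expert_opinions = query_llm(
        f"Given these concerns ({concerns}), what would an ethicist, "
        f"a lawyer, and a domain expert each advise?")
    # Step 4: combine the expert opinions into a final moral judgement.
    judgement = query_llm(
        f"Combine these expert opinions into a final moral judgement: "
        f"{expert_opinions}")
    return {"stakeholders": stakeholders, "concerns": concerns,
            "expert_opinions": expert_opinions, "judgement": judgement}
```

Each step conditions on the output of the previous one, so the final judgement is grounded in an explicit enumeration of stakeholders and concerns rather than produced in a single opaque completion.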






Assured and Trustworthy Human-centered AI (ATHAI)