Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI

Authors

  • Malek Mechergui, Colorado State University
  • Sarath Sreedharan, Colorado State University

DOI:

https://doi.org/10.1609/aaai.v38i9.28875

Keywords:

HAI: Human-Aware Planning and Behavior Prediction, HAI: Interaction Techniques and Devices, HAI: Learning Human Values and Preferences, PEAI: Safety, Robustness & Trustworthiness, PRS: Activity and Plan Recognition

Abstract

While the question of misspecified objectives has received much attention in recent years, most works in this area focus primarily on challenges arising from the complexity of the objective specification mechanism (for example, the use of reward functions). However, the complexity of the specification mechanism is just one of many reasons why a user may misspecify their objective. A foundational cause of misspecification that these works overlook is the inherent asymmetry between the human's expectations of the agent's behavior and the behavior the agent actually generates for the specified objective. To address this, we propose a novel formulation of the objective misspecification problem that builds on the human-aware planning literature, which was originally introduced to support explanation and explicable behavior generation. Additionally, we propose a first-of-its-kind interactive algorithm that can use information generated under the user's incorrect beliefs about the agent to determine the user's true underlying goal.
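The abstract's central idea, that the user's feedback is produced under possibly incorrect beliefs about the agent and must be interpreted relative to those beliefs, can be made concrete with a small sketch. The Python loop below is purely illustrative and is not the authors' algorithm: the candidate-goal set, the functions `human_model_plan_for` and `query_human`, and the pruning scheme are all assumptions introduced here to show the general shape of interactive goal inference under a mismatched human model.

```python
# Illustrative sketch only; all names and the query/update loop are hypothetical,
# not taken from the paper. The key point it demonstrates: the user judges plans
# through their OWN (possibly wrong) model of the agent, so their yes/no answers
# are interpreted against that human model rather than the agent's true model.

def infer_true_goal(candidate_goals, human_model_plan_for, query_human):
    """Interactively prune hypothesized user goals using human feedback.

    candidate_goals         -- iterable of hypothesized user goals
    human_model_plan_for(g) -- plan the *human* expects the agent to produce
                               for goal g, under their (possibly incorrect)
                               beliefs about the agent
    query_human(plan)       -- returns True iff the user accepts the shown plan
    """
    candidates = set(candidate_goals)
    while len(candidates) > 1:
        g = next(iter(candidates))
        # Show the plan the human would expect for g; because the plan is
        # computed in the human's model, their answer remains informative
        # about their goal even when their beliefs about the agent are wrong.
        expected_plan = human_model_plan_for(g)
        if query_human(expected_plan):
            # Keep only goals consistent with accepting this plan.
            candidates = {c for c in candidates
                          if human_model_plan_for(c) == expected_plan}
        else:
            candidates = {c for c in candidates
                          if human_model_plan_for(c) != expected_plan}
    return candidates.pop() if candidates else None
```

The design choice worth noting is that every query is generated and evaluated inside the human's model of the agent; resolving the resulting mismatch between that model and the agent's true capabilities is exactly the alignment problem the paper formalizes.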

Published

2024-03-24

How to Cite

Mechergui, M., & Sreedharan, S. (2024). Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI. Proceedings of the AAAI Conference on Artificial Intelligence, 38(9), 10110-10118. https://doi.org/10.1609/aaai.v38i9.28875

Section

AAAI Technical Track on Humans and AI