ME: Modelling Ethical Values for Value Alignment

Eryn Rigley; Adriane Chapman; Christine Evers; Will McNeill

doi:10.1609/aaai.v39i26.34974

Authors

Eryn Rigley University of Southampton, United Kingdom
Adriane Chapman University of Southampton, United Kingdom
Christine Evers University of Southampton, United Kingdom
Will McNeill University of Southampton, United Kingdom

DOI:

https://doi.org/10.1609/aaai.v39i26.34974

Abstract

Value alignment, at the intersection of moral philosophy and AI safety, is dedicated to ensuring that artificially intelligent (AI) systems align with a certain set of values. One challenge facing value alignment researchers is accurately translating these values into a machine-readable format. In the case of reinforcement learning (RL), a popular method within value alignment, this requires designing a reward function which accurately defines the value of all state-action pairs. It is common for programmers to hand-set and manually tune these values. In this paper, we examine the challenges of hand-programming values into reward functions for value alignment, and propose mathematical models as an alternative grounding for reward function design in ethical scenarios. Experimental results demonstrate that our modelled-ethics approach offers a more consistent alternative and outperforms our hand-programmed reward functions.
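To make the contrast in the abstract concrete, the sketch below sets a hand-programmed reward table next to a reward derived from a simple formula. It is an illustration only and not the authors' model: the states, actions, `HARM_PROBABILITY`, `HARM_WEIGHT`, and the expected-harm formula are all hypothetical assumptions chosen for brevity.

```python
from itertools import product

# Hypothetical toy problem; these states and actions are not from the paper.
STATES = ["low_risk", "high_risk"]
ACTIONS = ["help", "ignore"]

# Hand-programmed reward: each state-action pair gets a manually tuned number.
HAND_SET_REWARD = {
    ("low_risk", "help"): 1.0,
    ("low_risk", "ignore"): 0.0,
    ("high_risk", "help"): 5.0,
    ("high_risk", "ignore"): -5.0,
}

# Modelled reward: values follow from an explicit formula with few parameters.
# Both parameters below are assumed for illustration.
HARM_PROBABILITY = {"low_risk": 0.1, "high_risk": 0.9}
HARM_WEIGHT = 10.0


def modelled_reward(state: str, action: str) -> float:
    """Reward equals expected harm avoided (help) or incurred (ignore)."""
    expected_harm = HARM_PROBABILITY[state] * HARM_WEIGHT
    return expected_harm if action == "help" else -expected_harm


def reward_table(reward_fn) -> dict:
    """Evaluate a reward function on every state-action pair."""
    return {(s, a): reward_fn(s, a) for s, a in product(STATES, ACTIONS)}


if __name__ == "__main__":
    modelled = reward_table(modelled_reward)
    for pair in modelled:
        print(pair, "hand-set:", HAND_SET_REWARD[pair], "modelled:", modelled[pair])
```

In the hand-set table, changing one entry has no effect on the others, so inconsistencies between pairs can go unnoticed. In the formula-based version, every entry changes together when a parameter is adjusted, which is one plausible reading of why a modelled grounding could be more consistent.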
