ME: Modelling Ethical Values for Value Alignment

Authors

  • Eryn Rigley, University of Southampton, United Kingdom
  • Adriane Chapman, University of Southampton, United Kingdom
  • Christine Evers, University of Southampton, United Kingdom
  • Will McNeill, University of Southampton, United Kingdom

DOI:

https://doi.org/10.1609/aaai.v39i26.34974

Abstract

Value alignment, at the intersection of moral philosophy and AI safety, is dedicated to ensuring that artificially intelligent (AI) systems align with a given set of values. One challenge facing value alignment researchers is accurately translating these values into a machine-readable format. In the case of reinforcement learning (RL), a popular method within value alignment, this requires designing a reward function that accurately defines the value of all state-action pairs. It is common for programmers to hand-set and manually tune these values. In this paper, we examine the challenges of hand-programming values into reward functions for value alignment, and propose mathematical models as an alternative grounding for reward function design in ethical scenarios. Experimental results demonstrate that our modelled-ethics approach is more consistent than, and outperforms, our hand-programmed reward functions.

Published

2025-04-11

How to Cite

Rigley, E., Chapman, A., Evers, C., & McNeill, W. (2025). ME: Modelling Ethical Values for Value Alignment. Proceedings of the AAAI Conference on Artificial Intelligence, 39(26), 27608–27616. https://doi.org/10.1609/aaai.v39i26.34974

Section

AAAI Technical Track on AI Alignment