Ethics2vec: Aligning Automatic Agents and Human Preferences

Authors

  • Gianluca Bontempi Université Libre de Bruxelles, Brussels, Belgium

DOI:

https://doi.org/10.1609/aaaiss.v7i1.36880

Abstract

The interaction between humans and intelligent agents continues to grow and will be inevitable in the near future. Although intelligent agents are supposed to improve the human experience (or make it more efficient), it is hard from a human perspective to grasp the ethical values that are explicitly or implicitly embedded in an agent's behaviour. This is the well-known alignment problem: the challenge of designing AI systems that align with human values, goals, and preferences. The problem is particularly challenging since most human ethical considerations involve incommensurable (i.e. non-measurable and/or incomparable) values and criteria. Consider, for instance, a medical agent prescribing a treatment to a cancer patient. How could it take into account (and/or weigh) incommensurable aspects like the value of a human life and the cost of the treatment? Alignment between human and artificial values is possible only if we define a common space where a metric can be defined and used. This paper proposes to extend to ethics the conventional Anything2vec approach, which has been successful in plenty of similarly hard-to-quantify domains (ranging from natural language processing to recommendation systems and graph analysis). Specifically, it proposes a way to map an automatic agent's decision-making strategy (or control law) to a multivariate vector representation, which can be used to compare and assess its alignment with human values. The rationale is that if an automatic agent implements a decision-making strategy, this strategy is optimal with respect to some loss function. At the same time, if the human agrees to adhere to the agent's strategy, this implicitly means that the strategy is also optimal with respect to a weighted sum of human criteria. Under this assumption, it is possible to recover the constraints on the weights of the human criteria that adopting the agent's strategy implies.
The Ethics2Vec method is first introduced for an automatic agent performing binary decision-making. Then, the vectorisation of an automatic control law (as in the case of a self-driving car) is discussed to show how the approach extends to automatic control settings.
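As a rough illustration of the underlying idea (a hypothetical sketch, not the paper's actual formulation), suppose each binary decision case is described by a vector of human criteria (e.g. expected benefit and treatment cost), and the agent's strategy amounts to a linear scoring rule. Then every observed agent decision induces a linear constraint on the criteria weights, and one can test whether a candidate weight vector is consistent with the observed behaviour:

```python
import numpy as np

# Hypothetical example: each case is described by two criteria
# (e.g. expected benefit, treatment cost), and the agent makes a
# binary decision (1 = treat, 0 = do not treat).
cases = np.array([
    [5.0, 2.0],   # high benefit, low cost
    [1.0, 8.0],   # low benefit, high cost
    [4.0, 6.0],
    [2.0, 1.0],
])
decisions = np.array([1, 0, 1, 1])

def consistent(weights, cases, decisions, threshold=0.0):
    """Check whether a candidate weight vector reproduces the agent's
    decisions under a linear scoring rule: a case is accepted iff the
    weighted sum of its criteria exceeds the threshold. Each observed
    decision thus induces one linear constraint on the weights."""
    scores = cases @ weights
    predicted = (scores > threshold).astype(int)
    return np.array_equal(predicted, decisions)

# Benefit weighted positively and cost negatively explains the behaviour.
print(consistent(np.array([1.0, -0.5]), cases, decisions))  # True
# A weighting that ignores cost cannot explain the rejected case.
print(consistent(np.array([1.0, 0.0]), cases, decisions))   # False
```

The set of weight vectors passing such checks delimits the region of human criteria weightings implicitly endorsed by adopting the agent's strategy.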

Published

2025-11-23

How to Cite

Bontempi, G. (2025). Ethics2vec: Aligning Automatic Agents and Human Preferences. Proceedings of the AAAI Symposium Series, 7(1), 145-152. https://doi.org/10.1609/aaaiss.v7i1.36880

Section

AI Trustworthiness and Risk Assessment for Challenged Contexts (ATRACC)