Learning When Not to Measure: Theorizing Ethical Alignment in LLMs

William Rathje

doi:10.1609/aies.v7i1.31716

Authors

William Rathje, University of California, Berkeley

DOI:

https://doi.org/10.1609/aies.v7i1.31716

Abstract

LLMs and other forms of generative AI have shown immense promise in producing highly accurate epistemic judgements in domains as varied as law, education, and medicine, with GPT notably passing the legal bar exam and various medical licensing exams. The safe extension of LLMs into safety-critical professional domains requires assurance not only of epistemic alignment but also of ethical alignment. This paper adopts a theoretical and philosophical approach, drawing on metaethical theories to argue for a distinction, hinging on quantitative, axiological comparability, that separates Kantian ethics not only from the utilitarianism it is well known to oppose but also from just distribution theories, which are key to debiasing LLMs. It presents the novel hypothesis that LLM ethical acquisition, through both corpus induction and RLHF, may encounter value conflicts between Kantian and just distribution principles that intensify as the models come into improved alignment with both theories. These conflicts hinge on the variability with which self-attention may statistically attend to the same characterizations as more person-like or more resource-like under distinct prompting strategies.
