Learning When Not to Measure: Theorizing Ethical Alignment in LLMs
DOI: https://doi.org/10.1609/aies.v7i1.31716
Abstract
LLMs and other forms of generative AI have shown immense promise in producing highly accurate epistemic judgements in domains as varied as law, education, and medicine, with GPT notably passing the legal Bar exam and various medical licensing exams. Safely extending LLMs into safety-critical professional domains requires assurance not only of epistemic but also of ethical alignment. This paper adopts a theoretical and philosophical approach, drawing on metaethical theories to argue for a distinction, hinging on quantitative axiological comparability, that separates Kantian ethics not only from the utilitarianism it is well known to oppose but also from just distribution theories, which are key to debiasing LLMs. It presents the novel hypothesis that LLMs' acquisition of ethics, through both corpus induction and RLHF, may encounter value conflicts between Kantian and just distribution principles, and that these conflicts intensify as models come into closer alignment with both theories. The conflicts hinge on the variability with which self-attention may statistically attend to the same characterizations as more person-like or more resource-like under different prompting strategies.
Published: 2024-10-16
How to Cite
Rathje, W. (2024). Learning When Not to Measure: Theorizing Ethical Alignment in LLMs. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 1190–1199. https://doi.org/10.1609/aies.v7i1.31716
Section: Full Archival Papers