Nothing Comes Without Its World – Practical Challenges of Aligning LLMs to Situated Human Values through RLHF

Authors

  • Anne Arzberger Delft University of Technology
  • Stefan Buijsman Delft University of Technology
  • Maria Luce Lupetti Polytechnic University of Turin
  • Alessandro Bozzon Delft University of Technology
  • Jie Yang Delft University of Technology

DOI:

https://doi.org/10.1609/aies.v7i1.31617

Abstract

Work on value alignment aims to ensure that human values are respected by AI systems. However, existing approaches tend to rely on universal framings of human values that obscure the question of which values the systems should capture and align with, given the variety of operational situations. This often results in AI systems that privilege only a select few while perpetuating problematic norms grounded in biases, ultimately causing equity and justice issues. In this perspective paper, we unpack the limitations of predominant alignment practices of reinforcement learning from human feedback (RLHF) for LLMs through the lens of situated values. We build on feminist epistemology to argue that, at design time, RLHF has problems with representation in the subjects providing feedback and with implicitness in the conceptualization of values and situations of real-world users, while at use time it lacks system adaptation to real user situations. To address these shortcomings, we propose three research directions: 1) situated annotation to capture information about the crowdworker’s and user’s values and judgments in relation to specific situations at both design time and use time, 2) expressive instruction to encode plural values for instructing LLM systems at design time, and 3) reflexive adaptation to leverage situational knowledge for system adaptation at use time. We conclude by reflecting on the practical challenges of pursuing these research directions and situated value alignment of AI more broadly.

Published

2024-10-16

How to Cite

Arzberger, A., Buijsman, S., Lupetti, M. L., Bozzon, A., & Yang, J. (2024). Nothing Comes Without Its World – Practical Challenges of Aligning LLMs to Situated Human Values through RLHF. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 7(1), 61–73. https://doi.org/10.1609/aies.v7i1.31617