Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency

Authors

  • Jan Batzner, Weizenbaum Institute; Columbia University; Technical University Munich
  • Volker Stocker, Weizenbaum Institute; Technical University Berlin
  • Bingjun Tang, Columbia University
  • Anusha Natarajan, Columbia University
  • Qinhao Chen, Columbia University
  • Stefan Schmid, Weizenbaum Institute; Technical University Berlin
  • Gjergji Kasneci, Technical University Munich

DOI:

https://doi.org/10.1609/aies.v8i1.36553

Abstract

Synthetic persona experiments have become a prominent method in Large Language Model (LLM) alignment research, yet the representativeness and ecological validity of these personae vary considerably between studies. Through a review of 63 peer-reviewed studies published between 2023 and 2025 in leading NLP and AI venues, we reveal a critical gap: the task and population of interest are often underspecified in persona-based experiments, despite personalization being fundamentally dependent on these criteria. Our analysis shows substantial differences in user representation, with most studies focusing on a limited set of sociodemographic attributes and only 35% discussing the representativeness of their LLM personae. Based on our findings, we introduce a persona transparency checklist that emphasizes representative sampling, explicit grounding in empirical data, and enhanced ecological validity. Our work provides both a comprehensive assessment of current practices and practical guidelines to improve the rigor and ecological validity of persona-based evaluations in language model alignment research.

Published

2025-10-15

How to Cite

Batzner, J., Stocker, V., Tang, B., Natarajan, A., Chen, Q., Schmid, S., & Kasneci, G. (2025). Whose Personae? Synthetic Persona Experiments in LLM Research and Pathways to Transparency. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 8(1), 343-354. https://doi.org/10.1609/aies.v8i1.36553