Should You Use LLMs to Simulate Opinions? Quality Checks for Early-Stage Deliberation
DOI:
https://doi.org/10.1609/aaai.v40i46.41254
Abstract
The emergent capabilities of large language models (LLMs) have prompted interest in using them as surrogates for human subjects in opinion surveys. However, prior evaluations of LLM-based opinion simulation have relied heavily on costly, domain-specific survey data, and mixed empirical results leave their reliability in question. To enable cost-effective, early-stage evaluation, we introduce a quality control assessment designed to test the viability of LLM-simulated opinions on Likert-scale tasks without requiring large-scale human data for validation. This assessment comprises two key tests: logical consistency and alignment with stakeholder expectations, offering a low-cost, domain-adaptable validation tool. We apply our quality control assessment to an opinion simulation task relevant to AI-assisted content moderation and fact-checking workflows (a socially impactful use case) and evaluate nine LLMs using a baseline prompt engineering method (backstory prompting), as well as fine-tuning and in-context learning variants. None of the models or methods pass the full assessment, revealing several failure modes. We conclude with a discussion of the risk management implications and release TopicMisinfo, a benchmark dataset with paired human and LLM annotations simulated by various models and approaches, to support future research.
Published
2026-03-14
How to Cite
Neumann, T., De-Arteaga, M., & Fazelpour, S. (2026). Should You Use LLMs to Simulate Opinions? Quality Checks for Early-Stage Deliberation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(46), 39070–39079. https://doi.org/10.1609/aaai.v40i46.41254
Section
AAAI Special Track on AI for Social Impact II