Should You Use LLMs to Simulate Opinions? Quality Checks for Early-Stage Deliberation

Terrence Neumann; Maria De-Arteaga; Sina Fazelpour

doi:10.1609/aaai.v40i46.41254

Authors

Terrence Neumann, University of Texas at Austin
Maria De-Arteaga, Universitat Ramon Llull, ESADE
Sina Fazelpour, Northeastern University

Abstract

The emergent capabilities of large language models (LLMs) have prompted interest in using them as surrogates for human subjects in opinion surveys. However, prior evaluations of LLM-based opinion simulation have relied heavily on costly, domain-specific survey data, and mixed empirical results leave their reliability in question. To enable cost-effective, early-stage evaluation, we introduce a quality control assessment designed to test the viability of LLM-simulated opinions on Likert-scale tasks without requiring large-scale human data for validation. This assessment comprises two key tests: logical consistency and alignment with stakeholder expectations, offering a low-cost, domain-adaptable validation tool. We apply our quality control assessment to an opinion simulation task relevant to AI-assisted content moderation and fact-checking workflows (a socially impactful use case) and evaluate nine LLMs using a baseline prompt engineering method (backstory prompting), as well as fine-tuning and in-context learning variants. None of the models or methods pass the full assessment, revealing several failure modes. We conclude with a discussion of the risk management implications and release TopicMisinfo, a benchmark dataset with paired human and LLM annotations simulated by various models and approaches, to support future research.
