HEV Generative Sandbox: A Framework for Assessing Domain-Specific Social Risks Through Human-LLM Simulation

Authors

  • Yiran Liu Tsinghua University
  • Zhiyi Hou Westlake University, Zhejiang University, Shanghai Innovation Institute
  • Xiaoang Xu Beijing University of Posts and Telecommunications
  • Shuo Wang Tsinghua University
  • Huijia Wu Beijing University of Posts and Telecommunications
  • Kaicheng Yu Westlake University
  • Yang Yu China University of Petroleum (Beijing)
  • ChengXiang Zhai University of Illinois Urbana-Champaign

DOI:

https://doi.org/10.1609/aaai.v40i38.40498

Abstract

Deploying Large Language Models (LLMs) in specialized domains introduces significant societal and compliance risks, including bias amplification, misinformation propagation, and privacy violations. These risks emerge predominantly from the dynamic interactions between LLMs and humans in specific contexts. Each domain faces its own distribution of hazards, and different interaction modalities introduce distinct levels of exposure and vulnerability. However, current risk assessment frameworks lack a systematic methodology for capturing this dynamic interplay. In this work, we introduce the HEV Generative Sandbox, a novel risk evaluation framework that simulates human-LLM behavior to quantify domain-contextual risks across three interdependent dimensions: 1) Hazard (H): the domain-specific threats inherent to a given context; 2) Exposure (E): the extent to which the LLM and its users are subjected to hazardous scenarios; 3) Vulnerability (V): the susceptibility of the system to risk due to human interaction or model weaknesses. Our approach pioneers "domain-rooted scenario generation", in which we sample contextual distributions from domain-specific corpora and simulate diverse inputs. By unifying dynamic scenario simulation, causal risk decomposition, and closed-loop evaluation, the HEV Generative Sandbox provides a scalable, domain-sensitive methodology for responsible LLM deployment. This work advances the safe deployment of LLMs by providing a comprehensive and automated risk evaluation framework.
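
To make the three-dimension decomposition concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how H, E, and V are scored or combined, so everything here is an assumption: the `ScenarioScore` type, the `sample_topics` and `mean_risk` helpers, the [0, 1] score range, and in particular the multiplicative aggregation, which is borrowed from classic hazard-exposure-vulnerability risk models. Topic weights stand in for a contextual distribution estimated from a domain corpus.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioScore:
    """Hypothetical per-scenario scores, each assumed to lie in [0, 1]."""
    hazard: float         # H: how threatening the domain context is
    exposure: float       # E: how often users and model meet that context
    vulnerability: float  # V: how likely the interaction is to go wrong

    def risk(self) -> float:
        # Assumption: multiplicative aggregation (risk = H * E * V).
        # The abstract does not specify the paper's actual formula.
        return self.hazard * self.exposure * self.vulnerability


def sample_topics(topic_weights: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Draw n scenario topics in proportion to their (hypothetical) corpus weights."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    topics = list(topic_weights)
    weights = [topic_weights[t] for t in topics]
    return rng.choices(topics, weights=weights, k=n)


def mean_risk(scores: list[ScenarioScore]) -> float:
    """Average the per-scenario risk over all simulated scenarios."""
    if not scores:
        raise ValueError("no scenarios to aggregate")
    return sum(s.risk() for s in scores) / len(scores)
```

In this sketch a scenario that scores zero on any one dimension contributes no risk, which is one reading of "interdependent" dimensions; an additive or learned aggregation would behave differently.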

Published

2026-03-14

How to Cite

Liu, Y., Hou, Z., Xu, X., Wang, S., Wu, H., Yu, K., … Zhai, C. (2026). HEV Generative Sandbox: A Framework for Assessing Domain-Specific Social Risks Through Human-LLM Simulation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32249–32257. https://doi.org/10.1609/aaai.v40i38.40498

Section

AAAI Technical Track on Natural Language Processing III