Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection

Authors

  • Siyuan Li (School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China)
  • Xi Lin (School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China)
  • Guangyan Li (Institute of Automation, Chinese Academy of Sciences, Beijing, China)
  • Zehao Liu (School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China)
  • Aodu Wulianghai (School of Computer Science, Shanghai Jiao Tong University, Shanghai, China)
  • Li Ding (School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China)
  • Jun Wu (School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China)
  • Jianhua Li (School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Integrated Administration Technologies for Information Security, Shanghai, China)

DOI:

https://doi.org/10.1609/aaai.v40i42.40872

Abstract

The rapid advancement of large language models (LLMs) has resulted in increasingly sophisticated AI-generated content, making it significantly harder to distinguish LLM-generated text from human-written text. Existing detection methods, primarily based on lexical heuristics or fine-tuned classifiers, often generalize poorly and are vulnerable to paraphrasing, adversarial perturbations, and cross-domain shifts. In this work, we propose SentiDetect, a model-agnostic framework that detects LLM-generated text by analyzing the divergence in sentiment distribution stability. Our method is motivated by the empirical observation that LLM outputs tend to exhibit emotionally consistent patterns, whereas human-written texts display greater emotional variability. To capture this phenomenon, we define two complementary metrics, sentiment distribution consistency and sentiment distribution preservation, which quantify stability under sentiment-altering and semantic-preserving transformations. We evaluate SentiDetect on five diverse domains and a range of advanced LLMs, including Gemini-1.5-Pro, Claude-3, GPT-4-0613, and LLaMa-3.3. Experimental results demonstrate its superiority over state-of-the-art baselines, with F1 score improvements of over 16% and 11% on Gemini-1.5-Pro and GPT-4-0613, respectively. SentiDetect is also more robust to paraphrasing, adversarial attacks, and text length variations, outperforming existing detectors in these challenging scenarios.
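
As a rough illustration of what "sentiment distribution stability" could mean in practice, the following sketch compares the sentence-level sentiment distribution of a text before and after a transformation. Everything in it is an assumption made for illustration: the toy lexicon, the three-class labels, the use of Jensen-Shannon divergence, and the function names are hypothetical and are not taken from the paper, whose metric definitions are not given in this abstract.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption); a real system would use a trained
# sentiment model rather than word lists.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}
LABELS = ("negative", "neutral", "positive")


def sentence_label(sentence):
    """Assign a coarse sentiment label to one sentence by lexicon counts."""
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_distribution(sentences):
    """Return label frequencies in the order given by LABELS."""
    if not sentences:
        # No evidence at all: fall back to a uniform distribution.
        return [1.0 / len(LABELS)] * len(LABELS)
    counts = Counter(sentence_label(s) for s in sentences)
    total = sum(counts.values())
    return [counts[label] / total for label in LABELS]


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def stability_score(original, transformed):
    """Illustrative stability: 1 means the distribution did not move."""
    p = sentiment_distribution(original)
    q = sentiment_distribution(transformed)
    return 1.0 - js_divergence(p, q)
```

Under these assumptions, a score near 1 after a semantic-preserving rewrite (for example, a paraphrase) would indicate that the sentiment distribution was preserved, while the score after a sentiment-altering rewrite indicates how far the distribution moved. How SentiDetect actually combines such signals into a detection decision is not specified here.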

Published

2026-03-14

How to Cite

Li, S., Lin, X., Li, G., Liu, Z., Wulianghai, A., Ding, L., … Li, J. (2026). Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 35608–35616. https://doi.org/10.1609/aaai.v40i42.40872

Section

AAAI Technical Track on Philosophy and Ethics of AI