Ghostwriters of the Marketplace: A Corpus of Machine-Generated Reviews for Google Services and Businesses

Authors

  • Karla Schäfer, Fraunhofer Institute for Secure Information Technology, National Research Center for Applied Cybersecurity; Technical University of Darmstadt
  • Mohammed Mofreh, Technical University of Darmstadt
  • Martin Steinebach, Fraunhofer Institute for Secure Information Technology, National Research Center for Applied Cybersecurity; Technical University of Darmstadt

DOI:

https://doi.org/10.1609/icwsm.v20i1.42791

Abstract

Generative models make it easier to create fake reviews. Models such as ChatGPT or DeepSeek can readily generate fake reviews with any desired rating, and such reviews can mislead buyers and readers. We built a "main" dataset of 78k reviews generated by six LLMs using three writing styles, 117 tones, and three ratings (positive, negative, and neutral). To obtain a diverse range of reviews, we used 50 prompts, multi-turn prompting, and paraphrasing. Additionally, we created a smaller "external" dataset using the same prompts and tones but three additional LLMs. This dataset allows detectors of AI-generated text to be tested on texts created with similar prompts but by previously unseen LLMs. We ran initial experiments on detecting AI-generated reviews with the new dataset. We fine-tuned RoBERTa on our dataset and, to test generalizability, on the RAID benchmark using Reddit posts, news, and IMDb reviews. On the test splits of seen data, the detector achieved an F1-score of 99.98%. On RAID, performance dropped, which points to directions for future research.
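
For readers less familiar with the reported metric, the following is a minimal sketch of how an F1-score for a binary detector can be computed. The label convention (1 = machine-generated, 0 = human-written), the function name, and the toy labels are assumptions made for illustration; the averaging scheme used in the paper is not stated in the abstract and may differ.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one positive class.

    Assumption: `positive` marks machine-generated reviews.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, purely illustrative (not taken from the dataset).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.4f}")
```

Because F1 combines precision and recall, a score close to 100% on seen data means the detector both misses few machine-generated reviews and rarely flags human-written ones, which is why a drop on an external benchmark is informative.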

Published

2026-05-25

How to Cite

Schäfer, K., Mofreh, M., & Steinebach, M. (2026). Ghostwriters of the Marketplace: A Corpus of Machine-Generated Reviews for Google Services and Businesses. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 2910–2920. https://doi.org/10.1609/icwsm.v20i1.42791