Ghostwriters of the Marketplace: A Corpus of Machine-Generated Reviews for Google Services and Businesses

Karla Schäfer; Mohammed Mofreh; Martin Steinebach

doi:10.1609/icwsm.v20i1.42791

Authors

Karla Schäfer: Fraunhofer Institute for Secure Information Technology, National Research Center for Applied Cybersecurity; Technical University of Darmstadt
Mohammed Mofreh: Technical University of Darmstadt
Martin Steinebach: Fraunhofer Institute for Secure Information Technology, National Research Center for Applied Cybersecurity; Technical University of Darmstadt

DOI: https://doi.org/10.1609/icwsm.v20i1.42791

Abstract

Generative models are simplifying the process of creating fake reviews. Models such as ChatGPT or DeepSeek can easily be used to generate fake reviews with different ratings, and such reviews can mislead buyers and readers. We generated a dataset of 78k reviews (our "main" dataset) with six LLMs, using three writing styles, 117 tones, and three ratings (positive, negative, and neutral). We used 50 prompts, multi-turn prompting, and paraphrasing to obtain a diverse range of reviews. Additionally, we created a second, smaller ("external") dataset with the same prompts and tones but three additional LLMs. This dataset allows detectors of AI-generated text to be tested on their ability to detect texts created with similar prompts but by previously unseen LLMs. We performed initial experiments on AI-generated review detection using the newly generated dataset. We fine-tuned RoBERTa on our dataset and, to test generalizability, on the RAID benchmark using Reddit posts, news articles, and IMDb reviews. On the test splits of seen data, the detector achieved an F1-score of 99.98%. On RAID, performance dropped, pointing to directions for future research.
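The detection result above is reported as an F1-score, the harmonic mean of precision and recall. The abstract does not state which averaging was used, so the following is only a minimal sketch assuming binary F1 with machine-generated reviews as the positive class (hypothetical labels: 1 = machine-generated, 0 = human-written). The function name `binary_f1` and the toy labels are illustrative and do not come from the paper.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return the F1-score for the class `positive`.

    Hypothetical convention for illustration: 1 = machine-generated review,
    0 = human-written review.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Count true positives, false positives, and false negatives for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # Without true positives, precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy labels, not data from the paper: TP = 4, FP = 1, FN = 1.
    gold = [1, 1, 1, 0, 0, 1, 0, 1]
    pred = [1, 1, 0, 0, 1, 1, 0, 1]
    print(f"F1 = {binary_f1(gold, pred):.4f}")  # prints "F1 = 0.8000"
```

Because the harmonic mean is pulled toward the smaller of its two inputs, an F1-score of 99.98% implies that both precision and recall are very close to 1. Macro-averaged F1 over both classes would be an equally plausible reading of the reported figure.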
