ParsCN: A Persian Dataset for Counter-Narrative Generation to Combat Online Hate Speech

Authors

  • Zahra Safdari Fesaghandis, Bilkent University
  • Suman Kalyan Maity, Missouri University of Science and Technology

DOI

https://doi.org/10.1609/icwsm.v20i1.42789

Abstract

Online hate speech threatens online civility, particularly in low-resource and multilingual environments. Counter-narratives offer a promising solution by promoting constructive responses to hate speech. However, automatic counter-narrative generation is hindered by the lack of high-quality data for low-resource languages like Persian. To bridge this gap, we introduce ParsCN, the first and most comprehensive Persian counter-narrative dataset. Consisting of 1,100 hate speech and counter-narrative pairs, it provides fine-grained annotations across six target groups and six countering strategies, tailored to the socio-cultural context of Persian online discourse. We propose a novel, scalable multi-stage framework that integrates culturally informed human annotation with few-shot LLM-augmented generation, guided by semantic retrieval and rigorous manual curation. This approach enables the creation of diverse, high-quality counter-narratives while significantly reducing annotation costs, establishing a replicable paradigm for other low-resource settings. Comprehensive human and automatic evaluations confirm the quality of the dataset and the effectiveness of the generated responses. Human-written counter-narratives achieve the highest scores for relevance (4.23), effectiveness (4.21), fluency (4.92), and tone appropriateness (4.79), with GPT-4o and Claude closely following. Automatic evaluations show strong semantic alignment (BERTScore F1 up to 0.709), high lexical diversity, and low toxicity across all sources. Finally, we conduct benchmark evaluations using mBART and PersianMind on a held-out test set. Results reveal that existing models struggle with fluency, cultural nuance, and safety, highlighting the need for Persian-specific resources like ParsCN. Our dataset serves as a foundational benchmark to advance research on Persian counter-narrative generation and foster safer, more inclusive digital spaces.
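
The abstract mentions few-shot generation guided by semantic retrieval without giving details. As a rough illustration only, the general idea can be sketched as follows: rank already-annotated pairs by embedding similarity to a new hate speech message, then place the closest pairs into a prompt as few-shot examples. Everything in this sketch is an assumption made for illustration, including the function names, the toy three-dimensional vectors standing in for sentence embeddings, the placeholder texts, and the prompt wording; it is not the authors' actual pipeline.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical record layout: (embedding, hate_speech_text, counter_narrative_text)
Record = Tuple[Vector, str, str]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_examples(query_vec: Vector, pool: List[Record], k: int = 2) -> List[Tuple[str, str]]:
    """Return the k (hate speech, counter-narrative) pairs most similar to the query."""
    ranked = sorted(
        pool,
        key=lambda record: cosine_similarity(query_vec, record[0]),
        reverse=True,
    )
    return [(hs, cn) for _, hs, cn in ranked[:k]]


def build_few_shot_prompt(query_text: str, examples: List[Tuple[str, str]]) -> str:
    """Assemble a few-shot prompt: instruction, retrieved pairs, then the new message."""
    parts = ["Write a respectful counter-narrative to the final message."]
    for hs, cn in examples:
        parts.append(f"Message: {hs}\nCounter-narrative: {cn}")
    parts.append(f"Message: {query_text}\nCounter-narrative:")
    return "\n\n".join(parts)


# Toy pool with placeholder texts; real embeddings would come from a sentence encoder.
POOL: List[Record] = [
    ([1.0, 0.0, 0.0], "<hate speech A>", "<counter-narrative A>"),
    ([0.0, 1.0, 0.0], "<hate speech B>", "<counter-narrative B>"),
    ([0.9, 0.1, 0.0], "<hate speech C>", "<counter-narrative C>"),
]

examples = retrieve_examples([1.0, 0.05, 0.0], POOL, k=2)
prompt = build_few_shot_prompt("<new hate speech>", examples)
print(prompt)
```

In a real setting, the generated text would still need the manual curation step the abstract describes; retrieval only chooses which annotated pairs the model sees.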

Published

2026-05-25

How to Cite

Safdari Fesaghandis, Z., & Maity, S. K. (2026). ParsCN: A Persian Dataset for Counter-Narrative Generation to Combat Online Hate Speech. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 2878–2894. https://doi.org/10.1609/icwsm.v20i1.42789