GMHP7k: A Corpus of German Misogynistic Hatespeech Posts

Authors

  • Jonas Glasebach KPMG AG
  • Max-Emanuel Keller HM Hochschule München University of Applied Sciences
  • Alexander Döschl HM Hochschule München University of Applied Sciences
  • Peter Mandl HM Hochschule München University of Applied Sciences

DOI:

https://doi.org/10.1609/icwsm.v18i1.31438

Abstract

We provide a German corpus of 7,061 posts written by users of social media platforms. A group of volunteers annotated each post with binary labels for hatespeech and for misogynistic (misogynous) hatespeech. Inter-rater reliability across all annotators, measured with Fleiss' Kappa, is 0.6409 for hatespeech and 0.8258 for misogynistic hatespeech. We also present baseline results for machine-learning-based text classification with BERT. Initial experiments on the corpus achieve macro-averaged F1-scores of up to 0.79 for hatespeech and 0.75 for misogynistic hatespeech. The German Misogynistic Hatespeech Posts (GMHP7k) dataset is publicly available.
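The abstract reports annotator agreement as Fleiss' Kappa and classifier quality as macro-averaged F1. The following is a minimal plain-Python sketch of how both figures are defined for binary labels (0 = not hatespeech, 1 = hatespeech). The function names, the toy inputs, and the assumption that every post is rated by the same number of annotators are illustrative only; this is not the authors' evaluation code or data.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[int]], categories: list[int]) -> float:
    """Fleiss' Kappa. ratings[i] holds the labels that each annotator gave item i.

    Assumes every item was rated by the same number n >= 2 of annotators.
    """
    N = len(ratings)
    n = len(ratings[0])
    if n < 2 or any(len(item) != n for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # counts[i][j]: how many annotators put item i into category j
    counts = []
    for item in ratings:
        tally = Counter(item)
        counts.append([tally[c] for c in categories])
    # p[j]: share of all assignments that went to category j
    p = [sum(row[j] for row in counts) / (N * n) for j in range(len(categories))]
    # P_i: fraction of annotator pairs that agree on item i
    P = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N          # mean observed agreement
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (P_bar - P_e) / (1 - P_e)


def macro_f1(y_true: list[int], y_pred: list[int], labels=(0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical toy data: 2 posts, 3 annotators each
print(fleiss_kappa([[1, 1, 0], [0, 0, 0]], categories=[0, 1]))  # 0.25 (up to float rounding)
print(macro_f1([1, 0, 1, 0], [1, 0, 0, 0]))  # about 0.733
```

Kappa subtracts the chance agreement implied by the overall label proportions, so it can be noticeably lower than raw percent agreement when one label dominates. Macro-averaging weights each class's F1 equally, so performance on the minority class counts as much as on the majority class.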

Published

2024-05-28

How to Cite

Glasebach, J., Keller, M.-E., Döschl, A., & Mandl, P. (2024). GMHP7k: A Corpus of German Misogynistic Hatespeech Posts. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 1946-1957. https://doi.org/10.1609/icwsm.v18i1.31438