GMHP7k: A Corpus of German Misogynistic Hatespeech Posts

Jonas Glasebach; Max-Emanuel Keller; Alexander Döschl; Peter Mandl

doi:10.1609/icwsm.v18i1.31438

Authors

Jonas Glasebach, KPMG AG
Max-Emanuel Keller, HM Hochschule München University of Applied Sciences
Alexander Döschl, HM Hochschule München University of Applied Sciences
Peter Mandl, HM Hochschule München University of Applied Sciences

DOI:

https://doi.org/10.1609/icwsm.v18i1.31438

Abstract

We provide a German corpus of 7,061 posts authored by users of social media platforms. A group of volunteers annotated each post with two binary labels: hatespeech and misogynistic hatespeech. The inter-rater reliability across all annotators, measured by Fleiss’ kappa, is 0.6409 for hatespeech and 0.8258 for misogynistic hatespeech. We also present baseline measurements for machine-learning-based text classification with BERT. Initial experiments with the corpus achieve macro-averaged F1 scores of up to 0.79 for hatespeech and 0.75 for misogynistic hatespeech. The dataset, the corpus of German Misogynistic Hatespeech Posts (GMHP7k), is publicly available.
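The two metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of Fleiss' kappa and macro-averaged F1; the function names `fleiss_kappa` and `macro_f1` and the toy ratings are assumptions made for illustration and are not taken from the paper, its code, or the GMHP7k data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j (same rater count per item)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # Mean observed agreement: share of agreeing rater pairs per item.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    p_e = sum(
        (sum(row[j] for row in counts) / (n_items * n_raters)) ** 2
        for j in range(n_categories)
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: 4 posts, 3 raters, columns = [not hate, hate].
    toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
    print(fleiss_kappa(toy_counts))

    # Hypothetical gold labels and classifier predictions.
    print(macro_f1([1, 1, 0, 0], [1, 0, 0, 0]))
```

Macro averaging weights both classes equally, which matters when the positive class is rare, as is typical for hatespeech labels.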

Published

2024-05-28

How to Cite

Glasebach, J., Keller, M.-E., Döschl, A., & Mandl, P. (2024). GMHP7k: A Corpus of German Misogynistic Hatespeech Posts. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 1946-1957. https://doi.org/10.1609/icwsm.v18i1.31438

Issue

Vol. 18 (2024): Proceedings of the Eighteenth International AAAI Conference on Web and Social Media

Section

Dataset Papers
