GMHP7k: A Corpus of German Misogynistic Hatespeech Posts
DOI:
https://doi.org/10.1609/icwsm.v18i1.31438
Abstract
We provide a German corpus consisting of 7,061 posts authored by users of social media platforms. A group of volunteers annotated each post with binary labels for hatespeech and for misogynistic/misogynous hatespeech. The inter-rater reliability across all annotators, measured by Fleiss’ Kappa, is 0.6409 for hatespeech and 0.8258 for misogynistic hatespeech. Furthermore, we present baseline measurements for machine-learning-based text classification with BERT. Initial experiments with the corpus achieve macro-average F1-scores of up to 0.79 for hatespeech and 0.75 for misogynistic hatespeech. The dataset of the corpus on German Misogynistic Hatespeech Posts (GMHP7k) is publicly available.
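For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how Fleiss' Kappa and a macro-average F1-score can be computed. It is not the authors' evaluation code: the function names `fleiss_kappa` and `macro_f1` and the toy labels are assumptions made for illustration, and the toy data is not taken from GMHP7k.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' Kappa for a list of items, each a list of labels.

    Assumes every item was rated by the same number of raters (at least 2)
    and that more than one category occurs overall (otherwise the
    denominator below is zero).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for item in ratings for label in item})

    # counts[i][j]: number of raters who assigned item i to category j
    counts = [[Counter(item)[c] for c in categories] for item in ratings]

    # Observed agreement: mean of the per-item agreement values
    per_item = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions
    total = n_items * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 4 posts, 3 annotators, binary labels (1 = hatespeech)
toy_ratings = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [0, 1, 0]]
toy_gold = [1, 0, 1, 0]
toy_pred = [1, 0, 0, 0]

kappa = fleiss_kappa(toy_ratings)
f1 = macro_f1(toy_gold, toy_pred)
```

Macro averaging weights both classes equally, which matters when one class is much rarer than the other; a classifier that ignores the rare class is penalised in the average regardless of how many examples that class has.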
Published
2024-05-28
How to Cite
Glasebach, J., Keller, M.-E., Döschl, A., & Mandl, P. (2024). GMHP7k: A Corpus of German Misogynistic Hatespeech Posts. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 1946-1957. https://doi.org/10.1609/icwsm.v18i1.31438
Section
Dataset Papers