MultiFOLD: Multi-source Domain Adaption for Offensive Language Detection
DOI:
https://doi.org/10.1609/icwsm.v18i1.31299
Abstract
Automatic offensive language detection remains challenging, and is a crucial part of preserving the openness of digital spaces, which are an integral part of our everyday experience. The ever-growing forms of offensive online content make traditional supervised approaches harder to scale due to the financial and psychological costs incurred by collecting human annotations. In this work, we propose a domain adaptation framework for offensive language detection, MultiFOLD, which learns and adapts from multiple existing datasets (or source domains) to an unlabeled target domain. Under the hood, a curriculum learning algorithm is employed that kicks off learning with the instances most similar to the target domain while gradually expanding to more distant instances. The proposed model is trained with a standard task-specific loss and a domain adversarial objective which aims to minimize the language distinctions across the multiple sources and the target, allowing the classifier to distinguish offensiveness rather than domain. Our experiments on six publicly available datasets demonstrate the effectiveness of MultiFOLD. Relative improvement in F1 of 0.5% (WOAH) to 29.7% (ICWSM) is found across five out of the six datasets compared to the state-of-the-art domain adaptation baseline BERT-DAA, resulting in an average of 6% relative F1-score gain.
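The curriculum idea in the abstract (start with the source instances most similar to the target domain, then gradually include more distant ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the schedule, so the cosine similarity to a target centroid, the equal-sized stages, the toy two-dimensional embeddings, and the names `cosine`, `centroid`, and `curriculum_stages` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def curriculum_stages(source, target, n_stages=3):
    """Yield growing lists of source indices, most target-like first.

    Assumption: similarity is cosine similarity to the target centroid,
    and each stage adds an equal share of the ranked source instances.
    """
    c = centroid(target)
    ranked = sorted(
        range(len(source)),
        key=lambda i: cosine(source[i], c),
        reverse=True,
    )
    for stage in range(1, n_stages + 1):
        cutoff = math.ceil(len(ranked) * stage / n_stages)
        yield ranked[:cutoff]


# Toy embeddings (hypothetical): four labeled source instances, two unlabeled target instances.
source = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
target = [[1.0, 0.1], [0.9, 0.0]]

stages = list(curriculum_stages(source, target, n_stages=2))
print(stages)
```

In a training loop, each stage's index list would select the source examples used for that phase, so the classifier sees target-like examples first and the full source pool only in the final stage.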
Published
2024-05-28
How to Cite
Arango, A., Kaghazgaran, P., Sarwar, S. M., Murdock, V., & Lee, C. (2024). MultiFOLD: Multi-source Domain Adaption for Offensive Language Detection. Proceedings of the International AAAI Conference on Web and Social Media, 18(1), 86-99. https://doi.org/10.1609/icwsm.v18i1.31299
Section
Full Papers