MultiFOLD: Multi-source Domain Adaption for Offensive Language Detection

Aymé Arango; Parisa Kaghazgaran; Sheikh Muhammad Sarwar; Vanessa Murdock; Cj Lee

doi:10.1609/icwsm.v18i1.31299

Authors

Aymé Arango, Universidad de Chile
Parisa Kaghazgaran, Amazon
Sheikh Muhammad Sarwar, Amazon
Vanessa Murdock, Amazon AWS AI/ML
Cj Lee, Amazon AWS AI/ML

DOI:

https://doi.org/10.1609/icwsm.v18i1.31299

Abstract

Automatic offensive language detection remains challenging, and is a crucial part of preserving the openness of digital spaces, which are an integral part of our everyday experience. The ever-growing forms of offensive online content make traditional supervised approaches harder to scale due to the financial and psychological costs incurred by collecting human annotations. In this work, we propose a domain adaptation framework for offensive language detection, MultiFOLD, which learns and adapts from multiple existing datasets (or source domains) to an unlabeled target domain. Under the hood, a curriculum learning algorithm is employed that kicks off learning with the instances most similar to the target domain while gradually expanding to more distant instances. The proposed model is trained with a standard task-specific loss and a domain adversarial objective which aims to minimize the language distinctions across the multiple sources and the target, allowing the classifier to distinguish offensiveness rather than domain. Our experiments on six publicly available datasets demonstrate the effectiveness of MultiFOLD. Relative improvement in F1 of 0.5% (WOAH) to 29.7% (ICWSM) is found across five out of the six datasets compared to the state-of-the-art domain adaptation baseline BERT-DAA, resulting in an average of 6% relative F1-score gain.
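
The curriculum idea described in the abstract (start with the source instances most similar to the target domain, then gradually include more distant ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how similarity is measured or how the curriculum is staged, so the cosine similarity to the target centroid, the toy 2-d vectors standing in for text embeddings, and the hypothetical helper `curriculum_stages` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def curriculum_stages(source_vecs, target_vecs, num_stages):
    """Return one list of source indices per stage.

    Assumption: similarity to the target domain is the cosine similarity
    between a source instance and the centroid of the unlabeled target
    instances. Each stage is a superset of the previous one, so training
    begins with the closest instances and expands to more distant ones.
    """
    target_center = centroid(target_vecs)
    ranked = sorted(
        range(len(source_vecs)),
        key=lambda i: cosine(source_vecs[i], target_center),
        reverse=True,
    )
    stages = []
    for stage in range(1, num_stages + 1):
        cutoff = math.ceil(len(ranked) * stage / num_stages)
        stages.append(ranked[:cutoff])
    return stages


# Toy vectors standing in for sentence embeddings (hypothetical data).
source = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [-1.0, 0.0]]
target = [[1.0, 0.0], [1.0, 0.2]]
stages = curriculum_stages(source, target, num_stages=2)
print(stages)
```

In this sketch a training loop would iterate over `stages` in order and train on the instances indexed by each stage. The task-specific loss and the domain adversarial objective mentioned in the abstract are outside the scope of the example.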
