Toxic Bias: Perspective API Misreads German as More Toxic

Gianluca Nogara; Francesco Pierri; Stefano Cresci; Luca Luceri; Petter Törnberg; Silvia Giordano

doi:10.1609/icwsm.v19i1.35876

Authors

Gianluca Nogara, University of Applied Sciences and Arts of Southern Switzerland
Francesco Pierri, Politecnico di Milano
Stefano Cresci, IIT-CNR
Luca Luceri, University of Southern California
Petter Törnberg, University of Amsterdam
Silvia Giordano, University of Applied Sciences and Arts of Southern Switzerland

Abstract

Proprietary public APIs play a crucial and growing role as research tools among social scientists. Among such APIs, Google's machine learning-based Perspective API is widely used to assess the toxicity of social media messages, serving both as an important resource for researchers and as a tool for automatic content moderation. However, this paper exposes an important bias in Perspective API concerning German-language text. Through an in-depth examination of several datasets, we uncover intrinsic language biases within the multilingual model of Perspective API. We find that German content receives significantly higher toxicity scores than content in other languages. This finding is robust across various translations, topics, and data sources, and has significant consequences for both research and moderation strategies that rely on Perspective API. For instance, we show that, on average, four times more tweets and users would be moderated when the German text is scored than when its English translation is scored. Our findings point to broader risks associated with the widespread use of proprietary APIs within the computational social sciences.
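The moderation comparison described in the abstract can be illustrated with a minimal sketch. It assumes toxicity scores in [0, 1] are already available for each message in both languages; the function name `moderation_ratio`, the 0.5 threshold, and the scores themselves are hypothetical and chosen only for illustration. They are not taken from the paper and do not reproduce its data or method.

```python
def moderation_ratio(scores_a, scores_b, threshold=0.5):
    """Count items at or above `threshold` in two paired score lists.

    Returns (flagged_a, flagged_b, ratio), where ratio = flagged_a / flagged_b.
    The threshold is an assumed moderation cut-off, not a value from the paper.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired (same length)")
    flagged_a = sum(s >= threshold for s in scores_a)
    flagged_b = sum(s >= threshold for s in scores_b)
    # Avoid division by zero when nothing is flagged in the second list.
    ratio = flagged_a / flagged_b if flagged_b else float("inf")
    return flagged_a, flagged_b, ratio


# Made-up toy scores for six messages: the German original and its
# English translation. These are NOT real Perspective API outputs.
german_scores = [0.62, 0.55, 0.18, 0.71, 0.44, 0.09]
english_scores = [0.41, 0.33, 0.12, 0.66, 0.29, 0.05]

flagged_de, flagged_en, ratio = moderation_ratio(german_scores, english_scores)
print(f"German flagged: {flagged_de}, English flagged: {flagged_en}, ratio: {ratio}")
```

With real data, the two lists would hold the scores returned for each message and its translation, and the ratio would depend on the threshold a given moderation policy adopts.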
