Everyone’s Voice Matters: Quantifying Annotation Disagreement Using Demographic Information
DOI:
https://doi.org/10.1609/aaai.v37i12.26698
Keywords:
General
Abstract
In NLP annotation, it is common to have multiple annotators label the text and then obtain the ground-truth labels from the majority agreement. However, annotators are individuals with different backgrounds and distinct voices. When annotation tasks become subjective, such as detecting politeness, offense, and social norms, annotators' judgments diverge, and their diverse voices may represent the true distribution of people's opinions on subjective matters. It is therefore crucial to study disagreement in annotation to understand which content annotators find controversial. In our research, we extract disagreement labels from five subjective datasets and then fine-tune language models to predict annotators' disagreement. Our results show that knowing annotators' demographic information (e.g., gender, ethnicity, education level), in addition to the task text, helps predict the disagreement. To investigate the effect of annotators' demographics on their disagreement level, we simulate different combinations of artificial demographics and examine the variance of the predictions, distinguishing disagreement inherent to controversial text content from disagreement arising from annotators' perspectives. Overall, we propose an innovative disagreement prediction mechanism for better design of the annotation process that will achieve more accurate and inclusive results for NLP systems. Our code and dataset are publicly available.
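The abstract describes fine-tuning language models on task text plus annotator demographics and then varying artificial demographic combinations to measure the variance of the predicted disagreement. Below is a minimal, hypothetical sketch of that idea, not the authors' released code: the roberta-base backbone, the way demographics are verbalized, and the predict_disagreement helper are all assumptions for illustration, and the regression head here is untrained (in practice it would be fine-tuned on the extracted disagreement labels).

```python
# Hypothetical sketch: predict annotation disagreement from task text plus
# annotator demographics using a Hugging Face sequence-regression model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-base"  # assumed backbone, not necessarily the paper's

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=1, problem_type="regression"
)

def predict_disagreement(text: str, demographics: dict) -> float:
    """Score expected annotator disagreement for `text`.

    Demographic attributes (e.g., gender, ethnicity, education level) are
    verbalized and passed as a second text segment alongside the task text.
    """
    demo_str = "; ".join(f"{k}: {v}" for k, v in demographics.items())
    inputs = tokenizer(text, demo_str, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

# Simulate artificial demographic combinations and inspect the variance of
# the predictions: low variance suggests controversy inherent to the text,
# high variance suggests disagreement tied to annotator background.
combos = [
    {"gender": "female", "education": "college"},
    {"gender": "male", "education": "high school"},
]
scores = [predict_disagreement("You should know better.", d) for d in combos]
variance = torch.tensor(scores).var().item()
```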
Published
2023-06-26
How to Cite
Wan, R., Kim, J., & Kang, D. (2023). Everyone’s Voice Matters: Quantifying Annotation Disagreement Using Demographic Information. Proceedings of the AAAI Conference on Artificial Intelligence, 37(12), 14523-14530. https://doi.org/10.1609/aaai.v37i12.26698
Issue
Section
AAAI Special Track on AI for Social Impact