Let's Agree to Disagree: Fixing Agreement Measures for Crowdsourcing

Authors

  • Alessandro Checco, University of Sheffield
  • Kevin Roitero, University of Udine
  • Eddy Maddalena, University of Southampton
  • Stefano Mizzaro, University of Udine
  • Gianluca Demartini, University of Queensland

DOI:

https://doi.org/10.1609/hcomp.v5i1.13306

Keywords:

crowdsourcing, inter-rater agreement, reliability

Abstract

In the context of micro-task crowdsourcing, each task is usually performed by several workers. This allows researchers to leverage measures of agreement among workers on the same task to estimate the reliability of the collected data and to better understand the answering behaviors of the participants. While many measures of agreement between annotators have been proposed, they are known to suffer from many problems and abnormalities. In this paper, we identify the main limits of the existing agreement measures in the crowdsourcing context, both by means of toy examples and with real-world crowdsourcing data, and we propose a novel agreement measure based on probabilistic parameter estimation which overcomes such limits. We validate our new agreement measure and show its flexibility as compared to the existing agreement measures.
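As an illustration of the kind of existing agreement measure the abstract refers to (not the measure proposed in the paper), the following minimal Python sketch computes Fleiss' kappa for a set of crowdsourced categorical judgments. The function name and the toy rating matrix are hypothetical; the sketch assumes every item is judged by the same number of workers.

# Illustrative sketch: Fleiss' kappa, a standard inter-rater agreement measure.
# counts[i][j] = number of workers who assigned item i to category j;
# every item is assumed to be judged by the same number of workers.

def fleiss_kappa(counts):
    n_items = len(counts)
    n_raters = sum(counts[0])          # workers per item (assumed constant)
    n_total = n_items * n_raters

    # Per-item observed agreement P_i
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items      # mean observed agreement

    # Chance agreement P_e from the marginal category proportions
    n_cats = len(counts[0])
    p_cat = [sum(row[j] for row in counts) / n_total for j in range(n_cats)]
    p_e = sum(p * p for p in p_cat)

    return (p_bar - p_e) / (1 - p_e)

# Toy example (hypothetical data): 4 items, 5 workers each, 3 answer categories.
ratings = [
    [5, 0, 0],   # perfect agreement on item 1
    [2, 2, 1],   # mixed judgments
    [0, 4, 1],
    [1, 1, 3],
]
print(round(fleiss_kappa(ratings), 3))   # about 0.275

Measures of this family compare observed agreement against agreement expected by chance; the paper argues that, in the crowdsourcing setting, such measures exhibit limits that motivate the proposed probabilistic alternative.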

Published

2017-09-21

How to Cite

Checco, A., Roitero, K., Maddalena, E., Mizzaro, S., & Demartini, G. (2017). Let’s Agree to Disagree: Fixing Agreement Measures for Crowdsourcing. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 5(1), 11-20. https://doi.org/10.1609/hcomp.v5i1.13306