Let's Agree to Disagree: Fixing Agreement Measures for Crowdsourcing


  • Alessandro Checco, University of Sheffield
  • Kevin Roitero, University of Udine
  • Eddy Maddalena, University of Southampton
  • Stefano Mizzaro, University of Udine
  • Gianluca Demartini, University of Queensland




Keywords: crowdsourcing, inter-rater agreement, reliability


Abstract

In the context of micro-task crowdsourcing, each task is usually performed by several workers. This allows researchers to leverage measures of the agreement among workers on the same task, both to estimate the reliability of the collected data and to better understand the answering behaviors of the participants. While many measures of agreement between annotators have been proposed, they are known to suffer from various problems and abnormalities. In this paper, we identify the main limitations of the existing agreement measures in the crowdsourcing context, both by means of toy examples and with real-world crowdsourcing data, and propose a novel agreement measure based on probabilistic parameter estimation that overcomes these limitations. We validate our new agreement measure and show its flexibility as compared to the existing agreement measures.
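To make the setting concrete, the sketch below computes Fleiss' kappa, one of the classical inter-rater agreement measures in the family the paper critiques (this is an illustrative implementation of the standard formula, not the probabilistic measure the paper proposes). It also shows one of the counter-intuitive behaviors such chance-corrected measures can exhibit on small crowdsourced data sets.

```python
def fleiss_kappa(matrix):
    """Fleiss' kappa for a ratings matrix where matrix[i][j] is the
    number of workers who assigned category j to item i.
    Assumes every item is judged by the same number of workers."""
    n = len(matrix)       # number of items
    k = sum(matrix[0])    # workers per item
    c = len(matrix[0])    # number of categories

    # Mean per-item observed agreement.
    p_bar = sum(
        (sum(m * m for m in row) - k) / (k * (k - 1)) for row in matrix
    ) / n

    # Expected chance agreement from the marginal category proportions.
    p_e = sum(
        (sum(row[j] for row in matrix) / (n * k)) ** 2 for j in range(c)
    )
    return (p_bar - p_e) / (1 - p_e)

# Three items, three workers, two categories.
# Unanimous votes on every item give perfect agreement:
print(fleiss_kappa([[3, 0], [3, 0], [0, 3]]))  # -> 1.0

# Consistent 2-vs-1 majorities on every item yield a *negative* kappa
# (-0.35 here), despite substantial raw agreement -- one example of the
# abnormalities that motivate rethinking these measures:
print(fleiss_kappa([[2, 1], [1, 2], [2, 1]]))
```

Toy matrices like these make it easy to probe where a given agreement measure behaves unexpectedly before applying it to real worker data.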




How to Cite

Checco, A., Roitero, K., Maddalena, E., Mizzaro, S., & Demartini, G. (2017). Let’s Agree to Disagree: Fixing Agreement Measures for Crowdsourcing. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 5(1), 11-20. https://doi.org/10.1609/hcomp.v5i1.13306