Let's Agree to Disagree: Fixing Agreement Measures for Crowdsourcing
DOI: https://doi.org/10.1609/hcomp.v5i1.13306
Keywords: crowdsourcing, inter-rater agreement, reliability
Abstract
In the context of micro-task crowdsourcing, each task is usually performed by several workers. This allows researchers to leverage measures of agreement among workers on the same task to estimate the reliability of the collected data and to better understand the answering behavior of participants. While many measures of agreement between annotators have been proposed, they are known to suffer from numerous problems and anomalies. In this paper, we identify the main limitations of existing agreement measures in the crowdsourcing context, both through toy examples and with real-world crowdsourcing data, and we propose a novel agreement measure based on probabilistic parameter estimation that overcomes these limitations. We validate the new agreement measure and demonstrate its flexibility compared to existing agreement measures.
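To make the notion of inter-annotator agreement concrete, the sketch below computes two classical baselines, raw percent agreement and Cohen's kappa, for two hypothetical workers labeling the same items. This is standard background only, not the probabilistic measure proposed in the paper; the worker names and labels are illustrative.

```python
from collections import Counter

def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two workers give the same label."""
    assert len(labels_a) == len(labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for the chance agreement
    implied by each worker's own label distribution."""
    n = len(labels_a)
    p_obs = percent_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_obs - p_chance) / (1 - p_chance)

# Toy example: two workers judging five items as relevant / non-relevant.
worker_1 = ["rel", "rel", "non", "rel", "non"]
worker_2 = ["rel", "non", "non", "rel", "non"]
print(percent_agreement(worker_1, worker_2))  # 0.8
print(cohen_kappa(worker_1, worker_2))        # ~0.62
```

Measures in this family are exactly the kind the paper critiques: chance correction based solely on marginal label frequencies can behave counterintuitively on skewed crowdsourcing data.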