SQUARE: A Benchmark for Research on Computing Crowd Consensus

Aashish Sheshadri; Matthew Lease

doi:10.1609/hcomp.v1i1.13088

Authors

Aashish Sheshadri, The University of Texas at Austin
Matthew Lease, The University of Texas at Austin

Keywords:

Human Computation, Crowdsourcing, Consensus, Aggregation, Benchmarking

Abstract

Many statistical consensus methods now exist, but the scarcity of comparative benchmarking and integration of techniques has made it increasingly difficult to determine the current state of the art, to evaluate the relative benefit of new methods, to understand where specific problems merit greater attention, and to measure field progress over time. To make such comparative evaluation easier for everyone, we present SQUARE, an open-source shared task framework that includes benchmark datasets, defined tasks, standard metrics, and reference implementations with empirical results for several popular methods. In addition to measuring performance on a variety of public, real crowd datasets, the benchmark also varies supervision and noise by manipulating training size and labeling error. We envision SQUARE as dynamic and continually evolving, with new datasets and reference implementations added according to community needs and interest. We invite community contributions and participation.
