Crowdsourcing Relevance Assessments: The Unexpected Benefits of Limiting the Time to Judge

Authors

  • Eddy Maddalena University of Udine
  • Marco Basaldella University of Udine
  • Dario De Nart University of Udine
  • Dante Degl'Innocenti University of Udine
  • Stefano Mizzaro University of Udine
  • Gianluca Demartini University of Sheffield

DOI:

https://doi.org/10.1609/hcomp.v4i1.13284

Keywords:

Crowdsourcing, Relevance assessments, Time constraints

Abstract

Crowdsourcing has become an alternative approach to collecting relevance judgments at scale, thanks to the availability of crowdsourcing platforms and quality control techniques that make it possible to obtain reliable results. Previous work has used crowdsourcing to ask multiple crowd workers to judge the relevance of a document with respect to a query, and has studied how to best aggregate multiple judgments of the same topic-document pair. This paper addresses an aspect that has been rather overlooked so far: we study how the time available to express a relevance judgment affects its quality. We also discuss the quality loss incurred when crowdsourced relevance judgments are made more efficient in terms of the time taken to judge the relevance of a document. We use standard test collections to run a battery of experiments on the crowdsourcing platform CrowdFlower, studying how much time crowd workers need to judge the relevance of a document and what effect reducing the available time has on the overall quality of the judgments. Our extensive experiments compare judgments obtained under different types of time constraints with judgments obtained when no time constraints were put on the task. We measure judgment quality by different metrics of agreement with editorial judgments. Experimental results show that it is possible to reduce the cost of crowdsourced evaluation collection creation by reducing the time available to perform the judgments, with no loss in quality. Most importantly, we observe that introducing limits on the time available to perform the judgments improves overall judgment quality. Top judgment quality is obtained with 25-30 seconds to judge a topic-document pair.
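
For illustration, agreement with editorial (gold) judgments can be quantified with a chance-corrected measure such as Cohen's kappa; the minimal Python sketch below computes it over hypothetical graded relevance labels on a 0-3 scale. The paper reports several agreement metrics, not necessarily this one, so treat this purely as an example of the kind of measurement involved.

    from collections import Counter

    def cohen_kappa(a, b):
        """Cohen's kappa: chance-corrected agreement between two label sequences."""
        assert len(a) == len(b) and len(a) > 0
        n = len(a)
        observed = sum(x == y for x, y in zip(a, b)) / n  # raw agreement rate
        ca, cb = Counter(a), Counter(b)
        # Agreement expected by chance, given each annotator's label distribution:
        expected = sum(ca[l] * cb[l] for l in ca.keys() | cb.keys()) / (n * n)
        return (observed - expected) / (1 - expected)

    # Hypothetical judgments for eight topic-document pairs (0-3 relevance scale):
    editorial = [3, 0, 2, 1, 0, 3, 2, 2]
    crowd     = [3, 0, 1, 1, 0, 3, 2, 0]
    print(f"kappa = {cohen_kappa(editorial, crowd):.3f}")  # ~0.674 here

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which makes scores comparable across judgment conditions (e.g., different time limits) even when label distributions differ.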

Published

2016-09-21

How to Cite

Maddalena, E., Basaldella, M., De Nart, D., Degl’Innocenti, D., Mizzaro, S., & Demartini, G. (2016). Crowdsourcing Relevance Assessments: The Unexpected Benefits of Limiting the Time to Judge. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 4(1), 129-138. https://doi.org/10.1609/hcomp.v4i1.13284