Crowdsourcing Relevance Assessments: The Unexpected Benefits of Limiting the Time to Judge

Authors

  • Eddy Maddalena University of Udine
  • Marco Basaldella University of Udine
  • Dario De Nart University of Udine
  • Dante Degl'Innocenti University of Udine
  • Stefano Mizzaro University of Udine
  • Gianluca Demartini University of Sheffield

DOI:

https://doi.org/10.1609/hcomp.v4i1.13284

Keywords:

Crowdsourcing, Relevance assessments, Time constraints

Abstract

Crowdsourcing has become an alternative approach to collecting relevance judgments at scale, thanks to the availability of crowdsourcing platforms and quality control techniques that make it possible to obtain reliable results. Previous work has used crowdsourcing to ask multiple crowd workers to judge the relevance of a document with respect to a query, and has studied how to best aggregate multiple judgments of the same topic-document pair. This paper addresses an aspect that has been rather overlooked so far: we study how the time available to express a relevance judgment affects its quality. We also discuss the quality loss incurred when crowdsourced relevance judgments are made more efficient in terms of the time taken to judge the relevance of a document. We use standard test collections to run a battery of experiments on the crowdsourcing platform CrowdFlower, studying how much time crowd workers need to judge the relevance of a document and what effect reducing the available time has on the overall quality of the judgments. Our extensive experiments compare judgments obtained under different types of time constraints with judgments obtained when no time constraints were put on the task. We measure judgment quality by different metrics of agreement with editorial judgments. Experimental results show that it is possible to reduce the cost of crowdsourced evaluation collection creation by reducing the time available to perform the judgments, with no loss in quality. Most importantly, we observe that introducing limits on the time available to perform the judgments improves overall judgment quality. Top judgment quality is obtained with 25-30 seconds to judge a topic-document pair.
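
For illustration, agreement with editorial (gold) judgments can be quantified with a chance-corrected measure such as Cohen's kappa; the minimal Python sketch below computes it over hypothetical graded relevance labels on a 0-3 scale. The paper reports several agreement metrics, not necessarily this one, so treat this purely as an example of the kind of measurement involved.

    from collections import Counter

    def cohen_kappa(a, b):
        """Cohen's kappa: chance-corrected agreement between two label sequences."""
        assert len(a) == len(b) and len(a) > 0
        n = len(a)
        observed = sum(x == y for x, y in zip(a, b)) / n  # raw agreement rate
        ca, cb = Counter(a), Counter(b)
        # Agreement expected by chance, given each annotator's label distribution:
        expected = sum(ca[l] * cb[l] for l in ca.keys() | cb.keys()) / (n * n)
        return (observed - expected) / (1 - expected)

    # Hypothetical judgments for eight topic-document pairs (0-3 relevance scale):
    editorial = [3, 0, 2, 1, 0, 3, 2, 2]
    crowd     = [3, 0, 1, 1, 0, 3, 2, 0]
    print(f"kappa = {cohen_kappa(editorial, crowd):.3f}")  # ~0.674 here

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which makes scores comparable across judgment conditions (e.g., different time limits) even when label distributions differ.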

Published

2016-09-21

How to Cite

Maddalena, E., Basaldella, M., De Nart, D., Degl’Innocenti, D., Mizzaro, S., & Demartini, G. (2016). Crowdsourcing Relevance Assessments: The Unexpected Benefits of Limiting the Time to Judge. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 4(1), 129-138. https://doi.org/10.1609/hcomp.v4i1.13284