Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to Ensure Quality Relevance Annotations
While peer-agreement and gold checks are well-established methods for ensuring quality in crowdsourced data collection, we explore a relatively new direction for quality control: estimating work quality directly from workers’ behavioral traces collected during annotation. We propose three behavior-based models to predict label correctness and worker accuracy, then further apply model predictions to label aggregation and to optimizing label collection. As part of this work, we collect and share a new Mechanical Turk dataset of behavioral signals recorded while workers judged the relevance of search results. Results show that behavioral data can be used effectively to predict work quality, which could be especially useful with single labeling or in a cold-start scenario in which individuals’ prior work history is unavailable. We further show improvements in label aggregation and reductions in labeling cost while ensuring data quality.