Leveraging Side Information to Improve Label Quality Control in Crowd-Sourcing

Yuan Jin; Mark Carman; Dongwoo Kim; Lexing Xie

doi:10.1609/hcomp.v5i1.13315

Authors

Yuan Jin Monash University
Mark Carman Monash University
Dongwoo Kim Australian National University
Lexing Xie Australian National University

DOI:

https://doi.org/10.1609/hcomp.v5i1.13315

Keywords:

Crowd-sourcing, Side Information, Probabilistic Modeling

Abstract

We investigate the possibility of leveraging side information for improving quality control over crowd-sourced data. We extend the GLAD model, which governs the probability of correct labeling through a logistic function in which worker expertise counteracts item difficulty, by systematically encoding different types of side information, including worker information drawn from demographics and personality traits, item information drawn from item genres and content, and contextual information drawn from worker responses and labeling sessions. Modeling side information allows for better estimation of worker expertise and item difficulty in sparse data situations and accounts for worker biases, leading to better prediction of posterior true label probabilities. We demonstrate the efficacy of the proposed framework with overall improvements in both the true label prediction and the unseen worker response prediction based on different combinations of the various types of side information across three new crowd-sourcing datasets. In addition, we show that the framework exhibits potential for identifying salient side information features for predicting the correctness of responses without needing any true label information.
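
As a rough illustration of the idea described in the abstract, the sketch below computes a probability of correct labeling from a logistic function in which worker expertise counteracts item difficulty, and then shifts both quantities with side-information features. The subtractive form `expertise - difficulty`, the linear feature offsets, and all function and parameter names are assumptions made for illustration only; the abstract does not specify the exact parameterization, and the paper's model may differ.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def p_correct(expertise: float, difficulty: float) -> float:
    """Probability that a worker labels an item correctly.

    Assumption: expertise counteracts difficulty by subtraction.
    """
    return sigmoid(expertise - difficulty)


def linear_offset(features, weights) -> float:
    """Hypothetical linear encoding of side-information features."""
    return sum(f * w for f, w in zip(features, weights))


def p_correct_with_side_info(expertise, difficulty,
                             worker_features, worker_weights,
                             item_features, item_weights) -> float:
    """Shift expertise and difficulty by side-information offsets.

    Assumption: worker features adjust expertise and item features
    adjust difficulty, both additively.
    """
    adjusted_expertise = expertise + linear_offset(worker_features, worker_weights)
    adjusted_difficulty = difficulty + linear_offset(item_features, item_weights)
    return p_correct(adjusted_expertise, adjusted_difficulty)
```

Under these assumptions, a worker with few observed labels still receives an informed expertise estimate through the feature offset, which is one way to read the abstract's point about sparse data situations.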
