Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings

Authors

  • An Nguyen, University of Texas at Austin
  • Matthew Halpern, University of Texas at Austin
  • Byron Wallace, Northeastern University
  • Matthew Lease, University of Texas at Austin

DOI:

https://doi.org/10.1609/hcomp.v4i1.13274

Keywords:

Subjective tasks, Rating, Probabilistic Graphical Models, Heteroskedastic, Expectation Maximization, Variational Inference, User satisfaction

Abstract

While many methods have been proposed to ensure data quality for objective tasks (in which a single correct response is presumed to exist for each item), estimating data quality for subjective tasks remains largely unexplored. Consider the popular task of collecting instance ratings from human judges: while agreement tends to be high for instances having extremely good or bad properties, instances with more middling properties naturally elicit a wider variance in opinion. In addition, because such subjectivity permits a valid diversity of responses, it can be difficult to detect whether a judge undertakes the task in good faith. To address this, we propose a probabilistic, heteroskedastic model in which the means and variances of worker responses are modeled as functions of instance attributes. We derive efficient Expectation Maximization (EM) learning and variational inference algorithms for parameter estimation. We apply our model to a large dataset of 24,132 Mechanical Turk ratings of user experience in viewing videos on smartphones with varying hardware capabilities. Results show that our method is effective both at predicting user ratings and at detecting unreliable respondents.
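To make the core idea concrete, the sketch below fits a simple heteroskedastic Gaussian model in which both the mean and the log-variance of a rating are linear functions of instance attributes, estimated by gradient ascent on the log-likelihood. This is not the paper's full model (which also handles worker reliability and uses EM/variational inference); the data, features, and parameterization here are illustrative assumptions only.

```python
# Hypothetical sketch of heteroskedastic rating regression, NOT the
# authors' implementation: y ~ N(X @ w, exp(X @ v)), so instances with
# certain attribute values can elicit higher-variance ratings.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "instance attributes" and ratings (assumed data, for
# illustration): mean and log-variance both depend linearly on x.
n = 2000
x = rng.uniform(-1, 1, size=(n, 1))
X = np.hstack([np.ones((n, 1)), x])                  # intercept + attribute
true_w = np.array([3.0, 2.0])                        # mean parameters
true_v = np.array([-1.0, 0.0])                       # log-variance parameters
y = X @ true_w + rng.normal(0.0, np.exp(0.5 * (X @ true_v)))

# Warm-start the mean with ordinary least squares, then jointly refine
# mean (w) and log-variance (v) by gradient ascent on the Gaussian
# log-likelihood.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
v = np.zeros(2)
lr = 0.1
for _ in range(1000):
    s2 = np.exp(X @ v)                               # per-instance variance
    r = y - X @ w                                    # residuals
    grad_w = X.T @ (r / s2) / n                      # d loglik / d w
    grad_v = X.T @ (0.5 * (r**2 / s2 - 1.0)) / n     # d loglik / d v
    w += lr * grad_w
    v += lr * grad_v

print("mean params:", np.round(w, 2))
print("log-variance params:", np.round(v, 2))
```

Under this parameterization, the fitted `v` directly quantifies how rating variance changes with the attribute, which is the quantity a purely homoskedastic model would miss.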

Published

2016-09-21

How to Cite

Nguyen, A., Halpern, M., Wallace, B., & Lease, M. (2016). Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 4(1), 149-158. https://doi.org/10.1609/hcomp.v4i1.13274