Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings


  • An Nguyen, University of Texas at Austin
  • Matthew Halpern, University of Texas at Austin
  • Byron Wallace, Northeastern University
  • Matthew Lease, University of Texas at Austin



Subjective tasks, Rating, Probabilistic Graphical Models, Heteroskedastic, Expectation Maximization, Variational Inference, User satisfaction


While many methods have been proposed to ensure data quality for objective tasks (in which a single correct response is presumed to exist for each item), estimating data quality for subjective tasks remains largely unexplored. Consider the popular task of collecting instance ratings from human judges: while agreement tends to be high for instances having extremely good or bad properties, instances with more middling properties naturally elicit a wider variance in opinion. In addition, because such subjectivity permits a valid diversity of responses, it can be difficult to detect whether a judge has failed to undertake the task in good faith. To address this, we propose a probabilistic, heteroskedastic model in which the means and variances of worker responses are modeled as functions of instance attributes. We derive efficient Expectation Maximization (EM) learning and variational inference algorithms for parameter estimation. We apply our model to a large dataset of 24,132 Mechanical Turk ratings of user experience in viewing videos on smartphones with varying hardware capabilities. Results show that our method is effective both at predicting user ratings and at detecting unreliable respondents.
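
To make the heteroskedastic idea concrete, the following is a minimal illustrative sketch, not the model from the paper. It assumes a Gaussian rating whose mean is linear in the instance attributes and whose log-variance is also linear in them; the abstract states only that means and variances are functions of instance attributes, so these functional forms, the parameter names `w` and `v`, and the helper names `log_density` and `worker_scores` are all hypothetical. Parameters are taken as given here; the EM and variational estimation procedures are not reproduced.

```python
import numpy as np

def log_density(X, y, w, v):
    """Per-rating Gaussian log-density with attribute-dependent mean and variance.

    Assumed forms (illustrative only): mean = X @ w, log-variance = X @ v.
    """
    mean = X @ w
    log_var = X @ v
    return -0.5 * (np.log(2 * np.pi) + log_var + (y - mean) ** 2 / np.exp(log_var))

def worker_scores(X, y, worker_ids, w, v):
    """Average log-density of each worker's ratings; lower values are more suspicious."""
    ll = log_density(X, y, w, v)
    return {int(k): float(ll[worker_ids == k].mean()) for k in np.unique(worker_ids)}

# Two instances: the second attribute raises the variance of the second instance.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
w = np.array([3.0, 0.0])   # both instances have expected rating 3
v = np.array([0.0, 1.0])   # variance 1 for the first instance, e for the second

# The same deviation from the mean is penalized less where disagreement is expected.
ll = log_density(X, np.array([5.0, 5.0]), w, v)
print(ll[1] > ll[0])
```

Under these assumptions, a rating far from the predicted mean counts against a worker less on instances where a wide spread of opinion is expected, which is the intuition behind separating valid subjective disagreement from unreliable responses.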




How to Cite

Nguyen, A., Halpern, M., Wallace, B., & Lease, M. (2016). Probabilistic Modeling for Crowdsourcing Partially-Subjective Ratings. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 4(1), 149-158.