Batch Prioritization of Data Labeling Tasks for Training Classifiers

Authors

  • Masanari Kimura, University of Tsukuba
  • Kei Wakabayashi, University of Tsukuba
  • Atsuyuki Morishima, University of Tsukuba

DOI:

https://doi.org/10.1609/hcomp.v8i1.7476

Abstract

In a data labeling process for building machine learning models, the choice of which data instances to label is known to have a significant impact on the performance of classifiers. Active learning research has so far addressed this issue by prioritizing data instances based on the state of the current classifier. However, the active learning approach has two drawbacks: (i) it requires a training loop to update the priorities of labeling tasks, and (ii) it requires us to choose a specific active learner even though we do not know the optimal classification model. In this paper, we propose a new framework for a priority-aware labeling system that allows parallel task assignment to crowd workers without assuming a particular classifier. The framework is based on novel methods called “batch prioritization” and “label expansion”. We conducted experiments on multiple datasets to examine the effectiveness of the approach and found that the proposed method improves the performance of the final classifiers more quickly than the active learning approach, even though the labeling tasks can be processed in a fully parallel manner.

Published

2020-10-01

How to Cite

Kimura, M., Wakabayashi, K., & Morishima, A. (2020). Batch Prioritization of Data Labeling Tasks for Training Classifiers. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 8(1), 163-167. https://doi.org/10.1609/hcomp.v8i1.7476