Batch Prioritization of Data Labeling Tasks for Training Classifiers

Masanari Kimura; Kei Wakabayashi; Atsuyuki Morishima

doi:10.1609/hcomp.v8i1.7476

Authors

Masanari Kimura University of Tsukuba
Kei Wakabayashi University of Tsukuba
Atsuyuki Morishima University of Tsukuba

Abstract

In a data labeling process for building machine learning models, the choice of which data instances to label is known to have a significant impact on the performance of the resulting classifiers. So far, research on active learning has addressed the issue of how to choose this subset by prioritizing data instances based on the state of the current classifier. However, the active learning approach has two drawbacks: (i) it requires a training loop to update the priorities of labeling tasks, and (ii) it requires us to choose a specific active learner even though we do not know the optimal classification model. In this paper, we propose a new framework for a priority-aware labeling system that allows parallel task assignment to crowd workers without assuming a particular classifier. The framework is based on two novel methods called “batch prioritization” and “label expansion”. We conducted experiments with multiple datasets to examine the effectiveness of the approach and found that the proposed method improves the performance of the final classifiers more quickly than the active learning approach, even though the labeling tasks can be processed in a fully parallel manner.
