Asking the Right Questions to the Right Users: Active Learning with Imperfect Oracles
Active learning algorithms automatically identify salient and exemplar samples from large amounts of unlabeled data and substantially reduce the human annotation effort needed to train a machine learning model. In a traditional active learning setup, the labeling oracles are assumed to be infallible, that is, they always provide the correct class labels for the queried unlabeled instances. However, in real-world applications, oracles are often imperfect and provide incorrect label annotations. Oracles also have diverse expertise: although they may be noisy in general, certain oracles may provide accurate annotations for specific instances. In this paper, we propose a novel framework to address the challenging problem of active learning in the presence of multiple imperfect oracles. We pose the optimal selection of samples and oracles as a constrained optimization problem and derive a linear programming relaxation to select a batch of (sample, oracle) pairs that can potentially contribute the maximum information to the underlying classification model. Extensive empirical studies on 9 challenging datasets from a variety of application domains corroborate the usefulness of our framework over competing baselines.
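To make the idea of choosing (sample, oracle) pairs concrete, the following is a minimal Python sketch. It is not the method described above: it substitutes a simple greedy heuristic for the linear programming relaxation. The utility score (the model's prediction entropy for a sample multiplied by an assumed estimate of how likely an oracle is to label that sample correctly), the per-oracle query budget, and the function names `entropy` and `select_pairs` are all hypothetical illustrations.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_pairs(pred_probs, oracle_acc, batch_size, oracle_budget):
    """Greedily pick (sample, oracle) pairs. A stand-in heuristic, NOT an LP relaxation.

    pred_probs[i]    -- current model's predicted class distribution for unlabeled sample i
    oracle_acc[i][j] -- hypothetical estimate that oracle j labels sample i correctly
    batch_size       -- total number of pairs to query in this round
    oracle_budget[j] -- hypothetical cap on how many queries oracle j receives in this batch
    """
    # Hypothetical utility: uncertainty of the sample weighted by the oracle's expected reliability.
    scored = []
    for i, probs in enumerate(pred_probs):
        h = entropy(probs)
        for j, acc in enumerate(oracle_acc[i]):
            scored.append((h * acc, i, j))

    # Highest utility first; ties broken by sample index, then oracle index.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    used_samples = set()
    load = [0] * len(oracle_budget)
    batch = []
    for _, i, j in scored:
        if len(batch) == batch_size:
            break
        # Each sample is queried at most once; each oracle stays within its budget.
        if i in used_samples or load[j] >= oracle_budget[j]:
            continue
        batch.append((i, j))
        used_samples.add(i)
        load[j] += 1
    return batch


if __name__ == "__main__":
    # Toy data: 3 unlabeled samples (binary classes), 2 oracles with sample-specific accuracy.
    pred_probs = [[0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
    oracle_acc = [[0.9, 0.6], [0.7, 0.95], [0.55, 0.8]]
    print(select_pairs(pred_probs, oracle_acc, batch_size=2, oracle_budget=[1, 1]))
    # → [(0, 0), (2, 1)]
```

In the toy run, the most uncertain sample (0) goes to the oracle that is most reliable on it (0), and sample 2 goes to oracle 1, which is more accurate on that sample. A greedy pass can be suboptimal when the sample and budget constraints interact, which is one motivation for solving the selection as a constrained optimization problem instead.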