Active Learning with Unbalanced Classes and Example-Generation Queries
DOI: https://doi.org/10.1609/hcomp.v6i1.13334

Keywords: active learning, machine learning, crowdsourcing, high-skew

Abstract
Machine learning in real-world high-skew domains is difficult because traditional strategies for crowdsourcing labeled training examples are ineffective at locating the scarce minority-class examples. For example, both random sampling and traditional active learning (which reduces to random sampling when just starting) are likely to recover very few minority-class examples. To bootstrap the machine learning process, researchers have proposed tasking the crowd with finding or generating minority-class examples, but such strategies have weaknesses as well: they are unnecessarily expensive in well-balanced domains, and they often yield samples from a biased distribution that is unrepresentative of the one being learned.

This paper extends the traditional active learning framework by investigating the problem of intelligently switching among crowdsourcing strategies for obtaining labeled training examples in order to train a classifier as effectively as possible. We start by analyzing several such strategies (e.g., annotate an example, generate a minority-class example), then develop a novel, skew-robust algorithm, called MB-CB, for the control problem. Experiments show that our method outperforms the state-of-the-art GL-Hybrid approach by up to 14.3 points in F1 AUC across various domains and class-frequency settings.
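To make the control problem concrete, the following is a minimal, hypothetical sketch (in Python) of a cost-aware controller that switches between a labeling query and an example-generation query based on their observed minority-class yield per unit cost. The skew, the query costs, the generation success rate, and the greedy selection rule are all illustrative assumptions; this is not the paper's MB-CB algorithm.

"""
Illustrative sketch (not MB-CB): a greedy controller that chooses, at each
step, between two crowdsourcing actions:
  "label"    : annotate a randomly drawn pool example (cheap, but in a
               high-skew domain it rarely yields a minority-class example)
  "generate" : ask the crowd to author a minority-class example
               (reliable minority yield, but more expensive and possibly
               drawn from a biased distribution)
All numbers below are assumptions chosen for illustration only.
"""
import random

SKEW = 0.01                              # assumed minority-class frequency in the pool
COST = {"label": 1.0, "generate": 3.0}   # assumed relative crowd costs per query
BUDGET = 300.0                           # total crowd budget (arbitrary units)

def run_action(action):
    """Simulate one crowd query; return 1 if it yields a minority-class example."""
    if action == "label":
        return 1 if random.random() < SKEW else 0
    return 1 if random.random() < 0.9 else 0   # assume generation occasionally fails

def controller(budget=BUDGET):
    """Greedily pick the action with the best estimated minority-examples-per-unit-cost,
    updating the estimates as query results arrive."""
    hits = {"label": 1, "generate": 1}    # optimistic Laplace-style priors
    tries = {"label": 2, "generate": 2}
    minority_found, spent = 0, 0.0
    while spent < budget:
        # estimated minority yield per unit cost for each action
        action = max(COST, key=lambda a: (hits[a] / tries[a]) / COST[a])
        got = run_action(action)
        hits[action] += got
        tries[action] += 1
        minority_found += got
        spent += COST[action]
    return minority_found

if __name__ == "__main__":
    random.seed(0)
    print("minority-class examples found within budget:", controller())

Under these assumed settings, the controller quickly abandons random labeling once its observed minority yield falls below that of generation, which mirrors the tradeoff the abstract describes: labeling is wasteful when the class is rare, while generation is wasteful when the classes are already balanced.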