CoLAL: Co-learning Active Learning for Text Classification

Authors

  • Linh Le, The University of Queensland
  • Genghong Zhao, Neusoft Research of Intelligent Healthcare Technology, Co. Ltd.
  • Xia Zhang, Neusoft Corporation, China
  • Guido Zuccon, The University of Queensland
  • Gianluca Demartini, The University of Queensland

DOI:

https://doi.org/10.1609/aaai.v38i12.29235

Keywords:

ML: Active Learning, NLP: Text Classification

Abstract

Learning effectively from limited data has become an increasingly important challenge in machine learning. Active Learning (AL) algorithms play a significant role in addressing it by enhancing model performance. We introduce a novel AL algorithm, termed Co-learning (CoLAL), designed to select the most diverse and representative samples within a training dataset. This approach uses noisy labels and predictions made by the primary model on unlabeled data. By leveraging a probabilistic graphical model, we combine two multi-class classifiers into a binary one. This binary classifier determines whether the main and peer models agree on a prediction. If they do, the unlabeled sample is assumed to be easy to classify and is thus unlikely to improve the target model's performance. We prioritize data that represents the unlabeled set without overlapping decision boundaries. The discrepancies between these boundaries can be estimated by the probability that the two models produce the same prediction. Through theoretical analysis and experimental validation, we show that integrating noisy labels into the peer model effectively identifies the target model's potential inaccuracies. We evaluated CoLAL on seven benchmark datasets: four text datasets (AGNews, DBPedia, PubMed, SST-2), compared against text-based state-of-the-art (SOTA) baselines, and three image datasets (CIFAR100, MNIST, OpenML-155), compared against computer vision SOTA baselines. The results show that CoLAL significantly outperforms the existing SOTA in text-based AL and is competitive with SOTA image-based AL techniques.
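The agreement idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the probabilistic graphical model, so this sketch simply assumes the main and peer models predict independently given the input, which makes the probability of agreement the sum over classes of the product of their class probabilities. The function names `agreement_probability` and `select_for_labeling` and the `budget` parameter are hypothetical, and the sketch covers only the disagreement signal, not the diversity or representativeness criteria.

```python
import numpy as np


def agreement_probability(main_probs, peer_probs):
    """Probability that two classifiers predict the same class.

    Assumption (not from the paper): the two models' predictions are
    independent given the input, so P(agree) = sum_k p_k * q_k.
    Both inputs have shape (n_samples, n_classes), rows summing to 1.
    """
    main_probs = np.asarray(main_probs, dtype=float)
    peer_probs = np.asarray(peer_probs, dtype=float)
    return np.sum(main_probs * peer_probs, axis=1)


def select_for_labeling(main_probs, peer_probs, budget):
    """Return indices of the `budget` samples with the lowest agreement.

    Low agreement is treated here as a sign that the sample is hard
    for the target model and therefore worth labeling.
    """
    agree = agreement_probability(main_probs, peer_probs)
    # A stable sort keeps the original order among tied samples.
    return np.argsort(agree, kind="stable")[:budget]


if __name__ == "__main__":
    # Toy class probabilities for three unlabeled samples, two classes.
    main = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
    peer = [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1]]
    print(agreement_probability(main, peer))
    print(select_for_labeling(main, peer, budget=2))
```

In this toy input the third sample, where the two models favor different classes, has the lowest agreement and is selected first; the first sample, where both models confidently agree, is left unlabeled.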

Published

2024-03-24

How to Cite

Le, L., Zhao, G., Zhang, X., Zuccon, G., & Demartini, G. (2024). CoLAL: Co-learning Active Learning for Text Classification. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13337-13345. https://doi.org/10.1609/aaai.v38i12.29235

Issue

Section

AAAI Technical Track on Machine Learning III