Scaling Up Semi-supervised Learning with Unconstrained Unlabelled Data

Authors

  • Shuvendu Roy Queen's University
  • Ali Etemad Queen's University

DOI:

https://doi.org/10.1609/aaai.v38i13.29404

Keywords:

ML: Semi-Supervised Learning, ML: Representation Learning, ML: Scalability of ML Systems

Abstract

We propose UnMixMatch, a semi-supervised learning framework that learns effective representations from unconstrained unlabelled data in order to scale up performance. Most existing semi-supervised methods rely on the assumption that labelled and unlabelled samples are drawn from the same distribution, which limits the potential for improvement from freely available unlabelled data and thus hinders the generalizability and scalability of semi-supervised learning. Our method aims to overcome these constraints and effectively utilize unconstrained unlabelled data in semi-supervised learning. UnMixMatch consists of three main components: a supervised learner with hard augmentations that provides strong regularization, a contrastive consistency regularizer that learns underlying representations from the unlabelled data, and a self-supervised loss that further enhances the representations learnt from the unlabelled data. We perform extensive experiments on 4 commonly used datasets and demonstrate superior performance over existing semi-supervised methods, with a performance boost of 4.79%. Extensive ablation and sensitivity studies show the effectiveness and impact of each of the proposed components of our method. The code for our work is publicly available.
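The abstract describes a training objective composed of three terms: a supervised loss, a contrastive consistency term on unlabelled data, and a self-supervised term. As a rough illustration only (the function names, weighting scheme, and loss forms below are assumptions, not the paper's actual formulation), the composition might be sketched as:

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors (plain lists here).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def consistency_loss(z1, z2):
    # A generic contrastive-style consistency term: penalize disagreement
    # between embeddings of two augmented views of the same unlabelled sample.
    # (Illustrative choice; not necessarily the loss used in UnMixMatch.)
    return 1.0 - cosine_sim(z1, z2)

def combined_objective(loss_sup, z1, z2, loss_selfsup,
                       w_con=1.0, w_self=1.0):
    # Hypothetical overall objective: supervised loss on hard-augmented
    # labelled data, plus weighted consistency and self-supervised terms
    # on unconstrained unlabelled data.
    return loss_sup + w_con * consistency_loss(z1, z2) + w_self * loss_selfsup
```

For example, two identical views yield a consistency term of zero, so the objective reduces to the supervised and self-supervised parts; the weights `w_con` and `w_self` are placeholders for whatever balancing the method actually uses.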

Published

2024-03-24

How to Cite

Roy, S., & Etemad, A. (2024). Scaling Up Semi-supervised Learning with Unconstrained Unlabelled Data. Proceedings of the AAAI Conference on Artificial Intelligence, 38(13), 14847-14856. https://doi.org/10.1609/aaai.v38i13.29404

Section

AAAI Technical Track on Machine Learning IV