Contrast-Enhanced Semi-supervised Text Classification with Few Labels

Authors

  • Austin Cheng-Yun Tsai, National Taiwan University
  • Sheng-Ya Lin, National Taiwan University
  • Li-Chen Fu, National Taiwan University

DOI:

https://doi.org/10.1609/aaai.v36i10.21391

Keywords:

Speech & Natural Language Processing (SNLP), Machine Learning (ML), Domain(s) Of Application (APP)

Abstract

Traditional text classification requires thousands of annotated examples or an additional Neural Machine Translation (NMT) system, both of which are expensive to obtain in real applications. This paper presents a Contrast-Enhanced Semi-supervised Text Classification (CEST) framework for label-limited settings that does not rely on any NMT system. We propose a certainty-driven sample selection method and a contrast-enhanced similarity graph to utilize data more efficiently in self-training, alleviating the annotation-scarcity problem. The graph imposes a smoothness constraint on the unlabeled data to improve the coherence and accuracy of pseudo-labels. Moreover, CEST formulates the training as a “learning from noisy labels” problem and performs the optimization accordingly. A salient feature of this formulation is the explicit suppression of the severe error-propagation problem of conventional semi-supervised learning. With only 30 labeled examples per class for both the training and validation sets, CEST outperforms previous state-of-the-art algorithms by 2.11% in accuracy and falls within 3.04% accuracy of fully-supervised fine-tuning of a pre-trained language model on thousands of labeled examples.
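
The abstract only names the framework's ingredients, so the sketch below illustrates, in plain NumPy, two of them: certainty-driven sample selection and a graph smoothness constraint on pseudo-labels. This is a minimal sketch under stated assumptions, not CEST's actual formulation: the entropy-based certainty proxy, the thresholded cosine-similarity graph, and all function names (cosine_similarity_graph, select_most_certain, graph_smoothness) are illustrative choices, not taken from the paper.

    import numpy as np

    def cosine_similarity_graph(embeddings, threshold=0.5):
        # Hypothetical stand-in for the contrast-enhanced similarity graph:
        # thresholded cosine similarity between sentence embeddings.
        unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sim = unit @ unit.T
        sim = np.where(sim >= threshold, sim, 0.0)  # keep only strong edges
        np.fill_diagonal(sim, 0.0)                  # drop self-loops
        return sim

    def select_most_certain(probs, budget):
        # Certainty proxy (an assumption): low predictive entropy of the
        # model's class probabilities on the unlabeled samples.
        entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
        return np.argsort(entropy)[:budget]  # indices of most-certain samples

    def graph_smoothness(probs, sim):
        # Smoothness penalty sum_ij W_ij * ||p_i - p_j||^2, written via the
        # graph Laplacian L = D - W; small when neighbors share pseudo-labels.
        laplacian = np.diag(sim.sum(axis=1)) - sim
        return 2.0 * np.trace(probs.T @ laplacian @ probs)

    # Toy run: 100 unlabeled "sentences", 16-dim embeddings, 4 classes.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(100, 16))
    logits = rng.normal(size=(100, 4))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

    graph = cosine_similarity_graph(emb, threshold=0.2)
    chosen = select_most_certain(probs, budget=30)
    print(chosen[:5], graph_smoothness(probs, graph))

In a self-training loop of the kind the abstract describes, the selected indices would be added to the labeled pool with their pseudo-labels, and a smoothness term of this form would enter the training objective as a regularizer over the unlabeled data.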

Published

2022-06-28

How to Cite

Tsai, A. C.-Y., Lin, S.-Y., & Fu, L.-C. (2022). Contrast-Enhanced Semi-supervised Text Classification with Few Labels. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 11394-11402. https://doi.org/10.1609/aaai.v36i10.21391

Issue

Vol. 36 No. 10 (2022)

Section

AAAI Technical Track on Speech and Natural Language Processing