Contrast-Enhanced Semi-supervised Text Classification with Few Labels
DOI:
https://doi.org/10.1609/aaai.v36i10.21391
Keywords:
Speech & Natural Language Processing (SNLP), Machine Learning (ML), Domain(s) Of Application (APP)
Abstract
Traditional text classification requires thousands of annotated examples or an additional Neural Machine Translation (NMT) system, both of which are expensive to obtain in real applications. This paper presents a Contrast-Enhanced Semi-supervised Text Classification (CEST) framework for label-limited settings that does not rely on any NMT system. We propose a certainty-driven sample selection method and a contrast-enhanced similarity graph to use data more efficiently in self-training, alleviating the annotation-starving problem. The graph imposes a smoothness constraint on the unlabeled data to improve the coherence and accuracy of pseudo-labels. Moreover, CEST formulates training as a “learning from noisy labels” problem and performs the optimization accordingly. A salient feature of this formulation is the explicit suppression of the severe error-propagation problem in conventional semi-supervised learning. With only 30 labeled examples per class for both the training and validation sets, CEST outperforms the previous state-of-the-art algorithms by 2.11% accuracy and falls within 3.04% accuracy of fully supervised fine-tuning of a pre-trained language model on thousands of labeled examples.
Published
2022-06-28
How to Cite
Tsai, A. C.-Y., Lin, S.-Y., & Fu, L.-C. (2022). Contrast-Enhanced Semi-supervised Text Classification with Few Labels. Proceedings of the AAAI Conference on Artificial Intelligence, 36(10), 11394-11402. https://doi.org/10.1609/aaai.v36i10.21391
Section
AAAI Technical Track on Speech and Natural Language Processing