Global Mixup: Eliminating Ambiguity with Clustering

Authors

  • Xiangjin Xie (Shenzhen International Graduate School, Tsinghua University)
  • Li Yangning (Shenzhen International Graduate School, Tsinghua University; Pengcheng Laboratory)
  • Wang Chen (Google Inc.)
  • Kai Ouyang (Shenzhen International Graduate School, Tsinghua University)
  • Zuotong Xie (Shenzhen International Graduate School, Tsinghua University)
  • Hai-Tao Zheng (Shenzhen International Graduate School, Tsinghua University; Pengcheng Laboratory)

DOI:

https://doi.org/10.1609/aaai.v37i11.26616

Keywords:

SNLP: Applications, SNLP: Text Classification

Abstract

Data augmentation with Mixup has proven effective for regularizing deep neural networks. Mixup generates virtual samples and their labels simultaneously by linear interpolation. However, this one-stage generation paradigm and its reliance on linear interpolation have two defects: (1) the label of a generated sample is simply a combination of the labels of the original sample pair, with no check on whether it is reasonable, which results in ambiguous labels; (2) linear combination significantly restricts the sampling space for generating samples. To address these issues, we propose a novel and effective augmentation method, Global Mixup, based on global clustering relationships. Specifically, we transform the previous one-stage augmentation process into a two-stage process by decoupling the generation of virtual samples from their labeling. The generated samples are then relabeled through clustering, by calculating their global relationships. Furthermore, we are no longer restricted to linear relationships, which allows us to generate more reliable virtual samples in a larger sampling space. Extensive experiments with CNN, LSTM, and BERT on five tasks show that Global Mixup outperforms previous baselines. Further experiments also demonstrate the advantage of Global Mixup in low-resource scenarios.
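
As a point of reference for the one-stage paradigm the abstract criticizes, the following is a minimal sketch of vanilla Mixup only. It does not implement Global Mixup's clustering-based relabeling, whose details are not given on this page. The function name `mixup`, the parameter `alpha`, and the Beta(alpha, alpha) draw for the mixing weight are assumptions taken from the commonly used Mixup formulation, not from this abstract.

```python
import random

def mixup(x_i, y_i, x_j, y_j, alpha=1.0, rng=random):
    """Vanilla Mixup sketch: one weight interpolates both inputs and labels.

    x_i, x_j: feature vectors (lists of floats) of equal length.
    y_i, y_j: one-hot (or soft) label vectors of equal length.
    alpha:    assumed Beta(alpha, alpha) shape parameter (hypothetical default).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    # The sample and its label are generated in a single step with the same
    # weight; this coupling is the "one-stage" behaviour described above.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix, lam

# Usage: mix two toy samples from different classes.
rng = random.Random(0)  # seeded for repeatability
x_mix, y_mix, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

Here the label of the virtual sample is fixed entirely by `lam` and the two source labels, and the sample can only lie on the line segment between `x_i` and `x_j`. These correspond to the two defects listed in the abstract.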

Published

2023-06-26

How to Cite

Xie, X., Yangning, L., Chen, W., Ouyang, K., Xie, Z., & Zheng, H.-T. (2023). Global Mixup: Eliminating Ambiguity with Clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 13798-13806. https://doi.org/10.1609/aaai.v37i11.26616

Section

AAAI Technical Track on Speech & Natural Language Processing