Quality-Aware Self-Training on Differentiable Synthesis of Rare Relational Data

Authors

  • Chongsheng Zhang, Henan University
  • Yaxin Hou, Henan University
  • Ke Chen, South China University of Technology; Peng Cheng Laboratory
  • Shuang Cao, Henan University
  • Gaojuan Fan, Henan University
  • Ji Liu, Baidu Research

DOI:

https://doi.org/10.1609/aaai.v37i5.25811

Keywords:

KRR: Knowledge Engineering, ML: Deep Neural Network Algorithms, ML: Relational Learning, ML: Classification and Regression

Abstract

Data scarcity is a common real-world problem that poses a major challenge to data-driven analytics. Although many data-balancing approaches have been proposed to mitigate this problem, they may discard useful information or overfit. Generative Adversarial Network (GAN) based data synthesis methods can alleviate these issues but lack quality control over the generated samples. Moreover, a vanilla GAN cannot easily capture the latent associations between the attribute set and the class labels in relational data. In light of this, we introduce an end-to-end self-training scheme (namely, Quality-Aware Self-Training) for rare relational data synthesis, which generates labeled synthetic data via pseudo labeling on GAN-based synthesis. We design a semantic pseudo labeling module that first controls the quality of the generated features/samples, then calibrates their semantic labels via a classifier committee consisting of multiple pre-trained shallow classifiers. The high-confidence generated samples with calibrated pseudo labels are then fed into a semantic classification network as augmented samples for self-training. We conduct extensive experiments on 20 benchmark datasets from different domains, including 14 industrial datasets. The results show that our method significantly outperforms state-of-the-art methods, including two recent GAN-based data synthesis schemes. Code is available at https://github.com/yaxinhou/QAST.
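
The pseudo labeling step described in the abstract (a quality gate, then label calibration by a classifier committee, then selection of high-confidence samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `committee_pseudo_label`, the precomputed `quality_scores`, both thresholds, and the use of averaged class probabilities (soft voting) are assumptions made only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
# Hypothetical interface: a committee member maps a sample to class probabilities.
Classifier = Callable[[Sample], Sequence[float]]


def committee_pseudo_label(
    samples: List[Sample],
    quality_scores: List[float],
    committee: List[Classifier],
    quality_threshold: float = 0.5,      # assumed value, not from the paper
    confidence_threshold: float = 0.8,   # assumed value, not from the paper
) -> List[Tuple[Sample, int, float]]:
    """Return (sample, pseudo_label, confidence) for samples that pass both gates."""
    if not committee:
        raise ValueError("committee must contain at least one classifier")

    selected = []
    for sample, quality in zip(samples, quality_scores):
        # Step 1: quality gate on the generated sample.
        if quality < quality_threshold:
            continue

        # Step 2: soft voting, i.e. average the committee's class probabilities.
        votes = [clf(sample) for clf in committee]
        n_classes = len(votes[0])
        mean_probs = [
            sum(vote[c] for vote in votes) / len(votes) for c in range(n_classes)
        ]
        label = max(range(n_classes), key=mean_probs.__getitem__)
        confidence = mean_probs[label]

        # Step 3: keep only high-confidence pseudo labels for self-training.
        if confidence >= confidence_threshold:
            selected.append((sample, label, confidence))
    return selected
```

In a full self-training loop, the returned samples would be appended to the training set of the downstream classifier; how the quality scores are computed and how the committee is trained are left open here.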

Published

2023-06-26

How to Cite

Zhang, C., Hou, Y., Chen, K., Cao, S., Fan, G., & Liu, J. (2023). Quality-Aware Self-Training on Differentiable Synthesis of Rare Relational Data. Proceedings of the AAAI Conference on Artificial Intelligence, 37(5), 6602-6611. https://doi.org/10.1609/aaai.v37i5.25811

Section

AAAI Technical Track on Knowledge Representation and Reasoning