Representation Space Augmentation for Effective Self-Supervised Learning on Tabular Data

Authors

  • Moonjung Eo, LG AI Research
  • Kyungeun Lee, LG AI Research
  • Hye-Seung Cho, LG AI Research
  • Dongmin Kim, LG AI Research
  • Ye Seul Sim, LG AI Research
  • Woohyung Lim, LG AI Research

DOI:

https://doi.org/10.1609/aaai.v39i11.33265

Abstract

Tabular data, widely used across industries, remains underexplored in deep learning. Self-supervised learning (SSL) shows promise for pre-training deep neural networks (DNNs) on tabular data, but its potential is limited by the difficulty of designing suitable augmentations. Unlike image and text data, where SSL leverages inherent spatial or semantic structure, tabular data lacks such explicit structure. As a result, traditional input-level augmentations, such as modifying or removing features, struggle to preserve critical information while still introducing enough variability. To address these challenges, we propose RaTab, a novel method that shifts augmentation from the input level to the representation level using matrix factorization, specifically truncated SVD. This approach preserves essential data structures while generating diverse representations by applying dropout at various stages of the representation, significantly enhancing SSL performance for tabular data.
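The core idea of representation-level augmentation can be illustrated with a minimal NumPy sketch: factorize the data with a rank-k truncated SVD, then apply dropout to the low-rank representation to produce stochastic views for SSL. This is not the paper's implementation; the rank `k`, dropout rate, and the stage at which dropout is applied are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_svd(X, k):
    """Rank-k truncated SVD: X ~ U_k @ diag(S_k) @ Vt_k."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], S[:k], Vt[:k, :]

def augment(X, k=8, drop_p=0.2):
    """Generate one augmented view of X in representation space.

    Hypothetical choice: dropout is applied to the projected
    coordinates (U * S) before mapping back to feature space.
    """
    U, S, Vt = truncated_svd(X, k)
    Z = U * S                              # low-rank representation (n x k)
    mask = rng.random(Z.shape) > drop_p    # random dropout mask
    Z_aug = Z * mask / (1.0 - drop_p)      # inverted dropout scaling
    return Z_aug @ Vt                      # back to original feature space

# Two stochastic views of the same table, as a contrastive-SSL pair.
X = rng.standard_normal((100, 20))
view1, view2 = augment(X), augment(X)
```

Because the factorization captures the dominant structure of the table, dropping components of the low-rank representation perturbs the data without destroying the correlations that input-level feature corruption tends to break.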

Published

2025-04-11

How to Cite

Eo, M., Lee, K., Cho, H.-S., Kim, D., Sim, Y. S., & Lim, W. (2025). Representation Space Augmentation for Effective Self-Supervised Learning on Tabular Data. Proceedings of the AAAI Conference on Artificial Intelligence, 39(11), 11625-11633. https://doi.org/10.1609/aaai.v39i11.33265

Section

AAAI Technical Track on Data Mining & Knowledge Management I