ICE-T: Interactions-aware Cross-column Contrastive Embedding for Heterogeneous Tabular Datasets

Authors

  • Tomas Tokar, Wondeur AI; Department of Mechanical and Industrial Engineering, University of Toronto
  • Scott Sanner, Department of Mechanical and Industrial Engineering, University of Toronto; Vector Institute for AI

DOI:

https://doi.org/10.1609/aaai.v39i20.35385

Abstract

Finding high-quality representations of heterogeneous tabular datasets is crucial for their effective use in downstream machine learning tasks. Contrastive representation learning (CRL) methods have been previously shown to provide a straightforward way to learn such representations across various data domains. Current tabular CRL methods learn joint embeddings of data instances (tabular rows) by minimizing a contrastive loss between the original instance and its perturbations. Unlike existing tabular CRL methods, we propose leveraging frameworks established in multimodal representation learning, treating each tabular column as a distinct modality. A naive approach that applies a contrastive loss pairwise to tabular columns is not only prohibitively expensive as the number of columns increases, but as we demonstrate, it also fails to capture interactions between variables. Instead, we propose a novel method called ICE-T that learns each columnar embedding by contrasting it with aggregate embeddings of the complementary part of the table, thus capturing interactions and scaling linearly with the number of columns. Unlike existing tabular CRL methods, ICE-T allows for column-specific embeddings to be obtained independently of the rest of the table, enabling the inference of missing values and translation between columnar variables. We provide a comprehensive evaluation of ICE-T across diverse datasets, demonstrating that it generally surpasses the performance of the state-of-the-art alternatives.
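
The core idea in the abstract, contrasting each column's embedding with an aggregate of the remaining columns' embeddings, can be illustrated with a minimal sketch. The abstract does not specify the aggregation function or the exact contrastive loss, so the mean aggregation and the InfoNCE-style loss below are assumptions made for illustration, and all function names are hypothetical rather than taken from the authors' implementation. The sketch only shows why the number of contrastive terms grows linearly with the number of columns: one term per column instead of one per column pair.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, targets, temperature=0.1):
    """InfoNCE-style loss (an assumption, not confirmed by the abstract).

    anchors[i] should match targets[i]; every other target in the batch
    acts as a negative. Returns the mean loss over the batch.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [dot(anchors[i], targets[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / n


def complement_contrastive_loss(col_embeddings, temperature=0.1):
    """col_embeddings[c][i] is the embedding of column c for table row i.

    Each column is contrasted with the mean embedding of all *other*
    columns (mean aggregation is an assumption), giving one contrastive
    term per column rather than one per column pair.
    """
    n_cols = len(col_embeddings)
    if n_cols < 2:
        raise ValueError("need at least two columns to form a complement")
    n_rows = len(col_embeddings[0])
    dim = len(col_embeddings[0][0])

    # Per-row sum over all columns, computed once and reused for every column.
    row_sums = [
        [sum(col_embeddings[c][i][d] for c in range(n_cols)) for d in range(dim)]
        for i in range(n_rows)
    ]

    loss = 0.0
    for c in range(n_cols):
        anchors = [l2_normalize(col_embeddings[c][i]) for i in range(n_rows)]
        # Complement aggregate: (sum of all columns - this column) / (C - 1).
        targets = [
            l2_normalize(
                [(row_sums[i][d] - col_embeddings[c][i][d]) / (n_cols - 1)
                 for d in range(dim)]
            )
            for i in range(n_rows)
        ]
        loss += info_nce(anchors, targets, temperature)
    return loss / n_cols


# Toy usage: 3 columns, 2 rows, 2-dimensional embeddings.
aligned = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]
misaligned = [[[0.0, 1.0], [1.0, 0.0]]] + [[[1.0, 0.0], [0.0, 1.0]] for _ in range(2)]
print(complement_contrastive_loss(aligned))
print(complement_contrastive_loss(misaligned))
```

In this toy example the loss is lower when every column embeds a given row consistently with the rest of that row than when one column's embeddings are swapped between rows. In a real model the per-column embeddings would come from learned column-specific encoders, which the sketch leaves out.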

Published

2025-04-11

How to Cite

Tokar, T., & Sanner, S. (2025). ICE-T: Interactions-aware Cross-column Contrastive Embedding for Heterogeneous Tabular Datasets. Proceedings of the AAAI Conference on Artificial Intelligence, 39(20), 20904–20911. https://doi.org/10.1609/aaai.v39i20.35385

Section

AAAI Technical Track on Machine Learning VI