Robust Detection of Synthetic Tabular Data Under Schema Variability

Authors

  • G. Charbel N. Kindji (Orange Labs Lannion; Université de Rennes, CNRS, Inria, IRISA UMR 6074)
  • Elisa Fromont (Université de Rennes, CNRS, Inria, IRISA UMR 6074)
  • Lina M. Rojas-Barahona (Orange Labs Lannion)
  • Tanguy Urvoy (Orange Labs Lannion)

DOI:

https://doi.org/10.1609/aaai.v40i27.39422

Abstract

The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked. Yet detecting synthetic tabular data is especially challenging because of its heterogeneous structure and the unseen formats encountered at test time. We address the underexplored task of detecting synthetic tabular data "in the wild", i.e., when the detector is deployed on tables with variable and previously unseen schemas. We introduce a novel datum-wise transformer architecture that significantly outperforms the only previously published baseline, improving both AUC and accuracy by 7 points. With an added table-adaptation component, our model gains a further 7 accuracy points, demonstrating enhanced robustness. This work provides the first strong evidence that detecting synthetic tabular data in real-world conditions is feasible, and demonstrates substantial improvements over previous approaches. The code will be made available in the extended version.

Published

2026-03-14

How to Cite

Kindji, G. C. N., Fromont, E., Rojas-Barahona, L. M., & Urvoy, T. (2026). Robust Detection of Synthetic Tabular Data Under Schema Variability. Proceedings of the AAAI Conference on Artificial Intelligence, 40(27), 22617–22625. https://doi.org/10.1609/aaai.v40i27.39422

Issue

Section

AAAI Technical Track on Machine Learning IV