Effective Data Distillation for Tabular Datasets (Student Abstract)

Inwon Kang; Parikshit Ram; Yi Zhou; Horst Samulowitz; Oshani Seneviratne

doi:10.1609/aaai.v38i21.30460

Authors

Inwon Kang, Rensselaer Polytechnic Institute
Parikshit Ram, IBM Research
Yi Zhou, IBM Research
Horst Samulowitz, IBM Research
Oshani Seneviratne, Rensselaer Polytechnic Institute

DOI:

https://doi.org/10.1609/aaai.v38i21.30460

Keywords:

Information Extraction, Knowledge Representation, Machine Learning

Abstract

Data distillation is a technique for reducing a large dataset to a smaller one, such that a model trained on the smaller dataset performs comparably to a model trained on the full dataset. Past work has examined this approach for image datasets, focusing on neural networks as target models. However, tabular datasets pose challenges not seen in images: a sample in a tabular dataset is a one-dimensional vector, unlike the two- (or three-) dimensional pixel grid of an image, and non-neural-network (non-NN) models such as XGBoost can often outperform neural network (NN) based models. Our contribution is two-fold: 1) we show that data distillation methods developed for images do not translate directly to tabular data; 2) we propose a new distillation method that consistently outperforms the baseline for multiple models, including non-NN models such as XGBoost.
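
To make the general idea of distillation concrete, the following is a minimal sketch and not the method proposed in the abstract. It assumes a synthetic two-class table, reduces it to one prototype row per class (the column-wise class mean, a deliberately simple stand-in for a real distillation method), and compares a 1-nearest-neighbour classifier trained on the full table against one trained on the distilled rows. All names here (`make_toy_table`, `distill_class_means`, `predict_1nn`) are hypothetical and chosen only for illustration.

```python
import random
from statistics import fmean


def make_toy_table(n_per_class=200, seed=0):
    """Synthetic two-class tabular data (an assumption for this sketch).

    Each row is a one-dimensional feature vector with three columns.
    """
    rng = random.Random(seed)
    centers = {0: (0.0, 0.0, 0.0), 1: (3.0, 3.0, 3.0)}
    rows, labels = [], []
    for label, center in centers.items():
        for _ in range(n_per_class):
            rows.append([rng.gauss(c, 1.0) for c in center])
            labels.append(label)
    return rows, labels


def distill_class_means(rows, labels):
    """Reduce the table to one synthetic row per class: the column-wise mean.

    This is a simple prototype baseline, not the paper's distillation method.
    """
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    distilled_rows, distilled_labels = [], []
    for label, members in sorted(by_class.items()):
        distilled_rows.append([fmean(col) for col in zip(*members)])
        distilled_labels.append(label)
    return distilled_rows, distilled_labels


def predict_1nn(train_rows, train_labels, query):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_rows)), key=lambda i: sq_dist(train_rows[i], query))
    return train_labels[best]


def accuracy(train_rows, train_labels, test_rows, test_labels):
    """Fraction of test rows classified correctly by 1-NN on the given training set."""
    hits = sum(
        predict_1nn(train_rows, train_labels, q) == y
        for q, y in zip(test_rows, test_labels)
    )
    return hits / len(test_rows)


full_rows, full_labels = make_toy_table(seed=0)
test_rows, test_labels = make_toy_table(n_per_class=100, seed=1)
small_rows, small_labels = distill_class_means(full_rows, full_labels)

acc_full = accuracy(full_rows, full_labels, test_rows, test_labels)
acc_small = accuracy(small_rows, small_labels, test_rows, test_labels)
print(f"full:      {len(full_rows)} rows, accuracy {acc_full:.3f}")
print(f"distilled: {len(small_rows)} rows, accuracy {acc_small:.3f}")
```

On easily separable toy data like this, two prototype rows are enough for the downstream model to stay close to the full-data accuracy. Real tabular datasets and models such as XGBoost are harder targets, which is the gap the abstract addresses.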

Published

2024-03-24

How to Cite

Kang, I., Ram, P., Zhou, Y., Samulowitz, H., & Seneviratne, O. (2024). Effective Data Distillation for Tabular Datasets (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 38(21), 23533-23534. https://doi.org/10.1609/aaai.v38i21.30460

Issue

Vol. 38 No. 21: IAAI-24, EAAI-24, AAAI-24 Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Student Abstract and Poster Program
