Data Wrangling Task Automation Using Code-Generating Language Models

Ashlesha Akella; Krishnasuri Narayanam

doi:10.1609/aaai.v39i28.35344

Authors

Ashlesha Akella, IBM Research India
Krishnasuri Narayanam, IBM Research India

DOI:

https://doi.org/10.1609/aaai.v39i28.35344

Abstract

Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, often fail to capture semantic context, while deep learning approaches are resource-intensive and require task- and dataset-specific training. We present an automated system that uses large language models to generate executable code for tasks such as missing value imputation, error detection, and error correction. Our system aims to identify inherent patterns in the data while leveraging external knowledge, effectively addressing both memory-dependent and memory-independent tasks.
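
The abstract does not show any generated code, so the following is only an illustrative sketch of what executable code for such tasks might look like. It assumes a memory-independent setting, read here as one where the needed pattern can be recovered from the table itself, and it uses a hypothetical table in which a `zip` column determines a `city` column. The function names, column names, and sample rows are all assumptions made for illustration and are not taken from the authors' system.

```python
from collections import Counter, defaultdict

# Hypothetical sample table: "zip" is assumed to determine "city".
SAMPLE_ROWS = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New Yrok"},   # likely typo
    {"zip": "10001", "city": None},         # missing value
    {"zip": "94105", "city": "San Francisco"},
    {"zip": "94105", "city": None},         # missing value
]


def learn_mapping(rows, key_col, target_col):
    """Return the most frequent target value observed for each key value."""
    counts = defaultdict(Counter)
    for row in rows:
        key, target = row.get(key_col), row.get(target_col)
        if key is not None and target is not None:
            counts[key][target] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def impute_missing(rows, key_col, target_col):
    """Fill missing target values using the majority value for the same key."""
    mapping = learn_mapping(rows, key_col, target_col)
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input table is left unchanged
        if new_row.get(target_col) is None and new_row.get(key_col) in mapping:
            new_row[target_col] = mapping[new_row[key_col]]
        result.append(new_row)
    return result


def detect_errors(rows, key_col, target_col):
    """Return indices of rows whose target disagrees with the majority value."""
    mapping = learn_mapping(rows, key_col, target_col)
    return [
        i for i, row in enumerate(rows)
        if row.get(target_col) is not None
        and row.get(key_col) in mapping
        and row[target_col] != mapping[row[key_col]]
    ]


def correct_errors(rows, key_col, target_col):
    """Replace flagged target values with the majority value for their key."""
    mapping = learn_mapping(rows, key_col, target_col)
    flagged = set(detect_errors(rows, key_col, target_col))
    return [
        {**row, target_col: mapping[row[key_col]]} if i in flagged else dict(row)
        for i, row in enumerate(rows)
    ]
```

Calling `impute_missing(SAMPLE_ROWS, "zip", "city")` fills the two empty cells from the majority value for each zip code, and `detect_errors` flags the misspelled row. A memory-dependent task, such as filling in a city for a zip code that never appears with a city in the table, would need external knowledge and is outside what this sketch covers.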

Published

2025-04-11

How to Cite

Akella, A., & Narayanam, K. (2025). Data Wrangling Task Automation Using Code-Generating Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(28), 29616–29618. https://doi.org/10.1609/aaai.v39i28.35344

Issue

Vol. 39 No. 28: IAAI-25, EAAI-25, AAAI-25 Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Demonstration Track
