SciDataMAS: LLM-Driven MAS for Scientific Data Management (Student Abstract)

Authors

  • Alexander Sachuk, Peter the Great St. Petersburg Polytechnic University
  • Vyacheslav Chukanov, Peter the Great St. Petersburg Polytechnic University
  • Ekaterina Pchitskaya, Peter the Great St. Petersburg Polytechnic University

DOI:

https://doi.org/10.1609/aaai.v40i48.42275

Abstract

The management and annotation of complex, multi-modal scientific data remain a major obstacle for AI-driven research because current solutions are poorly reusable and do not scale well. We propose SciDataMAS, a novel LLM-powered multi-agent system (MAS) that automates scientific data management through a structured data lake with provenance-based organization and an adaptive metadata taxonomy. The system uses specialized workflows for automated dataset creation, data insertion, and retrieval. Experiments demonstrate the system's proficiency: modern LLMs such as GPT-5 successfully generate rich metadata schemas and fill them with high accuracy. This work is a foundational step towards fully automated, reusable, and scalable scientific data organization, which may help the scientific community generate and accumulate well-annotated, AI-ready datasets.

Published

2026-03-14

How to Cite

Sachuk, A., Chukanov, V., & Pchitskaya, E. (2026). SciDataMAS: LLM-Driven MAS for Scientific Data Management (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41375–41376. https://doi.org/10.1609/aaai.v40i48.42275