Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows

Authors

  • Daniel Mas Montserrat, Galatea Bio Inc, Miami Lakes, FL, USA
  • Ray Verma, New York University, Abu Dhabi, UAE
  • Míriam Barrabés, Galatea Bio Inc, Miami Lakes, FL, USA
  • Francisco M. de la Vega, Galatea Bio Inc, Miami Lakes, FL, USA
  • Carlos D. Bustamante, Galatea Bio Inc, Miami Lakes, FL, USA
  • Alexander G. Ioannidis, Galatea Bio Inc, Miami Lakes, FL, USA

DOI:

https://doi.org/10.1609/aaai.v40i47.41443

Abstract

Large-scale genomic workflows used in precision medicine can process datasets spanning tens to hundreds of gigabytes per sample, leading to high memory spikes, intensive disk I/O, and task failures due to out-of-memory errors. Simple static resource allocation methods struggle to handle the variability in per-chromosome RAM demands, resulting in poor resource utilization and long runtimes. In this work, we propose multiple mechanisms for adaptive, RAM-efficient parallelization of chromosome-level bioinformatics workflows. First, we develop a symbolic regression model that estimates per-chromosome memory consumption for a given task and introduces an interpolating bias to conservatively minimize over-allocation. Second, we present a dynamic scheduler that adaptively predicts RAM usage with a polynomial regression model, treating task packing as a Knapsack problem to optimally batch jobs based on predicted memory requirements. Additionally, we present a static scheduler that optimizes chromosome processing order to minimize peak memory while preserving throughput. Our proposed methods, evaluated on simulations and real-world genomic pipelines, provide new mechanisms to reduce memory overruns and balance load across threads. We thereby achieve faster end-to-end execution, showcasing the potential to optimize large-scale genomic workflows.
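
The knapsack-style batching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (pack_batch, schedule), the use of whole-gigabyte integer predictions, and the choice to score a batch by how much of the RAM budget it fills are all assumptions made for illustration, and the predicted values in the usage note are invented placeholders rather than measurements.

```python
from typing import Dict, List


def pack_batch(predicted_gb: Dict[str, int], budget_gb: int) -> List[str]:
    """Choose tasks whose predicted RAM fills the budget as fully as possible.

    Hypothetical helper: a 0/1 knapsack in which each task's value equals
    its weight (predicted GB, assumed to be a positive integer).
    """
    # best[c] = (GB used, chosen task names) for a budget of c GB
    best = [(0, []) for _ in range(budget_gb + 1)]
    for name, need in predicted_gb.items():
        # Walk capacities downward so each task is picked at most once.
        for cap in range(budget_gb, need - 1, -1):
            used, chosen = best[cap - need]
            if used + need > best[cap][0]:
                best[cap] = (used + need, chosen + [name])
    return best[budget_gb][1]


def schedule(predicted_gb: Dict[str, int], budget_gb: int) -> List[List[str]]:
    """Repeatedly pack batches until every task has been assigned."""
    remaining = dict(predicted_gb)
    batches = []
    while remaining:
        batch = pack_batch(remaining, budget_gb)
        if not batch:
            # No remaining task fits in the budget by itself.
            raise ValueError("a task exceeds the RAM budget on its own")
        batches.append(batch)
        for name in batch:
            del remaining[name]
    return batches
```

For example, calling schedule({"chr1": 12, "chr2": 11, "chr21": 3, "chr22": 3, "chrX": 8}, 26) groups the five hypothetical tasks into batches whose predicted memory never exceeds 26 GB. A real scheduler of the kind the abstract describes would also refresh its predictions as tasks finish, which this sketch omits.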

Published

2026-03-14

How to Cite

Mas Montserrat, D., Verma, R., Barrabés, M., de la Vega, F. M., Bustamante, C. D., & Ioannidis, A. G. (2026). Efficient Chromosome Parallelization for Precision Medicine Genomic Workflows. Proceedings of the AAAI Conference on Artificial Intelligence, 40(47), 40083–40091. https://doi.org/10.1609/aaai.v40i47.41443

Section

IAAI Technical Track on Deployed Highly Innovative Applications of AI