From Bias to Breakdown: Benchmarking Failure Mode Analysis of Single-cell RNA Sequencing Foundation Models in Acute Myeloid Leukemia

Authors

  • Amirreza Naziri (York University; Vector Institute; Connected Minds)
  • Arash Asgari (York University; Vector Institute)
  • Aijun An (York University; Connected Minds)
  • Eleftherios Sachlos (York University; Connected Minds)
  • Laleh Seyyed-Kalantari (York University; Vector Institute; Connected Minds)

DOI:

https://doi.org/10.1609/aaaiss.v7i1.36931

Abstract

Foundation models (FMs) trained on large-scale single-cell RNA-seq (scRNA-seq) data have shown strong performance across a range of biological tasks. That performance is usually reported in aggregate, over large benchmark suites with all samples pooled together. However, the pretraining data of these models are often highly imbalanced across disease types, patient conditions, and demographics. For instance, disease samples are rarer and harder to collect, and pretraining sets contain many more healthy cells. Such imbalances can hurt performance on underrepresented disease cases and undermine the equity of model outcomes. To evaluate this hypothesis, we benchmark off-the-shelf scRNA-seq foundation models on cell-type classification in acute myeloid leukemia (AML), a rare but clinically important disease that is representative of low-prevalence settings. Beyond overall performance, we conduct a subgroup analysis of outcomes across cell types and disease conditions (clinical timepoints). Our results suggest that, despite high overall F1 scores in cell-type classification, performance drops in disease conditions and varies across cell types. These findings highlight a limitation of current scRNA-seq foundation models and motivate more balanced pretraining, as well as failure mode analysis rather than reporting overall performance alone.
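
The contrast between an overall score and a subgroup breakdown can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the toy labels, the timepoint names ("healthy", "diagnosis"), and the choice of macro-averaged F1 are all assumptions made for illustration, since the abstract states only that F1 is reported overall and per subgroup.

```python
from collections import defaultdict


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    scores = {}
    for label in set(y_true) | set(y_pred):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def f1_by_subgroup(y_true, y_pred, groups):
    """Macro F1 computed separately within each subgroup (e.g. timepoint)."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: macro_f1(ts, ps) for g, (ts, ps) in buckets.items()}


# Hypothetical toy data: 4 cells from healthy donors, 4 from AML at diagnosis.
y_true = ["T cell", "T cell", "Monocyte", "Monocyte",
          "T cell", "T cell", "Monocyte", "Monocyte"]
y_pred = ["T cell", "T cell", "Monocyte", "Monocyte",
          "T cell", "Monocyte", "Monocyte", "Monocyte"]
timepoint = ["healthy"] * 4 + ["diagnosis"] * 4

overall = macro_f1(y_true, y_pred)
by_time = f1_by_subgroup(y_true, y_pred, timepoint)
by_cell = per_class_f1(y_true, y_pred)

print(f"overall macro F1: {overall:.3f}")
for group, score in sorted(by_time.items()):
    print(f"  {group}: {score:.3f}")
for cell_type, score in sorted(by_cell.items()):
    print(f"  {cell_type}: {score:.3f}")
```

In this toy data the pooled score sits between the two timepoint scores, so reporting it alone would hide the weaker result on the disease subgroup. That masking effect is what a subgroup analysis is meant to expose.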

Published

2025-11-23

How to Cite

Naziri, A., Asgari, A., An, A., Sachlos, E., & Seyyed-Kalantari, L. (2025). From Bias to Breakdown: Benchmarking Failure Mode Analysis of Single-cell RNA Sequencing Foundation Models in Acute Myeloid Leukemia. Proceedings of the AAAI Symposium Series, 7(1), 553-557. https://doi.org/10.1609/aaaiss.v7i1.36931

Section

Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health (SECURE-AI4H)