A Simple and Comprehensive Benchmark for Single-Cell Transcriptomics

Authors

  • Jiaxin Qi, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
  • Yan Cui, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
  • Kailei Guo, Tianjin Medical University Eye Hospital, Tianjin, China
  • Xiaomin Zhang, Tianjin Medical University Eye Hospital, Tianjin, China
  • Jianqiang Huang, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China; University of Chinese Academy of Sciences, Beijing, China
  • Gaogang Xie, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v39i1.32049

Abstract

Single-cell transcriptomics describes complex molecular features at the level of individual cells and serves various roles in biological research, such as enhancing gene expression and predicting drug responses. Because transcriptomic data structurally resemble sequential data, many researchers have trained transformers on extensive transcriptomic datasets. However, they have consistently neglected to explore the intrinsic properties of the data and the appropriateness of their chosen model architecture. In this paper, we carefully investigate the nature of transcriptomics and identify three overlooked problems: 1) the long-tailed data problem, 2) the model selection problem, and 3) the evaluation problem. By applying a weighted sampling strategy, we address the long-tailed data problem and achieve consistent improvement across all settings. By adapting different model structures to transcriptomic data, we discover that transformers are not the only option. By developing three downstream tasks and fair evaluation metrics, we establish a simple and comprehensive benchmark to validate the effectiveness of models for transcriptomics. Through extensive experiments, we clarify misunderstandings in traditional methods and provide competitive baselines, thereby paving the way for future research in this field.
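
As a rough illustration of how weighted sampling can counter a long-tailed distribution, the sketch below draws training examples with probability inversely proportional to the frequency of their class. The abstract does not specify which quantity is long-tailed or how the weights are computed, so the class labels, the inverse-frequency rule, and the helper names `inverse_frequency_weights` and `weighted_sample` are all assumptions made for this example, not the authors' method.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Give each example a weight of 1 / (size of its class).

    Assumption: inverse-frequency weighting; the paper may use another rule.
    """
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def weighted_sample(items, labels, k, seed=0):
    """Draw k items with replacement, favouring examples from rare classes."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(items, weights=weights, k=k)


# Hypothetical long-tailed toy data: 95 cells of a common type, 5 of a rare type.
labels = ["common"] * 95 + ["rare"] * 5
cells = list(range(len(labels)))

batch = weighted_sample(cells, labels, k=1000)
sampled_counts = Counter(labels[i] for i in batch)
print(sampled_counts)
```

With these weights each class carries the same total sampling mass, so the rare class should make up roughly half of the drawn batch instead of 5% under uniform sampling.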

Published

2025-04-11

How to Cite

Qi, J., Cui, Y., Guo, K., Zhang, X., Huang, J., & Xie, G. (2025). A Simple and Comprehensive Benchmark for Single-Cell Transcriptomics. Proceedings of the AAAI Conference on Artificial Intelligence, 39(1), 676-684. https://doi.org/10.1609/aaai.v39i1.32049

Section

AAAI Technical Track on Application Domains