A Simple and Comprehensive Benchmark for Single-Cell Transcriptomics

Jiaxin Qi; Yan Cui; Kailei Guo; Xiaomin Zhang; Jianqiang Huang; Gaogang Xie

doi:10.1609/aaai.v39i1.32049

Authors

Jiaxin Qi: Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Yan Cui: Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
Kailei Guo: Tianjin Medical University Eye Hospital, Tianjin, China
Xiaomin Zhang: Tianjin Medical University Eye Hospital, Tianjin, China
Jianqiang Huang: Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China; University of Chinese Academy of Sciences, Beijing, China
Gaogang Xie: Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v39i1.32049

Abstract

Single-cell transcriptomics describes complex molecular features at the level of individual cells and serves various roles in biological research, such as gene expression enhancement and drug response prediction. Because transcriptomic data structurally resemble sequential data, many researchers have trained transformers on extensive transcriptomic datasets. However, they have consistently neglected to explore the intrinsic properties of the data and the appropriateness of their chosen model architecture. In this paper, we carefully investigate the nature of transcriptomics and identify three overlooked problems: 1) the long-tailed data problem, 2) the model selection problem, and 3) the evaluation problem. We address the long-tailed data problem by applying a weighted sampling strategy, which achieves consistent improvement across all settings. By adapting different model structures to transcriptomic data, we find that transformers are not the only option. By developing three downstream tasks and fair evaluation metrics, we establish a simple and comprehensive benchmark for validating the effectiveness of models for transcriptomics. Through extensive experiments, we clarify misunderstandings in traditional methods and provide competitive baselines, paving the way for future research in this field.
