Predicting Variant Fitness of SARS-COV-2 from Full Viral Genome Sequences
DOI:
https://doi.org/10.1609/aaaiss.v7i1.36915
Abstract
Accurate prediction of the transmission fitness of emerging SARS-CoV-2 variants is vital for timely public health responses. In this study, we present a deep learning framework that predicts variant fitness from raw genomic sequences using a convolutional neural network (CNN) trained to regress Differential Population Growth Rate (DPGR) values. Our approach achieves high predictive accuracy, with an R² value of 0.92 on genomic sequences sampled from the USA and Europe. To interpret the model’s predictions, we apply SHapley Additive exPlanations (SHAP) to identify nucleotide-level contributions to predicted fitness. Our analysis highlights key mutations in ORF9 (nucleocapsid), ORF2 (spike), ORF5 (membrane), and ORF8 that either enhance or reduce predicted DPGR. Notably, we identify amino acid–altering mutations such as D3L, E484K, N501Y, and V97I as strong positive contributors to fitness, while synonymous or non-coding mutations had more subtle or regulatory effects. These findings validate the potential of sequence-based modeling and interpretable AI to support early detection and prioritization of high-risk variants.
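As a rough illustration of two ideas the abstract relies on, the sketch below shows how a raw nucleotide sequence might be turned into numeric CNN input and how an R² score is computed for a regression target such as DPGR. The abstract does not state the encoding used, so one-hot encoding is an assumption here, and the function names `one_hot_encode` and `r_squared` are hypothetical, not taken from the paper.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumption: sequences are one-hot encoded over A, C, G, T.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Map each nucleotide to a 4-element indicator vector.

    Any base outside A/C/G/T (for example the ambiguity code N)
    becomes an all-zero vector, which is a choice made for this sketch.
    """
    return [
        [1.0 if base == n else 0.0 for n in NUCLEOTIDES]
        for base in sequence.upper()
    ]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A matrix produced this way has one row per genome position and four columns, which is a common input shape for a 1D convolutional regressor. `r_squared` would then compare the model's predicted DPGR values with the observed ones.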
Published
2025-11-23
How to Cite
Annan, R., Nkonu, U., Hatami, P., Pantho, M. J., Qingge, L., & Qin, H. (2025). Predicting Variant Fitness of SARS-COV-2 from Full Viral Genome Sequences. Proceedings of the AAAI Symposium Series, 7(1), 428–437. https://doi.org/10.1609/aaaiss.v7i1.36915
Issue
Section
Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health (SECURE-AI4H)