Predicting Variant Fitness of SARS-COV-2 from Full Viral Genome Sequences

Authors

  • Richard Annan, North Carolina A&T State University
  • Ursula Nkonu, Old Dominion University
  • Parisa Hatami, University of Tennessee at Chattanooga
  • Md Jubair Pantho, University of Tennessee at Chattanooga
  • Letu Qingge, North Carolina A&T State University
  • Hong Qin, Old Dominion University

DOI:

https://doi.org/10.1609/aaaiss.v7i1.36915

Abstract

Accurate prediction of the transmission fitness of emerging SARS-CoV-2 variants is vital for timely public health responses. In this study, we present a deep learning framework that predicts variant fitness from raw genomic sequences using a convolutional neural network (CNN) trained to regress Differential Population Growth Rate (DPGR) values. Our approach achieves high predictive accuracy, with an R-squared value of 0.92, on genomic sequences sampled from the USA and Europe. To interpret the model’s predictions, we apply SHapley Additive exPlanations (SHAP) to identify nucleotide-level contributions to predicted fitness. Our analysis highlights key mutations in ORF9 (nucleocapsid), ORF2 (spike), ORF5 (membrane), and ORF8 that either enhance or reduce predicted DPGR. Notably, we identify amino acid–altering mutations such as D3L, E484K, N501Y, and V97I as strong positive contributors to fitness, while synonymous or non-coding mutations have more subtle or regulatory effects. These findings validate the potential of sequence-based modeling and interpretable AI to support early detection and prioritization of high-risk variants.
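
As a rough illustration of two ingredients the abstract mentions, the sketch below shows one common way to turn a raw nucleotide sequence into numeric input for a CNN and how an R-squared score for a regression target such as DPGR can be computed. The abstract does not state which encoding the authors use, so the one-hot scheme, the handling of ambiguous bases, and the function names `one_hot_encode` and `r_squared` are all assumptions made for illustration and are not the paper's implementation.

```python
# Illustrative sketch only: the encoding scheme and function names are
# assumptions, not taken from the paper.

NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Encode a nucleotide string as a list of 4-element rows (A, C, G, T).

    Bases outside A/C/G/T (for example the ambiguity code N) are encoded
    as all-zero rows. This is an assumed convention.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        row = [0.0] * len(NUCLEOTIDES)
        column = index.get(base)
        if column is not None:
            row[column] = 1.0
        encoded.append(row)
    return encoded


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy inputs, not real data from the study.
    print(one_hot_encode("ACGTN"))
    print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In a setup like this, each genome would become a (length × 4) matrix fed to the convolutional layers, and the R-squared score would compare predicted and observed DPGR values on held-out sequences.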

Published

2025-11-23

How to Cite

Annan, R., Nkonu, U., Hatami, P., Pantho, M. J., Qingge, L., & Qin, H. (2025). Predicting Variant Fitness of SARS-COV-2 from Full Viral Genome Sequences. Proceedings of the AAAI Symposium Series, 7(1), 428–437. https://doi.org/10.1609/aaaiss.v7i1.36915

Section

Safe, Ethical, Certified, Uncertainty-aware, Robust, and Explainable AI for Health (SECURE-AI4H)