Regressor-guided Diffusion Model for De Novo Peptide Sequencing with Explicit Mass Control
DOI: https://doi.org/10.1609/aaai.v40i1.36968
Abstract
The discovery of novel proteins relies on sensitive protein identification, for which de novo peptide sequencing (DNPS) from mass spectra is a crucial approach. While deep learning has advanced DNPS, existing models inadequately enforce the fundamental mass consistency constraint: a predicted peptide's mass must match the experimentally measured precursor mass. Previous DNPS methods often treat this critical information as a simple input feature or use it only in post-processing, leading to numerous implausible predictions that violate this fundamental physical property. To address this limitation, we introduce DiffuNovo, a novel regressor-guided diffusion model for de novo peptide sequencing that provides explicit peptide-level mass control. Our approach integrates the mass constraint at two critical stages: during training, a novel peptide-level mass loss guides model optimization, while at inference, regressor-based guidance, applied as gradient-based updates in the latent space, steers generation so that the predicted peptide adheres to the mass constraint. Comprehensive evaluations on established benchmarks demonstrate that DiffuNovo surpasses state-of-the-art methods in DNPS accuracy. Additionally, as the first DNPS model to use a diffusion model as its core backbone, DiffuNovo leverages the strong controllability of the diffusion architecture and achieves a significant reduction in mass error, thereby producing far more physically plausible peptides. These innovations represent a substantial advancement toward robust and broadly applicable DNPS. The source code is available in the supplementary material.
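To make the mass consistency constraint concrete, the sketch below computes a peptide's neutral monoisotopic mass (sum of residue masses plus one water), converts an observed precursor m/z and charge into a neutral mass, and checks the error in ppm. It then illustrates, in a deliberately simplified form, how a differentiable peptide-level mass term can be driven toward the precursor mass by gradient steps. This is a hypothetical illustration, not DiffuNovo's implementation: the paper applies regressor guidance in the latent space of a diffusion model, whereas here the "expected mass" is computed directly from per-position amino-acid logits. The function names, the 20 ppm tolerance, the learning rate, and the use of unmodified cysteine are all assumptions.

```python
import math

# Standard monoisotopic residue masses (Da). Cysteine is unmodified here;
# many pipelines use carbamidomethyl-Cys (+57.02146) instead (assumption).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
H2O = 18.010565    # one water for the peptide termini
PROTON = 1.007276  # proton mass

def peptide_mass(seq):
    """Neutral monoisotopic peptide mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + H2O

def precursor_neutral_mass(mz, charge):
    """Neutral mass implied by an observed precursor m/z at charge z."""
    return (mz - PROTON) * charge

def ppm_error(predicted, observed):
    return (predicted - observed) / observed * 1e6

def satisfies_mass(seq, mz, charge, tol_ppm=20.0):
    """Hypothetical hard check: is the precursor mass error within tol_ppm?"""
    observed = precursor_neutral_mass(mz, charge)
    return abs(ppm_error(peptide_mass(seq), observed)) <= tol_ppm

# --- Simplified soft mass term and gradient-based guidance (hypothetical) ---
AAS = sorted(RESIDUE_MASS)
MASSES = [RESIDUE_MASS[a] for a in AAS]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def expected_mass(logits):
    """Differentiable peptide mass: per-position expected residue mass + water."""
    total = H2O
    for row in logits:
        p = softmax(row)
        total += sum(pi * mi for pi, mi in zip(p, MASSES))
    return total

def mass_guidance_step(logits, target, lr=1e-4):
    """One gradient-descent step on the loss (E - target)^2 w.r.t. the logits,
    using dE/dz[i][a] = p[i][a] * (m[a] - sum_b p[i][b] * m[b])."""
    err = expected_mass(logits) - target
    updated = []
    for row in logits:
        p = softmax(row)
        mbar = sum(pi * mi for pi, mi in zip(p, MASSES))
        updated.append([z - lr * 2.0 * err * pi * (mi - mbar)
                        for z, pi, mi in zip(row, p, MASSES)])
    return updated
```

For example, `peptide_mass("PEPTIDE")` is about 799.36 Da, which corresponds to a doubly charged precursor near m/z 400.687; starting from uniform logits over seven positions and repeatedly calling `mass_guidance_step` with that target pulls the expected mass toward it. A soft, differentiable mass term is what lets the constraint act during optimization rather than only as a post-hoc filter.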
Published
2026-03-14
How to Cite
Chen, S., Zhou, J., & Xia, J. (2026). Regressor-guided Diffusion Model for De Novo Peptide Sequencing with Explicit Mass Control. Proceedings of the AAAI Conference on Artificial Intelligence, 40(1), 92–100. https://doi.org/10.1609/aaai.v40i1.36968
Section
AAAI Technical Track on Application Domains I