Exploit Your Latents: Coarse-Grained Protein Backmapping with Latent Diffusion Models

Authors

  • Rongchao Zhang Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, School of Computer Science, Peking University, Beijing, China
  • Yu Huang National Engineering Research Center for Software Engineering, Peking University, Beijing, China
  • Yiwei Lou Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, School of Computer Science, Peking University, Beijing, China
  • Yi Xin National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  • Haixu Chen Institute of Geriatrics & National Clinical Research Center of Geriatrics Disease, Chinese PLA General Hospital, Beijing, China
  • Yongzhi Cao Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, School of Computer Science, Peking University, Beijing, China
  • Hanpin Wang Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, School of Computer Science, Peking University, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v39i1.32098

Abstract

Coarse-grained (CG) molecular dynamics is a preferred approach to studying large proteins on extended time scales: it condenses the full atomic model into a limited number of pseudo-atoms while preserving the thermodynamic properties of the system. However, this gain in efficiency comes at the cost of high-resolution atomic detail, which hinders analyses that depend on fine-grained physicochemical information. In this paper, we propose LatCPB, a diffusion-based generative approach that enables high-resolution backmapping of CG proteins. Specifically, our model encodes an all-atom structure into discrete latent embeddings, which are aligned with learnable multimodal discrete priors to circumvent posterior collapse and maintain the discrete properties of the protein sequence. During generation, we further design a latent diffusion process within the continuous latent space to account for the potential stochasticity in the data. Moreover, LatCPB applies a contrastive learning strategy in latent space to separate the feature representations of different molecules and of different conformations of the same molecule, thus enhancing the comprehension of molecular representational diversity. Experimental results demonstrate that LatCPB backmaps CG proteins effectively and achieves outstanding performance.
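
To make the coarse-graining step concrete, the following is a minimal illustrative sketch of one common mapping: each residue's atoms are collapsed into a single pseudo-atom at their mass-weighted center. It is not taken from the paper and is not LatCPB's mapping; the function name `coarse_grain`, the input layout, and the toy coordinates are assumptions chosen for illustration. It shows why backmapping is needed: the per-atom positions cannot be read back from the pseudo-atoms alone.

```python
from collections import defaultdict


def coarse_grain(atoms):
    """Collapse all-atom coordinates to one pseudo-atom per residue.

    atoms: iterable of (residue_index, mass, (x, y, z)) tuples (hypothetical layout).
    Returns a dict mapping residue_index to its mass-weighted center (x, y, z).
    """
    total_mass = defaultdict(float)
    weighted_sum = defaultdict(lambda: [0.0, 0.0, 0.0])
    for residue, mass, (x, y, z) in atoms:
        total_mass[residue] += mass
        weighted_sum[residue][0] += mass * x
        weighted_sum[residue][1] += mass * y
        weighted_sum[residue][2] += mass * z
    return {
        residue: tuple(c / total_mass[residue] for c in weighted_sum[residue])
        for residue in total_mass
    }


# Toy input: residue 0 has two atoms, residue 1 has one (made-up values).
toy_atoms = [
    (0, 12.0, (0.0, 0.0, 0.0)),
    (0, 12.0, (2.0, 0.0, 0.0)),
    (1, 14.0, (1.0, 1.0, 1.0)),
]
pseudo_atoms = coarse_grain(toy_atoms)
print(pseudo_atoms)
```

In this toy case the two atoms of residue 0 are replaced by a single point midway between them, so many distinct all-atom arrangements share the same CG representation. A generative backmapping model has to sample plausible atomic detail consistent with those pseudo-atoms.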

Published

2025-04-11

How to Cite

Zhang, R., Huang, Y., Lou, Y., Xin, Y., Chen, H., Cao, Y., & Wang, H. (2025). Exploit Your Latents: Coarse-Grained Protein Backmapping with Latent Diffusion Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(1), 1111–1119. https://doi.org/10.1609/aaai.v39i1.32098

Section

AAAI Technical Track on Application Domains