DS-ProGen: A Dual-Structure Deep Language Model for Functional Protein Design

Authors

  • Yanting Li The Chinese University of Hong Kong; Hong Kong University of Science and Technology (Guangzhou)
  • Zikang Wang The Hong Kong Polytechnic University
  • Jiyue Jiang The Chinese University of Hong Kong
  • Ziqian Lin The Chinese University of Hong Kong
  • Dongchen He The Chinese University of Hong Kong
  • Yuheng Shan National University of Singapore
  • Yanruisheng Shao The Chinese University of Hong Kong
  • Jiayi Li The Chinese University of Hong Kong
  • Xiangyu Shi The Chinese University of Hong Kong
  • Jiuming Wang The Chinese University of Hong Kong
  • Yanyu Chen The Chinese University of Hong Kong
  • Yimin Fan The Chinese University of Hong Kong
  • Han Li The Chinese University of Hong Kong
  • Yu Li The Chinese University of Hong Kong

DOI:

https://doi.org/10.1609/aaai.v40i1.37037

Abstract

Inverse Protein Folding (IPF) is a critical subtask in the field of protein design, aiming to engineer amino acid sequences capable of folding correctly into a specified three-dimensional (3D) conformation. Although substantial progress has been achieved in recent years, existing methods generally rely on either backbone coordinates or molecular surface features alone, which restricts their ability to fully capture the complex chemical and geometric constraints necessary for precise sequence prediction. To address this limitation, we present DS-ProGen, a dual-structure deep language model for functional protein design, which integrates both backbone geometry and surface-level representations. By incorporating backbone coordinates as well as surface chemical and geometric descriptors into a next-amino-acid prediction paradigm, DS-ProGen is able to generate functionally relevant and structurally stable sequences while satisfying both global and local conformational constraints. On the PRIDE dataset, DS-ProGen attains the current state-of-the-art recovery rate of 61.47%, demonstrating the synergistic advantage of multi-modal structural encoding in protein design. Furthermore, DS-ProGen excels in predicting interactions with a variety of biological partners, including ligands, ions, and RNA, confirming its robust functional retention capabilities.
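The recovery rate quoted above is not defined in the abstract. In inverse folding work it usually means the fraction of positions at which the designed sequence reproduces the native residue, and the following minimal sketch assumes that definition. The function name `sequence_recovery` and the two example sequences are hypothetical illustrations, not taken from DS-ProGen or the PRIDE dataset.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Return the fraction of aligned positions where both sequences agree.

    Assumes per-residue identity over two equal-length, ungapped sequences;
    the paper's exact evaluation protocol may differ.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count positions with identical amino acids, then normalise by length.
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Hypothetical 10-residue example: two of the ten positions differ.
native = "MKTAYIAKQR"
designed = "MKTGYIAKHR"
recovery = sequence_recovery(native, designed)
print(f"{recovery:.2%}")  # prints 80.00%
```

A dataset-level figure such as the one reported would typically aggregate this per-protein value over a test set, though the aggregation used by the authors is not stated here.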

Published

2026-03-14

How to Cite

Li, Y., Wang, Z., Jiang, J., Lin, Z., He, D., Shan, Y., … Li, Y. (2026). DS-ProGen: A Dual-Structure Deep Language Model for Functional Protein Design. Proceedings of the AAAI Conference on Artificial Intelligence, 40(1), 712–720. https://doi.org/10.1609/aaai.v40i1.37037

Section

AAAI Technical Track on Application Domains I