Synergy of GFlowNet and Protein Language Model Makes a Diverse Antibody Designer

Authors

  • Mingze Yin, College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University; State Key Laboratory of Transvascular Implantation Devices of The Second Affiliated Hospital, Zhejiang University
  • Hanjing Zhou, College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University
  • Yiheng Zhu, College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University
  • Jialu Wu, College of Pharmaceutical Sciences, Zhejiang University
  • Wei Wu, School of Artificial Intelligence and Data Science, University of Science and Technology of China
  • Mingyang Li, Alibaba Cloud Computing
  • Kun Fu, Alibaba Cloud Computing
  • Zheng Wang, Alibaba Cloud Computing
  • Chang-Yu Hsieh, College of Pharmaceutical Sciences, Zhejiang University
  • Tingjun Hou, College of Pharmaceutical Sciences, Zhejiang University
  • Jian Wu, College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University; State Key Laboratory of Transvascular Implantation Devices of The Second Affiliated Hospital, Zhejiang University; Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence

DOI:

https://doi.org/10.1609/aaai.v39i21.34370

Abstract

Antibodies defend our health by binding to antigens with high specificity and potency, relying primarily on the Complementarity-Determining Region (CDR). Yet current experimental methods for discovering new antibody CDRs are highly time-consuming. Computational design could alleviate this burden; in particular, protein language models have proven beneficial in many recent studies. However, most existing models focus solely on antibody potency and struggle to capture the diverse range of plausible CDR candidates, which limits their effectiveness in real-world scenarios, since binding is only one of many criteria a drug candidate must meet. In this paper, we introduce PG-AbD, a framework uniting Generative Flow Networks (GFlowNets) and pretrained Protein Language Models (PLMs) to generate highly potent, diverse and novel antibody candidates. We construct a Product of Experts (PoE), composed of the global-distribution-modeling PLM and the local-distribution-modeling Potts Model, to serve as the reward function of the GFlowNet. We introduce a joint training paradigm in which the PoE is trained by contrastive divergence on negative samples generated by the GFlowNet, and in turn guides the GFlowNet to sample diverse antibody candidates. We evaluate PG-AbD on extensive antibody design benchmarks. It significantly outperforms existing methods in diversity (13.5% on RabDab, 31.1% on SabDab) while maintaining optimal potency and novelty. Generated antibodies are also found to form stable, regular 3D structures with their corresponding antigens, demonstrating the great potential of PG-AbD to accelerate real-world antibody discovery.
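
To make the Product of Experts reward described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the two experts are combined by adding their log-scores, uses randomly initialized Potts fields and couplings, and replaces the pretrained PLM with a uniform placeholder. All function names (`make_toy_potts`, `potts_log_score`, `toy_plm_log_score`, `poe_log_reward`) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def make_toy_potts(length, seed=0):
    """Build random site fields h[i][a] and pairwise couplings J[(i, j)][a][b].

    These are hypothetical stand-in parameters, not trained values.
    """
    rng = random.Random(seed)
    q = len(AMINO_ACIDS)
    h = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(length)]
    J = {
        (i, j): [[rng.gauss(0, 0.1) for _ in range(q)] for _ in range(q)]
        for i in range(length)
        for j in range(i + 1, length)
    }
    return h, J


def potts_log_score(seq, h, J):
    """Unnormalized Potts log-probability: site fields plus pairwise couplings."""
    idx = [AMINO_ACIDS.index(c) for c in seq]
    score = sum(h[i][a] for i, a in enumerate(idx))
    score += sum(J[(i, j)][idx[i]][idx[j]] for (i, j) in J)
    return score


def toy_plm_log_score(seq):
    """Placeholder for a pretrained PLM log-likelihood (assumed uniform here)."""
    return -len(seq) * math.log(len(AMINO_ACIDS))


def poe_log_reward(seq, h, J):
    """Product of experts: multiplying expert scores means adding their logs."""
    return toy_plm_log_score(seq) + potts_log_score(seq, h, J)


if __name__ == "__main__":
    h, J = make_toy_potts(length=5)
    print(poe_log_reward("ARDYW", h, J))
```

In this sketch, a GFlowNet would be trained to sample sequences with probability proportional to the exponentiated reward. The contrastive-divergence update of the experts and the GFlowNet training loop are not shown.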

Published

2025-04-11

How to Cite

Yin, M., Zhou, H., Zhu, Y., Wu, J., Wu, W., Li, M., … Wu, J. (2025). Synergy of GFlowNet and Protein Language Model Makes a Diverse Antibody Designer. Proceedings of the AAAI Conference on Artificial Intelligence, 39(21), 22164–22172. https://doi.org/10.1609/aaai.v39i21.34370

Issue

Vol. 39 No. 21 (2025)

Section

AAAI Technical Track on Machine Learning VII