Synergy of GFlowNet and Protein Language Model Makes a Diverse Antibody Designer

Mingze Yin; Hanjing Zhou; Yiheng Zhu; Jialu Wu; Wei Wu; Mingyang Li; Kun Fu; Zheng Wang; Chang-Yu Hsieh; Tingjun Hou; Jian Wu

doi:10.1609/aaai.v39i21.34370

Authors

Mingze Yin: College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University; State Key Laboratory of Transvascular Implantation Devices of The Second Affiliated Hospital, Zhejiang University
Hanjing Zhou: College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University
Yiheng Zhu: College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University
Jialu Wu: College of Pharmaceutical Sciences, Zhejiang University
Wei Wu: School of Artificial Intelligence and Data Science, University of Science and Technology of China
Mingyang Li: Alibaba Cloud Computing
Kun Fu: Alibaba Cloud Computing
Zheng Wang: Alibaba Cloud Computing
Chang-Yu Hsieh: College of Pharmaceutical Sciences, Zhejiang University
Tingjun Hou: College of Pharmaceutical Sciences, Zhejiang University
Jian Wu: College of Computer Science & Technology and Liangzhu Laboratory, Zhejiang University; State Key Laboratory of Transvascular Implantation Devices of The Second Affiliated Hospital, Zhejiang University; Zhejiang Key Laboratory of Medical Imaging Artificial Intelligence

DOI:

https://doi.org/10.1609/aaai.v39i21.34370

Abstract

Antibodies defend our health by binding to antigens with high specificity and potency, relying primarily on the Complementarity-Determining Region (CDR). Yet current experimental methods for discovering new antibody CDRs are highly time-consuming. Computational design could alleviate this burden; in particular, protein language models have proven beneficial in many recent studies. However, most existing models focus solely on antibody potency and struggle to capture the diverse range of plausible CDR candidates. This limits their effectiveness in real-world scenarios, where binding is only one of the many criteria a drug candidate must satisfy. In this paper, we introduce PG-AbD, a framework that unites Generative Flow Networks (GFlowNets) and pretrained Protein Language Models (PLMs) to generate highly potent, diverse and novel antibody candidates. We construct a Product of Experts (PoE), composed of a PLM that models the global distribution and a Potts Model that models the local distribution, to serve as the reward function of the GFlowNet. We introduce a joint training paradigm in which the PoE is trained by contrastive divergence, using negative samples generated by the GFlowNet, and in turn guides the GFlowNet to sample diverse antibody candidates. We evaluate PG-AbD on extensive antibody design benchmarks. It significantly outperforms existing methods in diversity (13.5% on RabDab, 31.1% on SabDab) while maintaining optimal potency and novelty. Generated antibodies are also found to form stable, regular 3D structures with their corresponding antigens, demonstrating the potential of PG-AbD to accelerate real-world antibody discovery.
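
As a rough illustration of the Product of Experts reward described in the abstract, the sketch below adds the log-scores of two experts for a CDR sequence, which corresponds to multiplying their unnormalized probabilities. It is a minimal sketch under stated assumptions, not the PG-AbD implementation: the PLM is replaced by a hypothetical per-site log-probability table, the Potts parameters `h` and `J` are random placeholders, and all function names are illustrative. The contrastive divergence training and the GFlowNet sampler are not shown.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def encode(seq):
    """Map an amino-acid string to an array of integer indices."""
    return np.array([AA_INDEX[aa] for aa in seq], dtype=int)

def potts_log_score(x, h, J):
    """Negative Potts energy: sum_i h[i, x_i] + sum_{i<j} J[i, j, x_i, x_j]."""
    n = len(x)
    score = h[np.arange(n), x].sum()
    for i in range(n):
        for j in range(i + 1, n):
            score += J[i, j, x[i], x[j]]
    return float(score)

def toy_plm_log_score(x, site_log_probs):
    """Hypothetical stand-in for a PLM: independent per-site log-probabilities."""
    return float(site_log_probs[np.arange(len(x)), x].sum())

def poe_log_reward(seq, h, J, site_log_probs):
    """Product of experts in log space: the two expert log-scores are summed."""
    x = encode(seq)
    return toy_plm_log_score(x, site_log_probs) + potts_log_score(x, h, J)

# Placeholder parameters for a length-5 CDR fragment (assumed values, not learned).
rng = np.random.default_rng(0)
L, Q = 5, len(AMINO_ACIDS)
h = rng.normal(scale=0.1, size=(L, Q))
J = rng.normal(scale=0.05, size=(L, L, Q, Q))
logits = rng.normal(size=(L, Q))
site_log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

log_r = poe_log_reward("ARDYW", h, J, site_log_probs)
print(f"log reward for ARDYW: {log_r:.3f}")
```

Working in log space keeps the product numerically stable and makes each expert's contribution easy to inspect; a GFlowNet trained against such a reward would aim to sample sequences with probability proportional to the exponentiated score.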
