BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning

Authors

  • Artem Zholus: Insilico Medicine Canada Inc.; Mila – Quebec AI Institute; École Polytechnique de Montréal; Chandar Research Lab
  • Maksim Kuznetsov: Insilico Medicine Canada Inc.
  • Roman Schutski: Insilico Medicine AI Limited
  • Rim Shayakhmetov: Insilico Medicine Canada Inc.
  • Daniil Polykovskiy: Insilico Medicine Canada Inc.
  • Sarath Chandar: Mila – Quebec AI Institute; École Polytechnique de Montréal; Chandar Research Lab; CIFAR AI Chair
  • Alex Zhavoronkov: Insilico Medicine AI Limited

DOI:

https://doi.org/10.1609/aaai.v39i24.34804

Abstract

Generating novel active molecules for a given protein is an extremely challenging task for generative models that requires an understanding of the complex physical interactions between the molecule and its environment. This paper presents a novel generative model, BindGPT, which uses a conceptually simple but powerful approach to create 3D molecules within the protein's binding site. Our model produces molecular graphs and conformations jointly, eliminating the need for an extra graph reconstruction step. We pre-train BindGPT on a large-scale dataset and fine-tune it with reinforcement learning using scores from external simulation software. We demonstrate how a single pre-trained language model can serve at the same time as a 3D molecular generative model, a conformer generator conditioned on the molecular graph, and a pocket-conditioned 3D molecule generator. Notably, the model does not make any representational equivariance assumptions about the domain of generation. We show how such a simple conceptual approach combined with pre-training and scaling can perform on par with or better than the best current specialized diffusion models, language models, and graph neural networks while being two orders of magnitude cheaper to sample.

Published

2025-04-11

How to Cite

Zholus, A., Kuznetsov, M., Schutski, R., Shayakhmetov, R., Polykovskiy, D., Chandar, S., & Zhavoronkov, A. (2025). BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 26083–26091. https://doi.org/10.1609/aaai.v39i24.34804

Section

AAAI Technical Track on Natural Language Processing III