Distributional Priors Guided Diffusion for Generating 3D Molecules in Low Data Regimes

Authors

  • Haokai Hong The Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong SAR, China.
  • Wanyu Lin The Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong SAR, China. The Department of Computing, The Hong Kong Polytechnic University, Hong Kong SAR, China.
  • Ming Yang The Department of Applied Physics, The Hong Kong Polytechnic University, Hong Kong SAR, China.
  • Kay Chen Tan The Department of Data Science and Artificial Intelligence, The Hong Kong Polytechnic University, Hong Kong SAR, China.

DOI: https://doi.org/10.1609/aaai.v40i26.39325

Abstract

Can we train a 3D molecule generator on data from dense regions and use it to generate samples in sparse regions? This challenge can be framed as an out-of-distribution (OOD) generation problem. While prior research on OOD generation predominantly targets property shifts, structural shifts, such as differences in molecular scaffolds or functional groups, are an equally critical source of distributional shift. This work introduces the Geometric OOD Diffusion Model (GODD), a novel diffusion-based framework that trains on data-abundant molecular distributions and generalizes to data-scarce distributions under structural distribution shifts. Central to our approach is a dedicated equivariant asymmetric autoencoder that captures distributional structural priors. The asymmetric design lets the model generalize to unseen structural variations by encoding priors that represent distinct distributions. These structure-level priors guide generation toward sparse regions without explicit training on data from those regions. Evaluated on standard benchmarks encompassing OOD structural shifts (e.g., scaffolds, rings), GODD improves the success rate, defined in terms of molecular validity, uniqueness, and novelty, by 12.6%. The framework also shows promising performance and generalization on canonical fragment-based drug design tasks, highlighting its utility for learning-based molecular discovery.
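The abstract does not spell out how validity, uniqueness, and novelty combine into a single success rate. One plausible reading, sketched below purely as an illustration, counts a generated sample as a success only if it is valid, has not already been generated, and does not appear in the training set. This is an assumption, not the paper's definition, which may instead multiply separate validity, uniqueness, and novelty rates. The function name `success_rate`, the use of canonical SMILES strings as molecule identifiers, and the pluggable `is_valid` check (in practice something like an RDKit parse/sanitize step) are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Set


def success_rate(
    generated: Iterable[Optional[str]],
    training_set: Set[str],
    is_valid: Callable[[str], bool],
) -> float:
    """Hypothetical metric: fraction of samples that are valid, unique, and novel.

    Assumptions (not from the paper): each sample is a canonical SMILES string,
    or None if decoding failed; a sample succeeds only the first time a given
    valid string appears, and only if that string is absent from training_set.
    """
    samples = list(generated)
    if not samples:
        return 0.0
    seen: Set[str] = set()
    successes = 0
    for smi in samples:
        if smi is None or not is_valid(smi):
            continue  # fails validity
        if smi in seen:
            continue  # fails uniqueness: already generated
        seen.add(smi)
        if smi in training_set:
            continue  # fails novelty: reproduces a training molecule
        successes += 1
    return successes / len(samples)


# Toy validity check (hypothetical stand-in for a real chemistry toolkit):
# treat a SMILES string as valid if its ring-closure digit "1" is paired.
def toy_is_valid(smi: str) -> bool:
    return smi.count("1") % 2 == 0


samples = ["CCO", "CCO", "c1ccccc1", None, "C1CC", "CCN"]
train = {"c1ccccc1"}
print(success_rate(samples, train, toy_is_valid))  # 2 of 6 samples succeed
```

Under this per-sample reading, "CCO" and "CCN" succeed; the duplicate "CCO", the training molecule "c1ccccc1", the failed decode, and the unclosed ring "C1CC" do not.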

Published

2026-03-14

How to Cite

Hong, H., Lin, W., Yang, M., & Tan, K. C. (2026). Distributional Priors Guided Diffusion for Generating 3D Molecules in Low Data Regimes. Proceedings of the AAAI Conference on Artificial Intelligence, 40(26), 21743-21751. https://doi.org/10.1609/aaai.v40i26.39325

Issue

Vol. 40 No. 26

Section

AAAI Technical Track on Machine Learning III