MORE: Molecule Pretraining with Multi-Level Pretext Task

Authors

  • Yeongyeong Son, Department of Information Convergence Engineering, Pusan National University, Korea
  • Dasom Noh, Department of Information Convergence Engineering, Pusan National University, Korea
  • Gyoungyoung Heo, Department of Information Convergence Engineering, Pusan National University, Korea
  • Gyoung Jin Park, Department of Information Convergence Engineering, Pusan National University, Korea
  • Sunyoung Kwon, Department of Information Convergence Engineering, Pusan National University, Korea; School of Biomedical Convergence Engineering, Pusan National University, Korea; Center for Artificial Intelligence Research, Pusan National University, Korea

DOI:

https://doi.org/10.1609/aaai.v39i19.34262

Abstract

Foundation models, serving as pretrained bases for a variety of downstream tasks, aim to learn versatile, rich, and generalizable representations that can be quickly adapted through fine-tuning, or even applied in a zero-shot manner, to specific applications. Foundation models for molecular representation are no exception. Various pretext tasks have been proposed for pretraining molecular representations, but these approaches focus on only single or partial properties. Molecules are complicated and require different perspectives depending on the purpose: local- or global-level insights, 2D topology or 3D spatial arrangement, and low- or high-level semantics. We propose Multi-level mOlecule gRaph prE-train (MORE) to consider these multiple aspects of molecules simultaneously. Experimental results demonstrate that our proposed method effectively learns comprehensive representations, showing outstanding performance in both linear probing and full fine-tuning. Notably, in experiments quantifying forgetting in the pretrained models, MORE consistently exhibits minimal and stable parameter changes with the smallest performance gap, whereas other methods show substantial and inconsistent fluctuations with larger gaps. The effectiveness of individual pretext tasks varies depending on the problem being solved, which again highlights the need for a multi-level perspective. Scalability experiments reveal steady improvements of MORE as the dataset size increases, suggesting potential gains with larger datasets as well.
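
The abstract describes pretraining with pretext tasks at multiple levels, for example a local (node-level) objective combined with a global (graph-level) objective. The snippet below is a minimal, hypothetical sketch of that general idea in plain PyTorch; it is not the authors' MORE implementation. The ToyEncoder, the masking ratio, and the equal-weight sum of losses are illustrative assumptions only.

```python
# Illustrative sketch only: NOT the authors' MORE implementation.
# Combines a local, node-level pretext loss (masked feature reconstruction)
# with a global, graph-level pretext loss (property prediction) in one step.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Stand-in for a molecular graph encoder (e.g., a GNN)."""
    def __init__(self, in_dim=16, hid_dim=64):
        super().__init__()
        self.node_mlp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                                      nn.Linear(hid_dim, hid_dim))

    def forward(self, node_feats):
        node_emb = self.node_mlp(node_feats)             # local, per-node embeddings
        graph_emb = node_emb.mean(dim=0, keepdim=True)   # global readout (mean pooling)
        return node_emb, graph_emb

encoder = ToyEncoder()
node_head = nn.Linear(64, 16)   # local head: reconstruct masked node features
graph_head = nn.Linear(64, 1)   # global head: predict a graph-level property
opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(node_head.parameters()) +
                       list(graph_head.parameters()), lr=1e-3)

# Toy "molecule": 10 atoms with 16-dim features and a scalar graph label.
node_feats = torch.randn(10, 16)
graph_label = torch.randn(1, 1)

# Mask 2 of the 10 atoms and zero out their input features.
mask = torch.zeros(10, dtype=torch.bool)
mask[torch.randperm(10)[:2]] = True
masked_input = node_feats.clone()
masked_input[mask] = 0.0

node_emb, graph_emb = encoder(masked_input)
loss_local = nn.functional.mse_loss(node_head(node_emb[mask]), node_feats[mask])
loss_global = nn.functional.mse_loss(graph_head(graph_emb), graph_label)

# Multi-level objective: equal weights here; a weighted sum is common in practice.
loss = loss_local + loss_global
opt.zero_grad()
loss.backward()
opt.step()
```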

Published

2025-04-11

How to Cite

Son, Y., Noh, D., Heo, G., Park, G. J., & Kwon, S. (2025). MORE: Molecule Pretraining with Multi-Level Pretext Task. Proceedings of the AAAI Conference on Artificial Intelligence, 39(19), 20531–20539. https://doi.org/10.1609/aaai.v39i19.34262

Issue

Vol. 39 No. 19 (2025)

Section

AAAI Technical Track on Machine Learning V