DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery – a Focus on Affinity Prediction Problems with Noise Annotations

Authors

  • Yuanfeng Ji (HKU, Tencent AI Lab)
  • Lu Zhang (Fudan University, Tencent AI Lab)
  • Jiaxiang Wu (Tencent AI Lab)
  • Bingzhe Wu (Tencent AI Lab)
  • Lanqing Li (Tencent)
  • Long-Kai Huang (Tencent AI Lab)
  • Tingyang Xu (Tencent AI Lab)
  • Yu Rong (Tencent AI Lab)
  • Jie Ren (Tencent AI Lab)
  • Ding Xue (Tencent AI Lab)
  • Houtim Lai (Tencent AI Lab)
  • Wei Liu (Tencent AI Lab)
  • Junzhou Huang (University of Texas at Arlington)
  • Shuigeng Zhou (Fudan University)
  • Ping Luo (The University of Hong Kong)
  • Peilin Zhao (Tencent AI Lab)
  • Yatao Bian (Tencent AI Lab)

DOI:

https://doi.org/10.1609/aaai.v37i7.25970

Keywords:

ML: Transfer, Domain Adaptation, Multi-Task Learning; APP: Bioinformatics; APP: Healthcare, Medicine & Wellness; ML: Applications

Abstract

AI-aided drug discovery (AIDD) is gaining popularity due to its potential to make the search for new pharmaceuticals faster, less expensive, and more effective. Despite its extensive use in numerous fields (e.g., ADMET prediction, virtual screening), little research has been conducted on the out-of-distribution (OOD) learning problem with noise. We present DrugOOD, a systematic OOD dataset curator and benchmark for AIDD. In particular, we focus on the drug-target binding affinity prediction problem, which involves both a macromolecule (protein target) and a small molecule (drug compound). Rather than providing only fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise-level annotations, and rigorous benchmarking of state-of-the-art (SOTA) OOD algorithms. Since molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies reveal a significant performance gap between in-distribution and out-of-distribution experiments, emphasizing the need for more effective schemes that permit OOD generalization under noise for AIDD.

Published

2023-06-26

How to Cite

Ji, Y., Zhang, L., Wu, J., Wu, B., Li, L., Huang, L.-K., Xu, T., Rong, Y., Ren, J., Xue, D., Lai, H., Liu, W., Huang, J., Zhou, S., Luo, P., Zhao, P., & Bian, Y. (2023). DrugOOD: Out-of-Distribution Dataset Curator and Benchmark for AI-Aided Drug Discovery – a Focus on Affinity Prediction Problems with Noise Annotations. Proceedings of the AAAI Conference on Artificial Intelligence, 37(7), 8023-8031. https://doi.org/10.1609/aaai.v37i7.25970

Section

AAAI Technical Track on Machine Learning II