MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts

Authors

  • Haoqiang Guo, Harbin Institute of Technology
  • Sendong Zhao, Harbin Institute of Technology
  • Haochun Wang, Harbin Institute of Technology
  • Yanrui Du, Harbin Institute of Technology
  • Bing Qin, Harbin Institute of Technology

DOI:

https://doi.org/10.1609/aaai.v38i16.29772

Keywords:

NLP: Language Grounding & Multi-modal NLP, APP: Natural Sciences

Abstract

Deep learning is now widely used in drug discovery, providing significant acceleration and cost reduction. As the most fundamental building block, molecular representation is essential for predicting molecular properties to enable various downstream applications. Most existing methods attempt to incorporate more information to learn better representations. However, not all features are equally important for a specific task. Ignoring this would potentially compromise the training efficiency and predictive accuracy. To address this issue, we propose a novel approach, which treats language models as an agent and molecular pretraining models as a knowledge base. The agent accentuates task-relevant features in the molecular representation by understanding the natural language description of the task, just as a tailor customizes clothes for clients. Thus, we call this approach MolTailor. Evaluations demonstrate MolTailor's superior performance over baselines, validating the efficacy of enhancing relevance for molecular representation learning. This illustrates the potential of language model guided optimization to better exploit and unleash the capabilities of existing powerful molecular representation methods. Our code and appendix are available at https://github.com/SCIR-HI/MolTailor.

Published

2024-03-24

How to Cite

Guo, H., Zhao, S., Wang, H., Du, Y., & Qin, B. (2024). MolTailor: Tailoring Chemical Molecular Representation to Specific Tasks via Text Prompts. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18144-18152. https://doi.org/10.1609/aaai.v38i16.29772

Issue

Vol. 38 No. 16 (2024)

Section

AAAI Technical Track on Natural Language Processing I