Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers

Authors

  • Hadi Abdine, Laboratoire d’Informatique (LIX), École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  • Michail Chatzianastasis, Laboratoire d’Informatique (LIX), École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
  • Costas Bouyioukos, Epigenetics and Cell Fate, CNRS UMR7216, Université Paris Cité, F-75013 Paris, France; Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
  • Michalis Vazirgiannis, Laboratoire d’Informatique (LIX), École Polytechnique, Institut Polytechnique de Paris, Palaiseau, France

DOI:

https://doi.org/10.1609/aaai.v38i10.28948

Keywords:

ML: Multimodal Learning, ML: Deep Neural Architectures and Foundation Models, ML: Graph-based Machine Learning, NLP: Generation

Abstract

In recent years, significant progress has been made in protein function prediction with the development of various machine-learning approaches. However, most existing methods formulate the task as a multi-class classification problem, i.e., assigning predefined labels to proteins. In this work, we propose a novel approach, Prot2Text, which predicts a protein's function as free text, moving beyond conventional binary or categorical classification. By combining Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework, our model integrates diverse data types, including protein sequence, structure, and textual annotation and description. This multimodal approach allows for a holistic representation of proteins' functions, enabling the generation of detailed and accurate functional descriptions. To evaluate our model, we extract a multimodal protein dataset from SwissProt and empirically demonstrate the effectiveness of Prot2Text. These results highlight the potential of multimodal models, specifically the fusion of GNNs and LLMs, to give researchers tools for more accurate function prediction of both existing and previously unseen proteins.

Published

2024-03-24

How to Cite

Abdine, H., Chatzianastasis, M., Bouyioukos, C., & Vazirgiannis, M. (2024). Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 38(10), 10757-10765. https://doi.org/10.1609/aaai.v38i10.28948

Issue

Vol. 38 No. 10 (2024)

Section

AAAI Technical Track on Machine Learning I