Sequence-Free for Compound Protein Interaction Prediction

Authors

  • Hongzhi Zhang, School of Computer Science, Wuhan University, Wuhan, China
  • Jiameng Chen, School of Computer Science, Wuhan University, Wuhan, China
  • Kun Li, School of Computer Science, Wuhan University, Wuhan, China
  • Yida Xiong, School of Computer Science, Wuhan University, Wuhan, China
  • Xiantao Cai, School of Computer Science, Wuhan University, Wuhan, China
  • Wenbin Hu, Shenzhen Research Institute, Wuhan University, Shenzhen, China; School of Computer Science, Wuhan University, Wuhan, China
  • Jia Wu, Department of Computing, Macquarie University, Sydney, Australia

DOI:

https://doi.org/10.1609/aaai.v40i19.38666

Abstract

The prediction of compound–protein interactions (CPIs) is crucial for drug discovery. Most existing CPI prediction models rely on protein sequence information as input. However, in early-stage drug development, particularly in phenotype-driven studies or compound-response analyses, proteins are often annotated only with functional labels, and their sequences remain undetermined. Consequently, current methods cannot be applied in such scenarios. Furthermore, our experiments show that even when large-scale perturbations are applied to protein sequences, the predictive performance of existing models does not decline significantly. This indicates that the high investment in sequencing may not yield corresponding returns. To address these issues, we propose BioText-CPI, an inexpensive, protein-sequencing-free framework for CPI prediction based on the Biomedical Textual descriptions of proteins. First, during the pre-training stage, we use contrastive learning to align the protein text and protein sequence modalities. Next, we add biological text descriptions of proteins to the existing public CPI dataset to construct a new CPI dataset. Finally, in the CPI prediction stage, the sequence and the biomedical text description of a protein can be used as input either separately or together, meeting the requirements of different application scenarios. Experiments demonstrate that BioText-CPI achieves performance comparable to traditional methods when only the biomedical description of the protein is provided as input. Moreover, when both modalities of protein information are provided simultaneously, BioText-CPI achieves state-of-the-art performance across multiple scenarios.
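The abstract does not state which contrastive objective aligns the two protein modalities, nor how text and sequence inputs are combined at prediction time. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over paired text and sequence embeddings (temperature 0.07, an assumed value) and a simple averaging fusion that accepts text only, sequence only, or both. The names `symmetric_info_nce` and `protein_representation` are hypothetical and are not taken from BioText-CPI; embeddings are plain lists of floats so the example runs without a deep-learning library.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits: Sequence[float], idx: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at position `idx`."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_sum_exp


def symmetric_info_nce(text_emb: List[Vector], seq_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Assumed CLIP-style loss: row i of text_emb and row i of seq_emb describe
    the same protein; every other row in the batch acts as a negative."""
    n = len(text_emb)
    t = [l2_normalize(v) for v in text_emb]
    s = [l2_normalize(v) for v in seq_emb]
    # Cosine-similarity logits: sim[i][j] compares text i with sequence j.
    sim = [[dot(t[i], s[j]) / temperature for j in range(n)] for i in range(n)]
    # Text -> sequence direction: each text should pick out its own sequence.
    text_to_seq = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Sequence -> text direction: each sequence should pick out its own text.
    seq_to_text = -sum(
        log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (text_to_seq + seq_to_text)


def protein_representation(text_emb: Optional[Vector] = None,
                           seq_emb: Optional[Vector] = None) -> Vector:
    """Hypothetical fusion: average whichever aligned embeddings are present,
    so text-only, sequence-only, and combined inputs share one interface."""
    available = [l2_normalize(v) for v in (text_emb, seq_emb) if v is not None]
    if not available:
        raise ValueError("at least one protein modality is required")
    dim = len(available[0])
    return [sum(v[k] for v in available) / len(available) for k in range(dim)]


if __name__ == "__main__":
    texts = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(symmetric_info_nce(texts, matched))  # small: pairs are aligned
    print(symmetric_info_nce(texts, swapped))  # large: pairs are mismatched
    print(protein_representation(text_emb=[3.0, 4.0]))  # text-only input
```

Because the loss pulls each protein's text embedding toward its own sequence embedding, a downstream predictor trained on this shared space can plausibly fall back to the text embedding when no sequence exists. Averaging is just one simple way to allow either or both modalities; concatenation with a missing-modality mask or cross-attention would be equally reasonable, and this sketch does not reflect the fusion BioText-CPI actually uses.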


Published

2026-03-14

How to Cite

Zhang, H., Chen, J., Li, K., Xiong, Y., Cai, X., Hu, W., & Wu, J. (2026). Sequence-Free for Compound Protein Interaction Prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 40(19), 16289–16297. https://doi.org/10.1609/aaai.v40i19.38666

Issue

Vol. 40 No. 19 (2026)

Section

AAAI Technical Track on Data Mining & Knowledge Management III