Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation

Authors

  • Qin Wang, The Chinese University of Hong Kong (Shenzhen); The Future Network of Intelligence Institute (FNii); Shenzhen Research Institute of Big Data
  • Jiayang Chen, The Chinese University of Hong Kong
  • Yuzhe Zhou, The Chinese University of Hong Kong (Shenzhen); The Future Network of Intelligence Institute (FNii); Shenzhen Research Institute of Big Data
  • Yu Li, The Chinese University of Hong Kong
  • Liangzhen Zheng, Shanghai Zelixir Biotech
  • Sheng Wang, Shanghai Zelixir Biotech
  • Zhen Li, The Chinese University of Hong Kong (Shenzhen); The Future Network of Intelligence Institute (FNii); Shenzhen Research Institute of Big Data
  • Shuguang Cui, The Chinese University of Hong Kong (Shenzhen); The Future Network of Intelligence Institute (FNii); Shenzhen Research Institute of Big Data

DOI:

https://doi.org/10.1609/aaai.v36i4.20386

Keywords:

Domain(s) Of Application (APP)

Abstract

Accurate protein contact map prediction (PCMP) is essential for precise protein structure estimation and further biological studies. Recent works achieve strong performance on this task when a high-quality multiple sequence alignment (MSA) is available. However, PCMP accuracy drops dramatically when only a poor MSA (e.g., an absolute MSA count of less than 10) is available. Therefore, in this paper, we propose Contact-Distil to improve low-homologous PCMP accuracy through knowledge distillation on a self-supervised model. In particular, two pre-trained transformers are exploited to learn high-quality and low-quality MSA representations in parallel for the teacher and student models, respectively. In addition, co-evolution information is further extracted from the pure sequence through a pre-trained ESM-1b model, which provides auxiliary knowledge to improve the student's performance. Extensive experiments show that Contact-Distil outperforms previous state-of-the-art methods by large margins on the CAMEO-L dataset for low-homologous PCMP, with around 13.3% and 9.5% improvements over AlphaFold2 and MSA Transformer, respectively, when the MSA count is less than 10.
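
The following PyTorch sketch illustrates the teacher-student distillation idea summarized in the abstract: a teacher contact head reads a representation derived from a high-quality MSA, the student reads the low-quality MSA representation concatenated with single-sequence (ESM-1b style) features, and the student is trained with a supervised contact loss plus a term that matches the teacher's predicted contact probabilities. All module names, dimensions, the random stand-in tensors, and the loss weighting below are illustrative assumptions for exposition, not the authors' implementation.

# Minimal sketch of teacher-student distillation for contact map prediction.
# Representations are random stand-ins; in the paper they come from pre-trained
# MSA-based transformers (teacher: high-quality MSA, student: poor MSA) and ESM-1b.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContactHead(nn.Module):
    """Toy contact head: per-residue embeddings -> symmetric L x L contact logits."""
    def __init__(self, embed_dim: int, pair_dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(embed_dim, pair_dim)
        self.out = nn.Linear(2 * pair_dim, 1)

    def forward(self, residue_embed: torch.Tensor) -> torch.Tensor:
        # residue_embed: (B, L, embed_dim)
        h = self.proj(residue_embed)                                   # (B, L, pair_dim)
        pair = torch.cat(
            [h.unsqueeze(2).expand(-1, -1, h.size(1), -1),
             h.unsqueeze(1).expand(-1, h.size(1), -1, -1)], dim=-1)    # (B, L, L, 2*pair_dim)
        logits = self.out(pair).squeeze(-1)                            # (B, L, L)
        return 0.5 * (logits + logits.transpose(1, 2))                 # symmetrize

# Stand-in representations (batch B, length L, embedding dim D).
B, L, D = 2, 128, 256
teacher_repr = torch.randn(B, L, D)   # from a high-quality MSA (assumption)
student_repr = torch.randn(B, L, D)   # from a poor MSA (assumption)
seq_repr = torch.randn(B, L, D)       # from a single-sequence model such as ESM-1b (assumption)

teacher_head = ContactHead(D)
student_head = ContactHead(2 * D)     # the student also consumes the sequence-only features

with torch.no_grad():                 # the teacher only supplies soft targets
    teacher_logits = teacher_head(teacher_repr)

student_logits = student_head(torch.cat([student_repr, seq_repr], dim=-1))

# Supervised contact loss plus a distillation loss that matches the teacher's
# contact probabilities; the equal weighting is an assumption.
true_contacts = (torch.rand(B, L, L) > 0.95).float()
contact_loss = F.binary_cross_entropy_with_logits(student_logits, true_contacts)
distill_loss = F.mse_loss(torch.sigmoid(student_logits), torch.sigmoid(teacher_logits))
loss = contact_loss + 1.0 * distill_loss
loss.backward()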

Published

2022-06-28

How to Cite

Wang, Q., Chen, J., Zhou, Y., Li, Y., Zheng, L., Wang, S., Li, Z., & Cui, S. (2022). Contact-Distil: Boosting Low Homologous Protein Contact Map Prediction by Self-Supervised Distillation. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 4620-4627. https://doi.org/10.1609/aaai.v36i4.20386

Issue

Vol. 36 No. 4 (2022)

Section

AAAI Technical Track on Domain(s) Of Application