PSSM-Distil: Protein Secondary Structure Prediction (PSSP) on Low-Quality PSSM by Knowledge Distillation with Contrastive Learning

Authors

  • Qin Wang, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen)
  • Boyuan Wang, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen); Tencent AI Lab
  • Zhenlei Xu, Tencent AI Lab
  • Jiaxiang Wu, Tencent AI Lab
  • Peilin Zhao, Tencent AI Lab
  • Zhen Li, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen)
  • Sheng Wang, Tencent AI Lab
  • Junzhou Huang, Tencent AI Lab
  • Shuguang Cui, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen)

DOI:

https://doi.org/10.1609/aaai.v35i1.16141

Keywords:

Bioinformatics

Abstract

Protein secondary structure prediction (PSSP) is an essential task in computational biology. To achieve accurate PSSP, a general and vital feature-engineering step is to extract a Position-Specific Scoring Matrix (PSSM) from a multiple sequence alignment (MSA). However, when only a low-quality PSSM can be obtained due to poor sequence homology, the accuracy of previous PSSP methods (merely around 65%) is far from sufficient for practical use in subsequent tasks. In this paper, we propose a novel PSSM-Distil framework for PSSP on low-quality PSSMs, which not only enhances the PSSM features at a lower level but also aligns the feature distributions at a higher level. In practice, PSSM-Distil first exploits proteins with high-quality PSSMs to train a teacher network for PSSP in a fully supervised way. Under the guidance of the teacher network, the low-quality PSSMs and the correspondingly weak discriminating capacity of the student network are effectively remedied by feature enhancement through EnhanceNet and distribution alignment through knowledge distillation with contrastive learning. Further, PSSM-Distil supports input from a pre-trained protein sequence language model (BERT) to provide auxiliary information, which is designed to address cases of extremely low-quality PSSMs, i.e., proteins with no homologous sequences. Extensive experiments demonstrate that the proposed PSSM-Distil outperforms state-of-the-art PSSP models by 6% on average, and by nearly 8% in extremely low-quality cases, on the public benchmarks BC40 and CB513.
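To make the training objective described above concrete, here is a minimal, hypothetical PyTorch sketch of the three ingredients the abstract names: supervised prediction, knowledge distillation from a teacher trained on high-quality PSSMs, and contrastive alignment of student and teacher features. All names, temperatures, and the equal loss weighting are assumptions for illustration; the paper's actual architecture, EnhanceNet, and hyper-parameters are not reproduced here.

```python
# Hypothetical sketch of a PSSM-Distil-style loss: supervised CE + logit
# distillation + InfoNCE-style feature alignment. Function and argument
# names are illustrative, not the authors' code.
import torch
import torch.nn.functional as F

def pssm_distil_loss(student_logits, teacher_logits,
                     student_feat, teacher_feat,
                     labels, tau=4.0, c_temp=0.07):
    # Supervised cross-entropy on the secondary-structure labels
    # (e.g., 8-state per-residue classification).
    ce = F.cross_entropy(student_logits, labels)

    # Knowledge distillation: match the student's softened class
    # distribution to the teacher's (teacher sees high-quality PSSMs).
    kd = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau ** 2

    # Contrastive alignment: each student residue feature should be
    # most similar to the teacher feature of the same residue.
    s = F.normalize(student_feat, dim=-1)   # (N, D)
    t = F.normalize(teacher_feat, dim=-1)   # (N, D)
    sim = s @ t.t() / c_temp                # (N, N) similarity matrix
    targets = torch.arange(s.size(0), device=s.device)
    nce = F.cross_entropy(sim, targets)

    # Equal weighting is a simplification for this sketch.
    return ce + kd + nce

# Toy check with random tensors: 32 residues, 8 states, 128-dim features.
N, C, D = 32, 8, 128
loss = pssm_distil_loss(torch.randn(N, C), torch.randn(N, C),
                        torch.randn(N, D), torch.randn(N, D),
                        torch.randint(0, C, (N,)))
```

In the framework itself, the student would consume EnhanceNet-refined PSSMs (optionally with pre-trained BERT embeddings as auxiliary input), and the individual loss terms would presumably carry tuned weights; the sketch fixes equal weights only for brevity.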

Published

2021-05-18

How to Cite

Wang, Q., Wang, B., Xu, Z., Wu, J., Zhao, P., Li, Z., Wang, S., Huang, J., & Cui, S. (2021). PSSM-Distil: Protein Secondary Structure Prediction (PSSP) on Low-Quality PSSM by Knowledge Distillation with Contrastive Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 35(1), 617-625. https://doi.org/10.1609/aaai.v35i1.16141

Section

AAAI Technical Track on Application Domains