Multi-Scale Distillation from Multiple Graph Neural Networks

Authors

  • Chunhai Zhang, College of Artificial Intelligence, Nankai University, Tianjin, China
  • Jie Liu, College of Artificial Intelligence, Nankai University, Tianjin, China; Cloopen AI Research, Beijing, China
  • Kai Dang, College of Artificial Intelligence, Nankai University, Tianjin, China
  • Wenzheng Zhang, College of Artificial Intelligence, Nankai University, Tianjin, China

DOI:

https://doi.org/10.1609/aaai.v36i4.20354

Keywords:

Data Mining & Knowledge Management (DMKM), Knowledge Representation And Reasoning (KRR), Machine Learning (ML)

Abstract

Knowledge Distillation (KD), an effective model compression and acceleration technique, has recently been applied to graph neural networks (GNNs) with success. Existing approaches utilize a single GNN model as the teacher to distill knowledge. However, we observe that GNN models with different numbers of layers demonstrate different classification abilities on nodes with different degrees. On the one hand, for nodes with high degrees, whose local structures are dense and complex, more message passing is needed, so GNN models with more layers perform better. On the other hand, for nodes with low degrees, whose local structures are relatively sparse and simple, repeated message passing can easily lead to over-smoothing, so GNN models with fewer layers are more suitable. Consequently, existing knowledge distillation approaches that rely on a single teacher GNN are sub-optimal. To this end, we propose a novel approach that distills multi-scale knowledge: the student learns from multiple teacher GNNs with different numbers of layers, which capture the topological semantics at different scales. Instead of learning from the teacher models equally, the proposed method automatically assigns appropriate weights to each teacher model via an attention mechanism, enabling the student to select suitable teachers for different local structures. Extensive experiments on four public datasets demonstrate the superiority of the proposed method over state-of-the-art methods. Our code is publicly available at https://github.com/NKU-IIPLab/MSKD.
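The core idea described in the abstract, distilling from several teacher GNNs of different depths with per-node attention over the teachers, can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration and not the authors' implementation (see the linked repository for the official code): the attention form, the temperature, and all module and variable names below are hypothetical.

```python
# Minimal sketch of attention-weighted multi-teacher distillation for node
# classification. Assumes per-node logits from K teacher GNNs of different
# depths (e.g., 1-, 2-, and 3-layer models) and a student hidden representation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTeacherDistillLoss(nn.Module):
    def __init__(self, student_dim: int, num_teachers: int, temperature: float = 2.0):
        super().__init__()
        # Per-node attention scores over teachers, computed from the student's
        # hidden representation, so each node can favor shallower or deeper teachers.
        self.attn = nn.Linear(student_dim, num_teachers)
        self.temperature = temperature

    def forward(self, student_hidden, student_logits, teacher_logits_list):
        # student_hidden:      [num_nodes, student_dim]
        # student_logits:      [num_nodes, num_classes]
        # teacher_logits_list: list of [num_nodes, num_classes] tensors, one per teacher
        T = self.temperature
        alpha = torch.softmax(self.attn(student_hidden), dim=-1)          # [N, K]
        teacher_probs = torch.stack(
            [F.softmax(t / T, dim=-1) for t in teacher_logits_list], dim=1
        )                                                                  # [N, K, C]
        # Attention-weighted mixture of teacher soft labels per node.
        mixed = (alpha.unsqueeze(-1) * teacher_probs).sum(dim=1)           # [N, C]
        log_student = F.log_softmax(student_logits / T, dim=-1)
        # KL divergence between the student's softened prediction and the mixture.
        return F.kl_div(log_student, mixed, reduction="batchmean") * (T * T)


if __name__ == "__main__":
    num_nodes, hidden, classes, num_teachers = 8, 16, 5, 3
    loss_fn = MultiTeacherDistillLoss(hidden, num_teachers)
    h = torch.randn(num_nodes, hidden)
    s_logits = torch.randn(num_nodes, classes)
    t_logits = [torch.randn(num_nodes, classes) for _ in range(num_teachers)]
    print(loss_fn(h, s_logits, t_logits).item())
```

In training, this distillation term would typically be combined with the usual cross-entropy loss on labeled nodes; the attention weights let high-degree nodes lean on deeper teachers and low-degree nodes on shallower ones, which is the multi-scale behavior the abstract motivates.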

Published

2022-06-28

How to Cite

Zhang, C., Liu, J., Dang, K., & Zhang, W. (2022). Multi-Scale Distillation from Multiple Graph Neural Networks. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 4337-4344. https://doi.org/10.1609/aaai.v36i4.20354

Section

AAAI Technical Track on Data Mining and Knowledge Management