Pareto-Based Heterogeneous Knowledge Distillation for MLPs on Graphs
DOI:
https://doi.org/10.1609/aaai.v40i34.40108

Abstract
Heterogeneous Graph Neural Networks (HGNNs) have demonstrated remarkable capabilities in capturing effective information in heterogeneous graphs, achieving outstanding performance on various learning tasks. However, the heavy dependence of HGNNs on neighbor information can result in high inference latency, which restricts their practicality in real-world applications. Recent studies have attempted to overcome this latency in Graph Neural Networks (GNNs) by distilling knowledge into student models that do not rely on graph structure. However, these approaches primarily focus on replicating teachers' predictive outcomes while neglecting the structural knowledge the teachers encode. This limitation makes such approaches less effective as graphs become more complex, particularly on heterogeneous graphs. Motivated by this challenge, we propose HGKD, a novel hierarchical knowledge distillation framework that transfers both structural knowledge and predictive outcomes from HGNN teachers to a multi-layer perceptron (MLP) student. Additionally, we provide two variants of HGKD that help the student learn from multiple teacher models through Pareto learning and incorporate low-cost neighbor information. We evaluate HGKD and its variants on a range of heterogeneous graph datasets. The results demonstrate that our student model achieves performance comparable to or exceeding that of HGNN teachers, despite not relying on graph structures during inference.
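For readers unfamiliar with GNN-to-MLP distillation, the sketch below illustrates only the generic prediction-matching component that such frameworks typically build on: an MLP student trained to mimic a teacher's soft labels so that inference needs no graph structure. It is not the authors' implementation; the structural-knowledge term and the Pareto-based multi-teacher weighting described in the abstract are not shown, and the temperature and loss weight are illustrative assumptions.

```python
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits, teacher_logits, labels,
                                 T=2.0, alpha=0.5):
    """Generic soft-label distillation loss (sketch, not the HGKD objective).

    student_logits: MLP student outputs on node features only.
    teacher_logits: precomputed HGNN teacher outputs (graph used offline).
    T, alpha: assumed temperature and mixing weight for illustration.
    """
    # Soft-label term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: standard cross-entropy on ground-truth node labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

At inference time, only the MLP student and raw node features are needed, which is the source of the latency advantage the abstract highlights.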
Published
2026-03-14
How to Cite
Zhao, W., Tian, Y., Xu, Z., Wang, Y., & Zhang, C. (2026). Pareto-Based Heterogeneous Knowledge Distillation for MLPs on Graphs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28751-28759. https://doi.org/10.1609/aaai.v40i34.40108
Issue
Section
AAAI Technical Track on Machine Learning XI