Pareto-Based Heterogeneous Knowledge Distillation for MLPs on Graphs
DOI:
https://doi.org/10.1609/aaai.v40i34.40108

Abstract
Heterogeneous Graph Neural Networks (HGNNs) have demonstrated remarkable capabilities in capturing effective information in heterogeneous graphs, achieving outstanding performance on various learning tasks. However, the heavy dependence of HGNNs on neighbor information can result in high inference latency, which restricts their practicality in real-world applications. Recent studies have attempted to overcome this latency in Graph Neural Networks (GNNs) by distilling knowledge into student models that do not rely on graph structure. However, these approaches primarily focus on replicating teachers' predictive outcomes while neglecting the structural knowledge the teachers encode. This limitation makes such approaches less effective as graphs become more complex, particularly on heterogeneous graphs. Motivated by this challenge, we propose HGKD, a novel hierarchical knowledge distillation framework that transfers both structural knowledge and predictive outcomes from HGNN teachers to a multi-layer perceptron (MLP) student. Additionally, we provide two variants of HGKD that help the student learn from multiple teacher models through Pareto learning and incorporate low-cost neighbor information. We evaluate HGKD and its variants on a range of heterogeneous graph datasets. The results demonstrate that our student model achieves performance comparable to or exceeding that of HGNN teachers, despite not relying on graph structures during inference.
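For readers unfamiliar with GNN-to-MLP distillation, the sketch below illustrates only the generic prediction-matching component that such frameworks typically build on: an MLP student trained to mimic a teacher's soft labels so that inference needs no graph structure. It is not the authors' implementation; the structural-knowledge term and the Pareto-based multi-teacher weighting described in the abstract are not shown, and the temperature and loss weight are illustrative assumptions.

```python
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits, teacher_logits, labels,
                                 T=2.0, alpha=0.5):
    """Generic soft-label distillation loss (sketch, not the HGKD objective).

    student_logits: MLP student outputs on node features only.
    teacher_logits: precomputed HGNN teacher outputs (graph used offline).
    T, alpha: assumed temperature and mixing weight for illustration.
    """
    # Soft-label term: KL divergence between temperature-scaled distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: standard cross-entropy on ground-truth node labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

At inference time, only the MLP student and raw node features are needed, which is the source of the latency advantage the abstract highlights.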
Published
2026-03-14
How to Cite
Zhao, W., Tian, Y., Xu, Z., Wang, Y., & Zhang, C. (2026). Pareto-Based Heterogeneous Knowledge Distillation for MLPs on Graphs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(34), 28751-28759. https://doi.org/10.1609/aaai.v40i34.40108
Issue
Section
AAAI Technical Track on Machine Learning XI