Direction Sensitivity–Based Knowledge Distillation: Optimization-Aware Low-Rank Knowledge Transfer

Authors

  • Yongkai Liao, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
  • Xinxing Chen, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; Shenzhen Huazhong University of Science and Technology Research Institute
  • Zhongzheng Fu, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
  • Haoyuan Wang, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
  • Jian Huang, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i28.39520

Abstract

Knowledge distillation (KD) aims to enhance the performance of lightweight student networks through the guidance of teacher models. However, existing methods are deficient in two key respects: first, they rely heavily on static representation alignment and fail to account for how sensitive the loss is to different directions within the distillation subspace; second, they lack a fine-grained mechanism for aligning critical directional features. To address these issues, we propose the Direction Sensitivity–based Knowledge Distillation (DSKD) method, which quantitatively measures the sensitivity of each direction to the loss function at different training stages and dynamically selects the optimization directions accordingly. We further design a direction-sensitivity-weighted distillation loss: by aligning the parameter matrices of the teacher and student models along the key directions, knowledge is transferred more effectively and the distillation effect improves. We combine DSKD with multiple advanced distillation strategies and conduct an empirical evaluation on the GLUE benchmark and CIFAR-100. The results show that our method significantly improves the performance of existing distillation techniques.
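The abstract's core idea can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation; it assumes (hypothetically) that the distillation subspace is spanned by the teacher weight matrix's leading singular directions, that each direction's sensitivity is estimated by projecting the task-loss gradient onto that rank-1 direction, and that the distillation loss weights per-direction alignment errors by a softmax over those sensitivities.

```python
import numpy as np

def direction_sensitivity_loss(W_t, W_s, G, rank=4):
    """Illustrative sketch of a direction-sensitivity-weighted
    distillation loss (hypothetical; not the paper's exact method).

    W_t : teacher weight matrix
    W_s : student weight matrix, same shape as W_t
    G   : gradient of the task loss w.r.t. the student weights
    """
    # Leading singular directions of the teacher span the
    # (assumed) low-rank distillation subspace.
    U, S, Vt = np.linalg.svd(W_t, full_matrices=False)
    U, S, V = U[:, :rank], S[:rank], Vt[:rank, :].T

    # Sensitivity of direction i: |u_i^T G v_i|, i.e. how strongly
    # the loss gradient aligns with that rank-1 direction.
    sens = np.abs(np.einsum("dr,de,er->r", U, G, V))

    # Softmax turns sensitivities into per-direction weights,
    # emphasizing the currently most loss-sensitive directions.
    w = np.exp(sens - sens.max())
    w /= w.sum()

    # Weighted alignment of the student's projections with the
    # teacher's singular values along the key directions.
    proj = np.einsum("dr,de,er->r", U, W_s, V)
    return float(np.sum(w * (proj - S) ** 2))
```

With this weighting, directions to which the loss is currently most sensitive dominate the alignment term, which matches the abstract's notion of dynamically selecting optimization directions across training stages; when the student's projections match the teacher's singular values exactly, the loss vanishes.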

Published

2026-03-14

How to Cite

Liao, Y., Chen, X., Fu, Z., Wang, H., & Huang, J. (2026). Direction Sensitivity–Based Knowledge Distillation: Optimization-Aware Low-Rank Knowledge Transfer. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23487–23495. https://doi.org/10.1609/aaai.v40i28.39520

Section

AAAI Technical Track on Machine Learning V