Direction Sensitivity–Based Knowledge Distillation: Optimization-Aware Low-Rank Knowledge Transfer

Authors

  • Yongkai Liao, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
  • Xinxing Chen, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; Shenzhen Huazhong University of Science and Technology Research Institute
  • Zhongzheng Fu, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
  • Haoyuan Wang, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
  • Jian Huang, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i28.39520

Abstract

Knowledge distillation (KD) aims to enhance the performance of lightweight student networks through the guidance of teacher models. However, existing methods are deficient in two key respects: first, they rely heavily on static representation alignment and fail to account for how sensitive the loss is to different directions within the distillation subspace; second, they lack a fine-grained mechanism for aligning critical directional features. To address these issues, we propose the Direction Sensitivity–based Knowledge Distillation (DSKD) method, which quantitatively measures the sensitivity of each direction to the loss function at different training stages and dynamically selects the optimization directions accordingly. We further design a direction-sensitivity-weighted distillation loss: by aligning the parameter matrices of the teacher and student models along the key directions, knowledge is transferred more effectively and the distillation effect improves. We combine DSKD with multiple advanced distillation strategies and conduct an empirical evaluation on the GLUE benchmark and CIFAR-100. The results show that our method significantly improves the performance of existing distillation techniques.
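The abstract's core idea can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation; it assumes (hypothetically) that the distillation subspace is spanned by the teacher weight matrix's leading singular directions, that each direction's sensitivity is estimated by projecting the task-loss gradient onto that rank-1 direction, and that the distillation loss weights per-direction alignment errors by a softmax over those sensitivities.

```python
import numpy as np

def direction_sensitivity_loss(W_t, W_s, G, rank=4):
    """Illustrative sketch of a direction-sensitivity-weighted
    distillation loss (hypothetical; not the paper's exact method).

    W_t : teacher weight matrix
    W_s : student weight matrix, same shape as W_t
    G   : gradient of the task loss w.r.t. the student weights
    """
    # Leading singular directions of the teacher span the
    # (assumed) low-rank distillation subspace.
    U, S, Vt = np.linalg.svd(W_t, full_matrices=False)
    U, S, V = U[:, :rank], S[:rank], Vt[:rank, :].T

    # Sensitivity of direction i: |u_i^T G v_i|, i.e. how strongly
    # the loss gradient aligns with that rank-1 direction.
    sens = np.abs(np.einsum("dr,de,er->r", U, G, V))

    # Softmax turns sensitivities into per-direction weights,
    # emphasizing the currently most loss-sensitive directions.
    w = np.exp(sens - sens.max())
    w /= w.sum()

    # Weighted alignment of the student's projections with the
    # teacher's singular values along the key directions.
    proj = np.einsum("dr,de,er->r", U, W_s, V)
    return float(np.sum(w * (proj - S) ** 2))
```

With this weighting, directions to which the loss is currently most sensitive dominate the alignment term, which matches the abstract's notion of dynamically selecting optimization directions across training stages; when the student's projections match the teacher's singular values exactly, the loss vanishes.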

Published

2026-03-14

How to Cite

Liao, Y., Chen, X., Fu, Z., Wang, H., & Huang, J. (2026). Direction Sensitivity–Based Knowledge Distillation: Optimization-Aware Low-Rank Knowledge Transfer. Proceedings of the AAAI Conference on Artificial Intelligence, 40(28), 23487–23495. https://doi.org/10.1609/aaai.v40i28.39520

Section

AAAI Technical Track on Machine Learning V