Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures

Xinglu Wang; Yingming Li

doi:10.1609/aaai.v35i11.17225

Authors

Xinglu Wang Zhejiang University
Yingming Li Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v35i11.17225

Keywords:

Classification and Regression, Optimization, Representation Learning

Abstract

Multi-exit architectures, in which a sequence of intermediate classifiers are introduced at different depths of the feature layers, perform adaptive computation by early exiting ``easy" samples to speed up the inference. In this paper, a novel Harmonized Dense Knowledge Distillation (HDKD) training method for multi-exit architecture is designed to encourage each exit to flexibly learn from all its later exits. In particular, a general dense knowledge distillation training objective is proposed to incorporate all possible beneficial supervision information for multi-exit learning, where a harmonized weighting scheme is designed for the multi-objective optimization problem consisting of multi-exit classification loss and dense distillation loss. A bilevel optimization algorithm is introduced for alternatively updating the weights of multiple objectives and the multi-exit network parameters. Specifically, the loss weighting parameters are optimized with respect to its performance on validation set by gradient descent. Experiments on CIFAR100 and ImageNet show that the HDKD strategy harmoniously improves the performance of the state-of-the-art multi-exit neural networks. Moreover, this method does not require within architecture modifications and can be effectively combined with other previously-proposed training techniques and further boosts the performance.

Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information

Developed By

Subscription