Self-Decoupling and Ensemble Distillation for Efficient Segmentation
DOI:
https://doi.org/10.1609/aaai.v37i2.25266
Keywords:
CV: Segmentation, ML: Deep Neural Architectures, ML: Deep Neural Network Algorithms, ML: Ensemble Methods, ML: Learning on the Edge & Model Compression
Abstract
Knowledge distillation (KD) is a promising teacher-student learning paradigm that transfers information from a cumbersome teacher network to a compact student network. To avoid the training cost of a large teacher network, recent studies propose to distill knowledge from the student itself, known as Self-KD. However, due to the limited performance and capacity of the student, the soft-labels or features it distills barely provide reliable guidance. Moreover, most Self-KD algorithms are specific to classification tasks based on soft-labels and are not suitable for semantic segmentation. To alleviate these issues, we revisit the label and feature distillation problem in segmentation and propose Self-Decoupling and Ensemble Distillation for Efficient Segmentation (SDES). Specifically, we design a decoupled prediction ensemble distillation (DPED) algorithm that generates reliable soft-labels with multiple expert decoders, and a decoupled feature ensemble distillation (DFED) mechanism that utilizes the more important channel-wise feature maps for encoder learning. Extensive experiments on three public segmentation datasets demonstrate the superiority of our approach, and an ablation study confirms the efficacy of each component of the framework.
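To make the two distillation mechanisms described in the abstract concrete, the following is a minimal, illustrative sketch of an ensemble soft-label loss and a channel-weighted feature loss in PyTorch. It is not the authors' released implementation: the averaging of expert predictions, the temperature, and the use of the teacher channel's mean activation as an importance weight are all assumptions made for illustration only.

```python
# Hypothetical sketch of the two loss terms suggested by the abstract:
# (1) soft-label distillation from an ensemble of expert decoders (DPED-style),
# (2) channel-wise weighted feature distillation for the encoder (DFED-style).
import torch
import torch.nn.functional as F


def ensemble_soft_label_loss(student_logits, expert_logits_list, temperature=2.0):
    """KL divergence between the student prediction and the averaged
    softened predictions of multiple expert decoders (assumed ensembling)."""
    with torch.no_grad():
        # Average the experts' softened class distributions to form the target.
        soft_target = torch.stack(
            [F.softmax(logits / temperature, dim=1) for logits in expert_logits_list]
        ).mean(dim=0)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    return F.kl_div(log_student, soft_target, reduction="batchmean") * temperature ** 2


def channel_weighted_feature_loss(student_feat, teacher_feat):
    """Feature distillation where each channel's mismatch is weighted by the
    teacher channel's average activation, so more important channels
    contribute more to the encoder's learning signal (assumed weighting)."""
    # Per-channel importance from the teacher's global average activation: (B, C).
    weights = F.softmax(teacher_feat.mean(dim=(2, 3)), dim=1)
    # Per-channel mean squared error between student and (detached) teacher features.
    per_channel_mse = ((student_feat - teacher_feat.detach()) ** 2).mean(dim=(2, 3))
    return (weights * per_channel_mse).sum(dim=1).mean()
```

In this sketch the student's own decoders play the role of the "teacher" ensemble, matching the Self-KD setting in which no separate large teacher network is trained.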
Published
2023-06-26
How to Cite
Liu, Y., Zhang, W., & Wang, J. (2023). Self-Decoupling and Ensemble Distillation for Efficient Segmentation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(2), 1772-1780. https://doi.org/10.1609/aaai.v37i2.25266
Issue
Section
AAAI Technical Track on Computer Vision II