Flow-Based Knowledge Transfer for Efficient Large Model Distillation

Authors

  • Xinye Yang Newcastle University
  • Junhao Wang College of Computer Science and Technology, Jilin University
  • Rui Li Independent Researcher
  • Haosen Sun Northwestern University
  • Xuesheng Zhang Meituan
  • Zebang Liu Independent Researcher
  • Gaochao Xu College of Computer Science and Technology, Jilin University
  • Yiwei Chen Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i33.39987

Abstract

Traditional knowledge distillation relies on simple MSE or KL divergence losses that fail to capture the complex distributional relationships between teacher and student model representations. We propose FlowDistill, a novel distillation framework that employs normalizing flows to model and transfer the intricate knowledge distributions from teacher to student models. Our approach introduces three key innovations: (1) Invertible Knowledge Mapping using continuous normalizing flows (CNFs) to learn bijective transformations between teacher and student representation spaces, enabling precise knowledge transfer without information loss, (2) Flow-Guided Progressive Distillation that gradually increases the complexity of knowledge transfer by learning hierarchical flow transformations from simple to complex distributions, and (3) Conditional Flow Networks that adapt knowledge transfer based on input context and task requirements. Unlike previous diffusion-based distillation methods such as DiffKD that suffer from computational overhead due to iterative denoising processes and information loss during noise addition, our flow-based approach provides exact invertible transformations with significantly reduced computational cost. Extensive experiments on ImageNet classification, COCO object detection, and Cityscapes semantic segmentation demonstrate that FlowDistill achieves superior performance with 2.1% accuracy improvement over DiffKD on ResNet-34 to ResNet-18 distillation while reducing inference time by 3.5×. Our method establishes new state-of-the-art results across multiple distillation benchmarks and provides theoretical guarantees for lossless knowledge transfer through invertible flow transformations.
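The lossless-transfer claim rests on the invertibility of the flow: an invertible map between teacher and student feature spaces can be inverted exactly, so no information is discarded. The sketch below illustrates this property with a single RealNVP-style affine coupling layer in NumPy; the layer sizes, random weights, and feature tensors are illustrative stand-ins, not the paper's trained FlowDistill networks.

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """One invertible affine coupling layer (RealNVP-style).

    Splits the feature vector in half; the first half conditions a
    scale/shift applied to the second half, which makes the inverse
    available in closed form."""

    def __init__(self, dim, hidden=32):
        self.d = dim // 2
        # Small random MLP weights: stand-ins for trained parameters.
        self.W1 = rng.normal(0, 0.1, (self.d, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, 2 * (dim - self.d)))
        self.b2 = np.zeros(2 * (dim - self.d))

    def _net(self, x1):
        h = np.tanh(x1 @ self.W1 + self.b1)
        out = h @ self.W2 + self.b2
        log_s, t = np.split(out, 2, axis=-1)
        return np.tanh(log_s), t  # bounded log-scale for numerical stability

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s, t = self._net(x1)
        y2 = x2 * np.exp(log_s) + t
        return np.concatenate([x1, y2], axis=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        log_s, t = self._net(y1)          # same conditioner as forward
        x2 = (y2 - t) * np.exp(-log_s)    # exact algebraic inverse
        return np.concatenate([y1, x2], axis=1)

dim = 8
flow = AffineCoupling(dim)
teacher_feats = rng.normal(size=(4, dim))

# Map teacher features into the student space and back: the round-trip
# error is at machine precision, i.e. no information is lost.
mapped = flow.forward(teacher_feats)
recovered = flow.inverse(mapped)
roundtrip_err = np.max(np.abs(recovered - teacher_feats))

# A distillation loss would then match student features against the
# flow-mapped teacher features (hypothetical student activations here).
student_feats = rng.normal(size=(4, dim))
loss = np.mean((student_feats - mapped) ** 2)
```

Contrast this with a diffusion-based pipeline: adding noise and iteratively denoising is neither exact nor cheap to invert, which is the computational and information-loss gap the abstract attributes to DiffKD.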


Published

2026-03-14

How to Cite

Yang, X., Wang, J., Li, R., Sun, H., Zhang, X., Liu, Z., … Chen, Y. (2026). Flow-Based Knowledge Transfer for Efficient Large Model Distillation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(33), 27666–27674. https://doi.org/10.1609/aaai.v40i33.39987

Section

AAAI Technical Track on Machine Learning X