Flow-Based Knowledge Transfer for Efficient Large Model Distillation

Authors

  • Xinye Yang Newcastle University
  • Junhao Wang College of Computer Science and Technology, Jilin University
  • Rui Li Independent Researcher
  • Haosen Sun Northwestern University
  • Xuesheng Zhang Meituan
  • Zebang Liu Independent Researcher
  • Gaochao Xu College of Computer Science and Technology, Jilin University
  • Yiwei Chen Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i33.39987

Abstract

Traditional knowledge distillation relies on simple MSE or KL divergence losses that fail to capture the complex distributional relationships between teacher and student model representations. We propose FlowDistill, a novel distillation framework that employs normalizing flows to model and transfer the intricate knowledge distributions from teacher to student models. Our approach introduces three key innovations: (1) Invertible Knowledge Mapping using continuous normalizing flows (CNFs) to learn bijective transformations between teacher and student representation spaces, enabling precise knowledge transfer without information loss, (2) Flow-Guided Progressive Distillation that gradually increases the complexity of knowledge transfer by learning hierarchical flow transformations from simple to complex distributions, and (3) Conditional Flow Networks that adapt knowledge transfer based on input context and task requirements. Unlike previous diffusion-based distillation methods such as DiffKD that suffer from computational overhead due to iterative denoising processes and information loss during noise addition, our flow-based approach provides exact invertible transformations with significantly reduced computational cost. Extensive experiments on ImageNet classification, COCO object detection, and Cityscapes semantic segmentation demonstrate that FlowDistill achieves superior performance with 2.1% accuracy improvement over DiffKD on ResNet-34 to ResNet-18 distillation while reducing inference time by 3.5×. Our method establishes new state-of-the-art results across multiple distillation benchmarks and provides theoretical guarantees for lossless knowledge transfer through invertible flow transformations.
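The lossless-transfer claim rests on the invertibility of the flow: an invertible map between teacher and student feature spaces can be inverted exactly, so no information is discarded. The sketch below illustrates this property with a single RealNVP-style affine coupling layer in NumPy; the layer sizes, random weights, and feature tensors are illustrative stand-ins, not the paper's trained FlowDistill networks.

```python
import numpy as np

rng = np.random.default_rng(0)

class AffineCoupling:
    """One invertible affine coupling layer (RealNVP-style).

    Splits the feature vector in half; the first half conditions a
    scale/shift applied to the second half, which makes the inverse
    available in closed form."""

    def __init__(self, dim, hidden=32):
        self.d = dim // 2
        # Small random MLP weights: stand-ins for trained parameters.
        self.W1 = rng.normal(0, 0.1, (self.d, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, 2 * (dim - self.d)))
        self.b2 = np.zeros(2 * (dim - self.d))

    def _net(self, x1):
        h = np.tanh(x1 @ self.W1 + self.b1)
        out = h @ self.W2 + self.b2
        log_s, t = np.split(out, 2, axis=-1)
        return np.tanh(log_s), t  # bounded log-scale for numerical stability

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        log_s, t = self._net(x1)
        y2 = x2 * np.exp(log_s) + t
        return np.concatenate([x1, y2], axis=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        log_s, t = self._net(y1)          # same conditioner as forward
        x2 = (y2 - t) * np.exp(-log_s)    # exact algebraic inverse
        return np.concatenate([y1, x2], axis=1)

dim = 8
flow = AffineCoupling(dim)
teacher_feats = rng.normal(size=(4, dim))

# Map teacher features into the student space and back: the round-trip
# error is at machine precision, i.e. no information is lost.
mapped = flow.forward(teacher_feats)
recovered = flow.inverse(mapped)
roundtrip_err = np.max(np.abs(recovered - teacher_feats))

# A distillation loss would then match student features against the
# flow-mapped teacher features (hypothetical student activations here).
student_feats = rng.normal(size=(4, dim))
loss = np.mean((student_feats - mapped) ** 2)
```

Contrast this with a diffusion-based pipeline: adding noise and iteratively denoising is neither exact nor cheap to invert, which is the computational and information-loss gap the abstract attributes to DiffKD.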


Published

2026-03-14

How to Cite

Yang, X., Wang, J., Li, R., Sun, H., Zhang, X., Liu, Z., … Chen, Y. (2026). Flow-Based Knowledge Transfer for Efficient Large Model Distillation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(33), 27666–27674. https://doi.org/10.1609/aaai.v40i33.39987

Section

AAAI Technical Track on Machine Learning X