Knowledge Distillation via Constrained Variational Inference

Ardavan Saeedi; Yuria Utsumi; Li Sun; Kayhan Batmanghelich; Li-wei Lehman

doi:10.1609/aaai.v36i7.20786

Authors

Ardavan Saeedi Hyperfine
Yuria Utsumi MIT
Li Sun University of Pittsburgh
Kayhan Batmanghelich University of Pittsburgh
Li-wei Lehman MIT

DOI:

https://doi.org/10.1609/aaai.v36i7.20786

Keywords:

Machine Learning (ML)

Abstract

Knowledge distillation has been used to capture the knowledge of a teacher model and distill it into a student model with some desirable characteristics, such as being smaller, more efficient, or more generalizable. In this paper, we propose a framework for distilling the knowledge of a powerful discriminative model, such as a neural network, into commonly used graphical models known to be more interpretable (e.g., topic models, autoregressive Hidden Markov Models). The posterior of the latent variables in these graphical models (e.g., topic proportions in topic models) is often used as a feature representation for predictive tasks. However, these posterior-derived features are known to have poor predictive performance compared to features learned via purely discriminative approaches. Our framework constrains variational inference for posterior variables in graphical models with a similarity-preserving constraint. This constraint distills the knowledge of the discriminative model into the graphical model by ensuring that input pairs with (dis)similar representations in the teacher model also have (dis)similar representations in the student model. By adding this constraint to the variational inference scheme, we guide the graphical model to be a reasonable density model for the data while having predictive features that are as close as possible to those of a discriminative model. To make our framework applicable to a wide range of graphical models, we build upon Automatic Differentiation Variational Inference (ADVI), a black-box inference framework for graphical models. We demonstrate the effectiveness of our framework on two real-world tasks: disease subtyping and disease trajectory modeling.
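
The similarity-preserving idea described in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the constraint, so everything below is an assumption made for illustration: similarity is measured with row-normalized Gram matrices of the teacher and student features, the constraint is relaxed into a soft penalty subtracted from a placeholder ELBO value, and the names `gram`, `row_normalize`, `similarity_penalty`, `penalized_objective`, and `weight` are hypothetical. It is not the authors' implementation.

```python
import math


def gram(features):
    """Pairwise dot-product similarities between the rows of `features`."""
    return [[sum(a * b for a, b in zip(u, v)) for v in features] for u in features]


def row_normalize(matrix, eps=1e-12):
    """Scale each row to (approximately) unit L2 norm.

    Normalizing makes teacher and student similarities comparable even when
    their feature spaces differ in scale or dimension.
    """
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / (norm + eps) for x in row])
    return normalized


def similarity_penalty(teacher_feats, student_feats):
    """Mean squared gap between normalized teacher and student similarity matrices.

    Assumed form of the constraint: near zero when input pairs that are
    (dis)similar under the teacher are equally (dis)similar under the student.
    """
    n = len(teacher_feats)
    if n != len(student_feats):
        raise ValueError("teacher and student must cover the same inputs")
    g_teacher = row_normalize(gram(teacher_feats))
    g_student = row_normalize(gram(student_feats))
    squared_gap = sum(
        (a - b) ** 2
        for row_t, row_s in zip(g_teacher, g_student)
        for a, b in zip(row_t, row_s)
    )
    return squared_gap / (n * n)


def penalized_objective(elbo, teacher_feats, student_feats, weight=1.0):
    """Soft-penalty relaxation (an assumption): ELBO minus a weighted penalty."""
    return elbo - weight * similarity_penalty(teacher_feats, student_feats)


# Toy data: three inputs, teacher features in 2-D (hypothetical values).
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
# A student that preserves the pairwise structure up to a rescaling.
student_aligned = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.2]]
# A student that groups inputs 1 and 2 together, unlike the teacher.
student_mismatched = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

penalty_aligned = similarity_penalty(teacher, student_aligned)
penalty_mismatched = similarity_penalty(teacher, student_mismatched)
```

In this sketch the aligned student incurs a penalty close to zero while the mismatched student is penalized, so maximizing the penalized objective trades off density-model fit (the ELBO term) against agreement with the teacher's pairwise similarities. In a black-box setting such as ADVI, the student features would be derived from the variational posterior rather than fixed lists.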
