Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

Huiyan Xue; Xuming Ran; Yaxin Li; Qi Xu; Enhui Li; Yi Xu; Qiang Zhang

doi:10.1609/aaai.v40i32.39960

Authors

Huiyan Xue Dalian University of Technology
Xuming Ran National University of Singapore
Yaxin Li Dalian University of Technology
Qi Xu Dalian University of Technology
Enhui Li Dalian University of Technology
Yi Xu Dalian University of Technology
Qiang Zhang Dalian University of Technology

Abstract

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity poses two fundamental challenges: (1) the isolation of sparse subnetworks severely limits cross-task knowledge reuse; and (2) increased sparsity reduces interference but often degrades performance because feature sharing is constrained. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. By identifying neurons with high activation frequency, SSD selectively distills knowledge within the Top-K subnetworks of previous tasks and from the output logits, without requiring replay or task labels, thereby preserving both sparsity and functional specialization. Unlike conventional distillation, SSD operates under hard modular constraints and enables structural realignment without altering the sparse architecture. Although our method is validated on SDMLP, its structure-aligned mechanism could generalize to other sparse networks as a plug-in module for promoting representation sharing. Comprehensive experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and manifold coverage, offering a structurally grounded solution to sparse continual learning.
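The abstract names three ingredients (Top-K activation, selection of neurons by activation frequency, and distillation on both the selected subnetwork and the output logits) without giving formulas. The sketch below illustrates them in plain Python under stated assumptions: the function names, the frequency threshold, the mean-squared-error feature term, the temperature-scaled KL logit term, and the weights `alpha`, `beta`, and `T` are all hypothetical choices, not details taken from the paper. `h_old` and `logits_old` stand for outputs of a frozen model trained on previous tasks.

```python
import math
from typing import List, Sequence, Set

# Hypothetical helpers illustrating the ideas named in the abstract;
# this is NOT the authors' implementation.

def topk_indices(h: Sequence[float], k: int) -> Set[int]:
    """Indices of the k most active neurons (the Top-K active subnetwork)."""
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    return set(order[:k])

def activation_frequency(batch: Sequence[Sequence[float]], k: int) -> List[float]:
    """Fraction of inputs for which each neuron lands in the Top-K set."""
    counts = [0] * len(batch[0])
    for h in batch:
        for i in topk_indices(h, k):
            counts[i] += 1
    return [c / len(batch) for c in counts]

def select_frequent(freq: Sequence[float], threshold: float) -> List[int]:
    """Neurons whose activation frequency reaches an assumed threshold."""
    return [i for i, f in enumerate(freq) if f >= threshold]

def softmax(z: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by max(z) for numerical stability."""
    m = max(z)
    e = [math.exp((v - m) / T) for v in z]
    s = sum(e)
    return [x / s for x in e]

def kl_div(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def selective_distill_loss(h_old, h_new, logits_old, logits_new,
                           selected, T=2.0, alpha=1.0, beta=1.0):
    """Assumed loss: feature term on selected neurons plus a soft-logit term."""
    # Mean squared error restricted to the high-frequency neurons only.
    feat = sum((h_old[i] - h_new[i]) ** 2 for i in selected) / max(len(selected), 1)
    # Common T^2-scaled KL between softened old (teacher) and new (student) outputs.
    logit = kl_div(softmax(logits_old, T), softmax(logits_new, T)) * T * T
    return alpha * feat + beta * logit

if __name__ == "__main__":
    # Toy hidden activations recorded from the (assumed frozen) previous model.
    old_batch = [[0.9, 0.1, 0.4, 0.0], [0.8, 0.0, 0.7, 0.2], [0.1, 0.6, 0.9, 0.0]]
    freq = activation_frequency(old_batch, k=2)
    selected = select_frequent(freq, threshold=0.6)
    loss = selective_distill_loss(old_batch[0], [0.5, 0.1, 0.4, 0.3],
                                  [2.0, 0.5, -1.0], [1.5, 0.7, -0.8], selected)
    print(selected, round(loss, 4))
```

Restricting the feature term to `selected` is what makes this sketch "selective": neurons outside the frequently used Top-K set receive no feature-level constraint and remain free to adapt to the new task. This is one plausible reading of how distillation can coexist with hard sparsity, not a description of SSD's exact objective.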
