Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

Authors

  • Huiyan Xue Dalian University of Technology
  • Xuming Ran National University of Singapore
  • Yaxin Li Dalian University of Technology
  • Qi Xu Dalian University of Technology
  • Enhui Li Dalian University of Technology
  • Yi Xu Dalian University of Technology
  • Qiang Zhang Dalian University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i32.39960

Abstract

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures like Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity poses two fundamental challenges: (1) the isolation of sparse subnetworks severely limits cross-task knowledge reuse; and (2) increased sparsity reduces interference but often degrades performance due to constrained feature sharing. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer, but as a topology-aligned information conduit. By identifying neurons with high activation frequency, SSD selectively distills knowledge within previous Top-K subnetworks and output logits, without requiring replay or task labels, preserving both sparsity and functional specialization. Unlike conventional distillation, SSD operates under hard modular constraints and enables structural realignment without altering the sparse architecture. While our method is validated on SDMLP, its structure-aligned mechanism has the potential to generalize to other sparse networks as a plug-in module for promoting representation sharing. Comprehensive experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and manifold coverage, offering a structurally grounded solution to sparse continual learning.
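
The mechanism described in the abstract (Top-K activation, selection of frequently active neurons, and distillation restricted to those neurons plus the output logits) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frequency threshold, the squared-error term on hidden activations, the temperature-softened KL term on logits, and the weighting parameter alpha are all assumptions made for illustration, since the abstract does not specify the loss.

```python
import math


def top_k_mask(activations, k):
    """Return a 0/1 mask that keeps the k largest activations (Top-K sparsity)."""
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    keep = set(order[:k])
    return [1 if i in keep else 0 for i in range(len(activations))]


def activation_frequency(batch, k):
    """Fraction of inputs for which each neuron lands inside the Top-K set."""
    counts = [0] * len(batch[0])
    for activations in batch:
        for i, kept in enumerate(top_k_mask(activations, k)):
            counts[i] += kept
    return [c / len(batch) for c in counts]


def select_frequent(frequencies, threshold):
    """Indices of neurons whose activation frequency meets the threshold.

    The threshold is a hypothetical selection rule, not taken from the paper.
    """
    return [i for i, f in enumerate(frequencies) if f >= threshold]


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def selective_distillation_loss(teacher_hidden, student_hidden, selected,
                                teacher_logits, student_logits,
                                temperature=2.0, alpha=0.5):
    """Assumed loss: hidden-unit matching on selected neurons plus logit KL.

    alpha and temperature are illustrative hyperparameters.
    """
    # Mean squared error restricted to the selected (frequently active) neurons.
    hidden_term = sum((teacher_hidden[i] - student_hidden[i]) ** 2
                      for i in selected) / max(len(selected), 1)
    # KL(teacher || student) between temperature-softened output distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    logit_term = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return alpha * hidden_term + (1 - alpha) * logit_term
```

In this sketch, a batch of hidden activations recorded on an earlier task would be passed to activation_frequency, the resulting indices from select_frequent would define which neurons are matched, and the loss would be added to the current task's training objective. No replay buffer or task label appears anywhere in the computation, which mirrors the property claimed in the abstract.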

Published

2026-03-14

How to Cite

Xue, H., Ran, X., Li, Y., Xu, Q., Li, E., Xu, Y., & Zhang, Q. (2026). Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory. Proceedings of the AAAI Conference on Artificial Intelligence, 40(32), 27423–27431. https://doi.org/10.1609/aaai.v40i32.39960

Section

AAAI Technical Track on Machine Learning IX