Parameter Merging with Gradient-Guided Supermasks in Online Continual Learning

Authors

  • Benliu Qiu, University of Electronic Science and Technology of China
  • Heqian Qiu, University of Electronic Science and Technology of China
  • Lanxiao Wang, University of Electronic Science and Technology of China
  • Taijin Zhao, University of Electronic Science and Technology of China
  • Yu Dai, University of Electronic Science and Technology of China
  • Lili Pan, University of Electronic Science and Technology of China
  • Hongliang Li, University of Electronic Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i30.39687

Abstract

Online continual learning (OCL) aims to learn from a non-stationary data stream while reading each data sample only once, and hence suffers from a trade-off between catastrophic forgetting and insufficient learning. In this work, we first analytically establish the relationship between loss functions and model parameters from a Bayesian perspective. Based on this analysis, we then propose a parameter merging method with gradient-guided supermasks. Our method leverages first-order and second-order gradient information to construct supermasks that determine the merging weights between the old and new models, and it updates the model through direct arithmetic operations on parameters, going beyond traditional gradient descent. We further find that the widely used premise that first-order gradients are negligible does not hold in OCL, owing to the slow convergence caused by insufficient learning. Additionally, we employ a dual-model dual-view distillation strategy that aligns the output distributions of the new and merged models for each sample, further enhancing model performance. Extensive experiments are conducted on four OCL benchmarks: CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet-100. The results demonstrate that our method is effective and achieves a substantial improvement over previous methods.
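The merging idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual algorithm: the importance score combines first-order (gradient) and second-order (diagonal-Hessian) Taylor terms, and the `keep_ratio` threshold, scoring rule, and function name are all placeholder assumptions.

```python
import numpy as np

def supermask_merge(theta_old, theta_new, grad, hess_diag, keep_ratio=0.5):
    """Merge old and new parameters with a gradient-guided binary supermask.

    Each parameter is scored by a second-order Taylor estimate of the loss
    change incurred by reverting from theta_new to theta_old:
        dL ~ |g * delta| + 0.5 * h * delta^2   (elementwise, diagonal Hessian)
    High-score parameters keep their new values; the rest retain old values.
    The scoring rule and keep_ratio are illustrative, not from the paper.
    """
    delta = theta_new - theta_old
    score = np.abs(grad * delta) + 0.5 * hess_diag * delta ** 2
    # Binary supermask: 1 for the top-k most important parameters.
    k = max(1, int(keep_ratio * score.size))
    thresh = np.partition(score.ravel(), -k)[-k]
    mask = (score >= thresh).astype(theta_new.dtype)
    # Direct arithmetic merge of parameters, instead of a gradient step.
    return mask * theta_new + (1.0 - mask) * theta_old

# Toy usage with 4 scalar parameters:
theta_old = np.zeros(4)
theta_new = np.array([1.0, 2.0, 3.0, 4.0])
grad = np.array([0.1, 0.2, 0.3, 0.4])
hess_diag = np.full(4, 0.1)
merged = supermask_merge(theta_old, theta_new, grad, hess_diag, keep_ratio=0.5)
# The two highest-scoring parameters take new values, the rest stay old:
# merged == [0., 0., 3., 4.]
```

Note that because the mask is binary, the merge reduces to a per-parameter selection between the two models; a soft (real-valued) mask would instead interpolate them.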

Published

2026-03-14

How to Cite

Qiu, B., Qiu, H., Wang, L., Zhao, T., Dai, Y., Pan, L., & Li, H. (2026). Parameter Merging with Gradient-Guided Supermasks in Online Continual Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(30), 24991–24999. https://doi.org/10.1609/aaai.v40i30.39687

Section

AAAI Technical Track on Machine Learning VII