Parameter Merging with Gradient-Guided Supermasks in Online Continual Learning

Authors

  • Benliu Qiu, University of Electronic Science and Technology of China
  • Heqian Qiu, University of Electronic Science and Technology of China
  • Lanxiao Wang, University of Electronic Science and Technology of China
  • Taijin Zhao, University of Electronic Science and Technology of China
  • Yu Dai, University of Electronic Science and Technology of China
  • Lili Pan, University of Electronic Science and Technology of China
  • Hongliang Li, University of Electronic Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i30.39687

Abstract

Online continual learning (OCL) aims to learn from a non-stationary data stream while reading each data sample only once, and hence suffers from a trade-off between catastrophic forgetting and insufficient learning. In this work, we first analytically establish the relationship between loss functions and model parameters from a Bayesian perspective. Based on this analysis, we then propose a parameter merging method with gradient-guided supermasks. Our method leverages first-order and second-order gradient information to construct supermasks that determine the merging weights between the old and new models, and it updates the model through direct arithmetic operations on parameters, going beyond traditional gradient descent. We further find that the widely used premise that first-order gradients are negligible does not hold in OCL, owing to the slow convergence caused by insufficient learning. Additionally, we employ a dual-model dual-view distillation strategy that aligns the output distributions of the new and merged models for each sample, further enhancing model performance. Extensive experiments are conducted on four OCL benchmarks: CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet-100. The results demonstrate that our method is effective and achieves a substantial improvement over previous methods.
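The merging idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual algorithm: the importance score combines first-order (gradient) and second-order (diagonal-Hessian) Taylor terms, and the `keep_ratio` threshold, scoring rule, and function name are all placeholder assumptions.

```python
import numpy as np

def supermask_merge(theta_old, theta_new, grad, hess_diag, keep_ratio=0.5):
    """Merge old and new parameters with a gradient-guided binary supermask.

    Each parameter is scored by a second-order Taylor estimate of the loss
    change incurred by reverting from theta_new to theta_old:
        dL ~ |g * delta| + 0.5 * h * delta^2   (elementwise, diagonal Hessian)
    High-score parameters keep their new values; the rest retain old values.
    The scoring rule and keep_ratio are illustrative, not from the paper.
    """
    delta = theta_new - theta_old
    score = np.abs(grad * delta) + 0.5 * hess_diag * delta ** 2
    # Binary supermask: 1 for the top-k most important parameters.
    k = max(1, int(keep_ratio * score.size))
    thresh = np.partition(score.ravel(), -k)[-k]
    mask = (score >= thresh).astype(theta_new.dtype)
    # Direct arithmetic merge of parameters, instead of a gradient step.
    return mask * theta_new + (1.0 - mask) * theta_old

# Toy usage with 4 scalar parameters:
theta_old = np.zeros(4)
theta_new = np.array([1.0, 2.0, 3.0, 4.0])
grad = np.array([0.1, 0.2, 0.3, 0.4])
hess_diag = np.full(4, 0.1)
merged = supermask_merge(theta_old, theta_new, grad, hess_diag, keep_ratio=0.5)
# The two highest-scoring parameters take new values, the rest stay old:
# merged == [0., 0., 3., 4.]
```

Note that because the mask is binary, the merge reduces to a per-parameter selection between the two models; a soft (real-valued) mask would instead interpolate them.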

Published

2026-03-14

How to Cite

Qiu, B., Qiu, H., Wang, L., Zhao, T., Dai, Y., Pan, L., & Li, H. (2026). Parameter Merging with Gradient-Guided Supermasks in Online Continual Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(30), 24991–24999. https://doi.org/10.1609/aaai.v40i30.39687

Section

AAAI Technical Track on Machine Learning VII