Task-Free Dynamic Sparse Vision Transformer for Continual Learning

Authors

  • Fei Ye, University of York; Mohamed bin Zayed University of Artificial Intelligence
  • Adrian G. Bors, University of York; Mohamed bin Zayed University of Artificial Intelligence

DOI:

https://doi.org/10.1609/aaai.v38i15.29581

Keywords:

ML: Life-Long and Continual Learning, ML: Deep Generative Models & Autoencoders, ML: Ensemble Methods, ML: Scalability of ML Systems

Abstract

Vision Transformers (ViTs) are self-attention-based network backbones that have proven effective in many individual tasks, but they have not yet been explored in Task-Free Continual Learning (TFCL). Most existing ViT-based approaches for Continual Learning (CL) rely on task information. In this study, we explore the advantages of the ViT in a more challenging CL scenario where task boundaries are unavailable during training. To address this learning paradigm, we propose the Task-Free Dynamic Sparse Vision Transformer (TFDSViT), which can dynamically build new sparse experts, where each expert leverages sparsity to allocate the model's capacity for capturing different categories of information over time. To avoid forgetting and to reuse previously learned knowledge efficiently in subsequent learning, we propose a new dynamic dual attention mechanism consisting of the Sparse Attention (SA) and Knowledge Transfer Attention (KTA) modules. The SA module refrains from updating certain previously learned attention blocks in order to preserve prior knowledge. The KTA module uses and regulates the information flow of all previously learned experts for learning new patterns. The proposed dual attention mechanism simultaneously relieves forgetting and promotes knowledge transfer for a dynamic expansion model in a task-free manner. We also propose an energy-based dynamic expansion mechanism that uses energy as a measure of novelty for incoming samples, providing appropriate expansion signals that lead to a compact network architecture for TFDSViT. Extensive empirical studies demonstrate the effectiveness of TFDSViT. The code and supplementary material (SM) are available at https://github.com/dtuzi123/TFDSViT.
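The energy-based expansion signal described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the particular energy definition (negative log-sum-exp of an expert's logits, a common choice in energy-based novelty scoring), the threshold, and the function names are assumptions for exposition. Higher energy is taken to indicate a more novel sample, and a batch whose mean energy exceeds the threshold would trigger the creation of a new sparse expert.

```python
import math


def energy_score(logits):
    """Energy score E(x) = -logsumexp(logits).

    Higher energy suggests the sample is poorly explained by the
    current expert, i.e. more novel. Computed with the max-shift
    trick for numerical stability.
    """
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def should_expand(batch_logits, threshold):
    """Return True if the mean batch energy exceeds the threshold.

    In a dynamic-expansion model this would serve as the signal to
    build a new expert for the incoming (novel) data distribution.
    """
    mean_energy = sum(energy_score(z) for z in batch_logits) / len(batch_logits)
    return mean_energy > threshold
```

A confident prediction (one logit dominating) yields low energy and no expansion, while a flat, uncertain prediction yields higher energy and can trigger expansion; keeping expansion gated on such a novelty measure is what allows the architecture to stay compact.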

Published

2024-03-24

How to Cite

Ye, F., & Bors, A. G. (2024). Task-Free Dynamic Sparse Vision Transformer for Continual Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(15), 16442-16450. https://doi.org/10.1609/aaai.v38i15.29581

Section

AAAI Technical Track on Machine Learning VI