Rep Deep & Machine Learning: Exemplar-Free Continual Video Action Recognition via Slow-Fast Collaborative Learning

Xueyi Zhang; Chengwei Zhang; Zheng Li; Xiyu Wang; Siqi Cai; Mingrui Lao; Yanming Guo; Huiping Zhuang

doi:10.1609/aaai.v40i42.40924

Authors

Xueyi Zhang: National University of Defense Technology; Shenzhen Loop Area Institute; The Chinese University of Hong Kong, Shenzhen
Chengwei Zhang: University of the Chinese Academy of Sciences
Zheng Li: National University of Defense Technology
Xiyu Wang: The Chinese University of Hong Kong, Shenzhen
Siqi Cai: Harbin Institute of Technology, Shenzhen; Shenzhen Loop Area Institute
Mingrui Lao: National University of Defense Technology
Yanming Guo: National University of Defense Technology
Huiping Zhuang: South China University of Technology

DOI:

https://doi.org/10.1609/aaai.v40i42.40924

Abstract

In real-world applications, video action recognition models must continuously learn new action categories while retaining previously acquired knowledge. However, most existing approaches rely on storing historical data for replay, which introduces storage burdens and raises data privacy concerns. To address these challenges, we investigate the problem of Exemplar-Free Continual Video Action Recognition (EF-CVAR) and propose a novel framework named Slow-Fast Collaborative Learning (SFCL). SFCL integrates two complementary learning paradigms: a slow branch based on gradient-driven deep learning, which provides strong adaptability to new tasks, and a fast branch based on analytic learning (e.g., Recursive Least Squares), which efficiently preserves old knowledge without requiring access to past samples. To enable effective collaboration between the two branches, we design the Slow-Fast Dynamic Re-parameterization (SFDR) mechanism for adaptive fusion, and the Knowledge Reflection Mechanism (KRM), which mitigates forgetting and task-recency bias via pseudo-feature generation and dual-level knowledge distillation. Extensive experiments on UCF101, HMDB51, and Something-Something V2 demonstrate that SFCL achieves superior performance compared to existing replay-based methods, despite being exemplar-free. Notably, in long-duration continual learning scenarios, SFCL exhibits remarkable robustness, achieving up to a 30.39% improvement in accuracy over baselines while maintaining a low forgetting rate, highlighting its scalability and effectiveness in real-world video recognition tasks.
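The abstract names Recursive Least Squares as one example of the analytic learning used by the fast branch. The sketch below illustrates the general idea only: a ridge-regression classifier head updated recursively, one task at a time, without revisiting earlier samples. It is not the SFCL implementation. The class name `RLSClassifier`, the helper `one_hot`, the regularization parameter `gamma`, and the use of plain feature vectors in place of video features are all assumptions made for illustration, and SFDR and KRM are not modeled here.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Hypothetical helper: integer labels -> one-hot rows."""
    return np.eye(num_classes)[labels]


class RLSClassifier:
    """Illustrative recursive least-squares linear head (an assumption, not SFCL's code)."""

    def __init__(self, feat_dim, gamma=1.0):
        # R tracks the inverse of (gamma * I + sum of X^T X over all batches seen).
        self.R = np.eye(feat_dim) / gamma
        # Weight matrix starts with zero class columns and grows with new tasks.
        self.W = np.zeros((feat_dim, 0))

    def fit_batch(self, X, Y):
        """Update with features X (n, d) and one-hot targets Y (n, classes so far)."""
        n_new = Y.shape[1] - self.W.shape[1]
        if n_new > 0:
            # New classes get zero-initialized columns.
            self.W = np.hstack([self.W, np.zeros((self.W.shape[0], n_new))])
        # Woodbury identity: update the inverse without storing past samples.
        K = np.linalg.inv(np.eye(X.shape[0]) + X @ self.R @ X.T)
        self.R = self.R - self.R @ X.T @ K @ X @ self.R
        # Correct the weights using the residual on the current batch only.
        self.W = self.W + self.R @ X.T @ (Y - X @ self.W)

    def predict(self, X):
        return np.argmax(X @ self.W, axis=1)
```

Under these assumptions, calling `fit_batch` once per task yields the same weights as solving the ridge regression jointly on all tasks, which is why no stored exemplars are needed for the linear head in this sketch.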
