Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation

Authors

  • Rongyu Zhang (Nanjing University; The Hong Kong Polytechnic University; State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
  • Aosong Cheng (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
  • Yulin Luo (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
  • Gaole Dai (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
  • Huanrui Yang (University of Arizona)
  • Jiaming Liu (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
  • Ran Xu (State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University)
  • Li Du (Nanjing University)
  • Dan Wang (Hong Kong University of Science and Technology)
  • Yuan Du (Nanjing University)

DOI:

https://doi.org/10.1609/aaai.v40i42.40922

Abstract

Continual Test-Time Adaptation (CTTA), which aims to adapt a pre-trained model to ever-evolving target domains, has emerged as an important task for vision models. Because current vision models appear to be heavily biased towards texture, continuously adapting the model from one domain distribution to another can result in serious catastrophic forgetting. Drawing inspiration from the encoding characteristics of neuron activation in neural networks, we propose the Mixture-of-Activation-Sparsity-Experts (MoASE) for the CTTA task. Given the distinct reactions of neurons with low and high activation to domain-specific and domain-agnostic features, MoASE decomposes the neural activation into high-activation and low-activation components in each expert with a Spatial Differentiable Dropout (SDD). Based on this decomposition, we devise a Domain-Aware Router (DAR) that utilizes domain information to adaptively weight the experts that process the post-SDD sparse activations, and an Activation Sparsity Gate (ASG) that adaptively assigns the feature-selection thresholds of the SDD to different experts for more precise feature decomposition. Finally, we introduce a Homeostatic-Proximal (HP) loss to maintain update consistency between the teacher and student experts and prevent error accumulation. Extensive experiments substantiate that MoASE achieves state-of-the-art performance in both classification and segmentation tasks.
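To make the decomposition concrete, the abstract's core idea can be sketched as follows: each neuron's activation is split into a high-activation component (domain-specific) and a low-activation component (domain-agnostic) by a per-expert threshold, and a router weights each expert's response. This is a minimal illustrative sketch, not the paper's implementation; all function and parameter names (`decompose`, `mixture_output`, `router_weights`, `thresholds`) are hypothetical stand-ins for the SDD, DAR, and ASG modules.

```python
# Hypothetical sketch of the activation decomposition described in the
# abstract. Names and thresholding logic are illustrative assumptions,
# not taken from the MoASE implementation.

def decompose(activations, threshold):
    """Split activations into high- and low-activation components.

    Values at or above `threshold` form the high (domain-specific)
    component; the rest form the low (domain-agnostic) component.
    The two components sum back to the original activations.
    """
    high = [a if a >= threshold else 0.0 for a in activations]
    low = [a if a < threshold else 0.0 for a in activations]
    return high, low


def mixture_output(activations, experts, router_weights, thresholds):
    """Combine expert outputs on per-expert sparse decompositions.

    `experts` are callables taking (high, low); `router_weights`
    stand in for the Domain-Aware Router's weighting, and
    `thresholds` stand in for the per-expert thresholds that the
    Activation Sparsity Gate would assign.
    """
    out = 0.0
    for expert, weight, thr in zip(experts, router_weights, thresholds):
        high, low = decompose(activations, thr)
        out += weight * expert(high, low)
    return out
```

In the paper, the thresholds are learned (the SDD is differentiable) and the router weights are predicted from domain information; the fixed lists here only illustrate the data flow.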

Published

2026-03-14

How to Cite

Zhang, R., Cheng, A., Luo, Y., Dai, G., Yang, H., Liu, J., … Du, Y. (2026). Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 36057–36065. https://doi.org/10.1609/aaai.v40i42.40922

Section

AAAI Technical Track on Philosophy and Ethics of AI