MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm

Authors

  • Xiao Fan: College of Computer Science and Technology, Tongji University, Shanghai, China; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
  • Jingyan Jiang: School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
  • Zhaoru Chen: School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
  • Fanding Huang: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
  • Xiao Chen: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
  • Qinting Jiang: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
  • Bowen Zhang: School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
  • Xing Tang: School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
  • Zhi Wang: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China

DOI:

https://doi.org/10.1609/aaai.v40i25.39243

Abstract

Test-time adaptation (TTA) has proven effective in mitigating performance drops under single-domain distribution shifts by updating model parameters during inference. However, real-world deployments often involve mixed distribution shifts, in which test samples are affected by diverse and potentially conflicting domain factors, and these pose significant challenges even for state-of-the-art TTA methods. A key limitation of existing approaches is their reliance on a unified adaptation path, which fails to account for the fact that optimal gradient directions can vary significantly across domains. Moreover, current benchmarks focus only on synthetic or homogeneous shifts and fail to capture the complexity of real-world heterogeneous mixed distribution shifts. To address these limitations, we propose MoETTA, a novel entropy-based TTA framework that integrates the Mixture-of-Experts (MoE) architecture. Rather than enforcing a single parameter update rule for all test samples, MoETTA introduces a set of structurally decoupled experts, enabling specialization along diverse gradient directions. This design allows the model to better accommodate heterogeneous shifts through flexible and disentangled parameter updates. To simulate realistic deployment conditions, we introduce two new benchmarks: potpourri and potpourri+. While classical settings focus solely on synthetic corruptions (e.g., ImageNet-C), potpourri encompasses a broader range of domain shifts, including natural, artistic, and adversarial distortions, and thus captures more realistic deployment challenges. In addition, potpourri+ includes source-domain samples to evaluate robustness against catastrophic forgetting. Extensive experiments across three mixed distribution shift settings show that MoETTA consistently outperforms strong baselines, establishing new state-of-the-art performance and highlighting the benefit of modeling multiple adaptation directions via expert-level diversity.
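
The abstract names two ingredients, an entropy-based adaptation objective and a set of structurally decoupled LayerNorm experts, without giving implementation details. The toy sketch below only illustrates those two ideas in plain Python and is not the authors' implementation: the class name `MoELayerNorm`, the linear router, the hard top-1 routing, and the parameter initialization are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy of a prediction; entropy-based TTA methods minimize this."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over one feature vector with affine parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]

class MoELayerNorm:
    """Hypothetical MoE-style LayerNorm: each expert owns its own (gamma, beta).

    Assumption: a linear router picks one expert per sample (hard top-1),
    so an update applied to that expert leaves the others untouched.
    """

    def __init__(self, dim, num_experts):
        # Separate affine parameters per expert (structurally decoupled).
        self.gammas = [[1.0] * dim for _ in range(num_experts)]
        self.betas = [[0.0] * dim for _ in range(num_experts)]
        # Illustrative fixed router weights, one row per expert.
        self.router = [
            [0.01 * (e + 1) * (-1) ** i for i in range(dim)]
            for e in range(num_experts)
        ]

    def route(self, x):
        """Return the index of the expert with the highest router score."""
        scores = [sum(w * v for w, v in zip(row, x)) for row in self.router]
        return max(range(len(scores)), key=scores.__getitem__)

    def forward(self, x):
        """Normalize x with the routed expert's parameters."""
        k = self.route(x)
        return layer_norm(x, self.gammas[k], self.betas[k]), k
```

In this sketch, a confident prediction has lower entropy than a uniform one, and changing the parameters of the routed expert does not affect any other expert, which is the sense in which per-expert updates can follow different directions for different kinds of shift.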

Published

2026-03-14

How to Cite

Fan, X., Jiang, J., Chen, Z., Huang, F., Chen, X., Jiang, Q., … Wang, Z. (2026). MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm. Proceedings of the AAAI Conference on Artificial Intelligence, 40(25), 21011–21019. https://doi.org/10.1609/aaai.v40i25.39243

Issue

Vol. 40 No. 25 (2026)

Section

AAAI Technical Track on Machine Learning II