MoETTA: Test-Time Adaptation Under Mixed Distribution Shifts with MoE-LayerNorm

Xiao Fan; Jingyan Jiang; Zhaoru Chen; Fanding Huang; Xiao Chen; Qinting Jiang; Bowen Zhang; Xing Tang; Zhi Wang

doi:10.1609/aaai.v40i25.39243

Authors

Xiao Fan: College of Computer Science and Technology, Tongji University, Shanghai, China; Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Jingyan Jiang: School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
Zhaoru Chen: School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
Fanding Huang: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Xiao Chen: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Qinting Jiang: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Bowen Zhang: School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
Xing Tang: School of Artificial Intelligence, Shenzhen Technology University, Shenzhen, China
Zhi Wang: Shenzhen International Graduate School, Tsinghua University, Shenzhen, China

Abstract

Test-time adaptation (TTA) has proven effective in mitigating performance drops under single-domain distribution shifts by updating model parameters during inference. However, real-world deployments often involve mixed distribution shifts, in which test samples are affected by diverse and potentially conflicting domain factors, and these pose significant challenges even for state-of-the-art TTA methods. A key limitation of existing approaches is their reliance on a unified adaptation path, which fails to account for the fact that optimal gradient directions can vary significantly across domains. Moreover, current benchmarks focus only on synthetic or homogeneous shifts and fail to capture the complexity of real-world heterogeneous mixed distribution shifts. To address these issues, we propose MoETTA, a novel entropy-based TTA framework that integrates the Mixture-of-Experts (MoE) architecture. Rather than enforcing a single parameter update rule for all test samples, MoETTA introduces a set of structurally decoupled experts, enabling specialization along diverse gradient directions. This design allows the model to better accommodate heterogeneous shifts through flexible and disentangled parameter updates. To simulate realistic deployment conditions, we introduce two new benchmarks: potpourri and potpourri+. While classical settings focus solely on synthetic corruptions (i.e., ImageNet-C), potpourri encompasses a broader range of domain shifts, including natural, artistic, and adversarial distortions, and thus captures more realistic deployment challenges. In addition, potpourri+ includes source-domain samples to evaluate robustness against catastrophic forgetting. Extensive experiments across three mixed-distribution-shift settings show that MoETTA consistently outperforms strong baselines, establishing new state-of-the-art performance and highlighting the benefit of modeling multiple adaptation directions via expert-level diversity.
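
The two ingredients named in the abstract, an entropy objective and a LayerNorm with several decoupled experts, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the routing rule or the update procedure, so the top-1 linear router, the class name MoELayerNorm, and all parameter names below are hypothetical choices made only for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(logits):
    # Shannon entropy (in nats) of the softmax prediction; entropy-based
    # TTA methods typically minimize a quantity of this kind at test time.
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def layer_norm(x, eps=1e-5):
    # Plain layer normalization without the affine transform.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

class MoELayerNorm:
    """Hypothetical sketch: one (gamma, beta) pair per expert.

    Assumption: a linear router picks a single expert per sample (top-1).
    The paper's actual routing may differ.
    """

    def __init__(self, dim, num_experts):
        self.gammas = [[1.0] * dim for _ in range(num_experts)]
        self.betas = [[0.0] * dim for _ in range(num_experts)]
        # Hypothetical router weights, one row per expert.
        self.router = [[0.0] * dim for _ in range(num_experts)]

    def route(self, x_hat):
        # Score each expert and return the index of the highest score
        # (ties resolve to the lowest index).
        scores = [sum(w * v for w, v in zip(row, x_hat)) for row in self.router]
        return max(range(len(scores)), key=scores.__getitem__)

    def forward(self, x):
        x_hat = layer_norm(x)
        k = self.route(x_hat)
        out = [g * v + b for g, v, b in zip(self.gammas[k], x_hat, self.betas[k])]
        return out, k
```

Because each expert owns its own affine parameters, an update applied to one expert leaves the samples routed to the other experts unaffected, which is one way to read the abstract's "structurally decoupled" updates.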
