AnomalyMoE: Towards a Language-free Generalist Model for Unified Visual Anomaly Detection

Authors

  • Zhaopeng Gu, Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Bingke Zhu, Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Guibo Zhu, Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Yingying Chen, Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Wei Ge, Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Ming Tang, Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
  • Jinqiao Wang, Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Objecteye Inc., Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i6.42432

Abstract

Anomaly detection is a critical task across numerous domains and modalities, yet existing methods are often highly specialized, limiting their generalizability. These specialized models, tailored to specific anomaly types such as textural defects or logical errors, typically perform poorly when deployed outside their designated contexts. To overcome this limitation, we propose AnomalyMoE, a novel and universal anomaly detection framework based on a Mixture-of-Experts (MoE) architecture. Our key insight is to decompose the complex anomaly detection problem into three distinct semantic hierarchies: local structural anomalies, component-level semantic anomalies, and global logical anomalies. AnomalyMoE correspondingly employs three dedicated expert networks at the patch, component, and global levels, each specialized in reconstructing features and identifying deviations at its designated semantic level. This hierarchical design allows a single model to concurrently understand and detect a wide spectrum of anomalies. Furthermore, we introduce an Expert Information Repulsion (EIR) module to promote expert diversity and an Expert Selection Balancing (ESB) module to ensure the comprehensive utilization of all experts. Experiments on 8 challenging datasets spanning industrial imaging, 3D point clouds, medical imaging, video surveillance, and logical anomaly detection demonstrate that AnomalyMoE establishes new state-of-the-art performance, significantly outperforming specialized methods in their respective domains.
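The abstract does not give the routing, reconstruction, or regularization formulas, so the following is only a minimal Python sketch under stated assumptions of how the general ideas could fit together. It scores an input by the gate-weighted reconstruction error of several experts, uses mean pairwise cosine similarity between expert outputs as a stand-in for a diversity ("repulsion") penalty, and uses the Switch-Transformer-style load-balancing loss as a stand-in for a balancing term. None of these are the authors' EIR or ESB modules; the toy experts (`patch_expert`, `component_expert`, `global_expert`), `EXPERTS`, and every function name are hypothetical, and in a real model the experts would be learned networks and the router logits would be computed from features.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over router logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def moe_anomaly_score(x: Vector, experts: Sequence[Expert], router_logits: Sequence[float]) -> float:
    """Gate-weighted reconstruction error: large when the selected experts fail to reconstruct x."""
    gates = softmax(router_logits)
    return sum(g * sq_dist(x, expert(x)) for g, expert in zip(gates, experts))


def repulsion_loss(outputs: Sequence[Vector]) -> float:
    """Stand-in diversity penalty: mean pairwise cosine similarity of expert outputs (lower = more diverse)."""
    pairs = [(i, j) for i in range(len(outputs)) for j in range(i + 1, len(outputs))]
    return sum(cosine(outputs[i], outputs[j]) for i, j in pairs) / len(pairs)


def balance_loss(gate_batch: Sequence[Sequence[float]]) -> float:
    """Stand-in balancing term: Switch-Transformer-style loss N * sum_i f_i * P_i.

    f_i is the fraction of inputs whose top-1 expert is i and P_i is the mean gate
    probability of expert i. Balanced routing gives 1.0; collapse onto one expert gives N.
    """
    n_inputs, n_experts = len(gate_batch), len(gate_batch[0])
    counts = [0] * n_experts
    for gates in gate_batch:
        counts[max(range(n_experts), key=lambda i: gates[i])] += 1
    frac = [c / n_inputs for c in counts]
    mean_prob = [sum(g[i] for g in gate_batch) / n_inputs for i in range(n_experts)]
    return n_experts * sum(f * p for f, p in zip(frac, mean_prob))


# Hypothetical toy experts standing in for the patch, component, and global levels.
def patch_expert(x: Vector) -> Vector:
    return [0.9 * v for v in x]  # mild shrinkage of each element


def component_expert(x: Vector) -> Vector:
    mean = sum(x) / len(x)
    return [mean] * len(x)  # replaces every element by the vector mean


def global_expert(x: Vector) -> Vector:
    return [min(max(v, 0.0), 1.0) for v in x]  # clips to an assumed "normal" range [0, 1]


EXPERTS = [patch_expert, component_expert, global_expert]

if __name__ == "__main__":
    normal = [0.5, 0.5, 0.5, 0.5]
    anomalous = [0.5, 0.5, 3.0, 0.5]
    logits = [0.0, 0.0, 0.0]  # uniform routing for the demo
    print(moe_anomaly_score(normal, EXPERTS, logits))
    print(moe_anomaly_score(anomalous, EXPERTS, logits))
```

Under uniform routing, the anomalous vector, whose third element lies far outside the clipped range, receives a much larger score than the constant normal vector, because the component and global toy experts reconstruct it poorly. Minimizing the two auxiliary terms during training would push experts toward dissimilar outputs and toward sharing the routing load.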

Published

2026-03-14

How to Cite

Gu, Z., Zhu, B., Zhu, G., Chen, Y., Ge, W., Tang, M., & Wang, J. (2026). AnomalyMoE: Towards a Language-free Generalist Model for Unified Visual Anomaly Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 40(6), 4348–4356. https://doi.org/10.1609/aaai.v40i6.42432

Section

AAAI Technical Track on Computer Vision III