CHIMERA: Controllable High-quality Image-Mask Extraction for Reliable Diffusion-based Anomaly Synthesis

Authors

  • JoungBin Lee, Korea Advanced Institute of Science & Technology
  • Hyunkoo Lee, Korea Advanced Institute of Science & Technology
  • Jini Yang, Korea Advanced Institute of Science & Technology
  • Chaehyun Kim, Korea Advanced Institute of Science & Technology
  • Jung Yi, Korea Advanced Institute of Science & Technology
  • Seok Hwangbo, Samsung Display
  • Hyeoncheol Lee, Samsung Display
  • Minho Chun, Samsung Display
  • Eunjo Jeong, Samsung Display
  • Seungryong Kim, Korea Advanced Institute of Science & Technology

DOI:

https://doi.org/10.1609/aaai.v40i7.37511

Abstract

We present CHIMERA, a novel framework for generating realistic, generalizable, and prompt-driven industrial anomalies from natural language instructions. Our method addresses two key challenges in text-guided anomaly synthesis: (1) the scarcity of scalable, high-quality paired anomaly data and (2) the difficulty of efficiently adapting large diffusion models to domain-specific tasks without overfitting. To tackle these challenges, we first introduce a Vision-Language Model (VLM)-guided data curation pipeline that automatically generates semantically rich and spatially grounded captions from normal images, enabling effective dataset augmentation without manual annotations. Building upon this, we propose a parameter-efficient fine-tuning strategy that adapts a pre-trained Diffusion Transformer (Stable Diffusion 3) using lightweight LoRA adapters. By aligning structured prompts with the model's pre-trained language-vision prior and introducing auxiliary attention-based mask supervision, our method prevents overfitting, enhances spatial consistency, and ensures efficient training even with limited data. Extensive experiments show that CHIMERA is the first unified framework to achieve controllable, scalable, and generalizable industrial anomaly generation by integrating VLM-guided data curation with efficient diffusion-based training, significantly improving anomaly detection in low-data and unseen scenarios.
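
As a rough illustration of the LoRA adapters mentioned in the abstract, the sketch below shows the general low-rank update idea on a single linear layer in plain Python. It is not the authors' implementation: the class name `LoRALinear`, the helper `matvec`, and the `rank`, `alpha`, and `seed` values are all hypothetical choices made for illustration, and the abstract does not say which layers of Stable Diffusion 3 are adapted or how the adapters are configured.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only).

    Output is W x + (alpha / rank) * B (A x), where W stays frozen and only
    the small matrices A and B would be trained.
    """

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        self.weight = weight  # frozen pre-trained weight, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        # A gets a small random init, B starts at zero, so the adapted layer
        # initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self):
        """Count adapter parameters (entries of A and B only)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    frozen = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # toy 2 x 3 weight
    layer = LoRALinear(frozen, rank=1, alpha=2.0)
    print(layer.forward([1.0, 1.0, 1.0]))  # matches the frozen layer at init
    print(layer.num_trainable())           # adapter parameter count
```

Because `B` starts at zero, the adapted layer behaves like the pre-trained one until training moves the adapter weights. The adapter here has `rank * (d_in + d_out)` parameters, which for realistic layer sizes is far smaller than the `d_in * d_out` entries of the frozen weight.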

Published

2026-03-14

How to Cite

Lee, J., Lee, H., Yang, J., Kim, C., Yi, J., Hwangbo, S., … Kim, S. (2026). CHIMERA: Controllable High-quality Image-Mask Extraction for Reliable Diffusion-based Anomaly Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5890–5898. https://doi.org/10.1609/aaai.v40i7.37511

Issue

Vol. 40 No. 7 (2026)

Section

AAAI Technical Track on Computer Vision IV