DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control

Authors

  • Shiyan Du Sun Yat-sen University, Guangzhou, China
  • Conghan Yue Fudan University, Shanghai, China
  • Xinyu Cheng Yale University, New Haven, United States
  • Dongyu Zhang Sun Yat-sen University, Guangzhou, China

DOI:

https://doi.org/10.1609/aaai.v40i5.37367

Abstract

Multi-Instance Generation has advanced significantly in spatial placement and attribute binding. However, existing approaches still face challenges in fine-grained semantic understanding, particularly when dealing with complex textual descriptions. To overcome these limitations, we propose DEIG, a novel framework for fine-grained and controllable multi-instance generation. DEIG integrates an Instance Detail Extractor (IDE) that transforms text encoder embeddings into compact, instance-aware representations, and a Detail Fusion Module (DFM) that applies instance-based masked attention to prevent attribute leakage across instances. These components enable DEIG to generate visually coherent multi-instance scenes that precisely match rich, localized textual descriptions. To support fine-grained supervision, we construct a high-quality dataset with detailed, compositional instance captions generated by VLMs. We also introduce DEIG-Bench, a new benchmark with region-level annotations and multi-attribute prompts for both humans and objects. Experiments demonstrate that DEIG consistently outperforms existing approaches across multiple benchmarks in spatial consistency, semantic accuracy, and compositional generalization. Moreover, DEIG functions as a plug-and-play module, making it easily integrable into standard diffusion-based pipelines.
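The abstract does not spell out how the instance-based masked attention works, so the following is only an illustrative sketch of the general idea, not the DFM itself. It assumes each instance comes with a bounding box and a small set of detail tokens, and that an image token may attend only to the tokens of instances whose box contains it. The names `box_mask` and `instance_masked_attention`, the box format, and the reuse of keys as values without learned projections are all assumptions made for illustration.

```python
import numpy as np

def box_mask(height, width, box):
    """Flat boolean mask of pixels inside box = (x0, y0, x1, y1), end-exclusive.
    The box format is an assumption, not taken from the paper."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask.reshape(-1)

def instance_masked_attention(queries, inst_tokens, boxes, height, width):
    """Hypothetical masked cross-attention.

    queries:     (height*width, d) image-token features
    inst_tokens: (n_inst, k, d) detail tokens, k per instance
    boxes:       n_inst boxes, one per instance
    Returns (height*width, d). Keys double as values for simplicity.
    """
    n_inst, k, d = inst_tokens.shape
    keys = inst_tokens.reshape(n_inst * k, d)

    # allowed[p, j] is True when image token p may attend to key j.
    allowed = np.zeros((height * width, n_inst * k), dtype=bool)
    for i, box in enumerate(boxes):
        allowed[box_mask(height, width, box), i * k:(i + 1) * k] = True

    scores = queries @ keys.T / np.sqrt(d)
    scores = np.where(allowed, scores, -np.inf)

    # Tokens outside every box have no valid key; give them a dummy
    # finite row so the softmax stays well-defined, then zero them out.
    has_key = allowed.any(axis=1, keepdims=True)
    scores = np.where(has_key, scores, 0.0)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    weights = np.where(has_key, weights, 0.0)
    return weights @ keys
```

In this sketch, a token inside one box receives a mixture of that instance's tokens only, which is the sense in which masking prevents one instance's attributes from leaking into another's region. Tokens covered by overlapping boxes would attend to both instances; how the actual method treats overlaps and background is not stated in the abstract.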

Published

2026-03-14

How to Cite

Du, S., Yue, C., Cheng, X., & Zhang, D. (2026). DEIG: Detail-Enhanced Instance Generation with Fine-Grained Semantic Control. Proceedings of the AAAI Conference on Artificial Intelligence, 40(5), 3677-3685. https://doi.org/10.1609/aaai.v40i5.37367

Issue

Section

AAAI Technical Track on Computer Vision II