Inpaint-Anywhere: Zero-Shot Multi-Identity Inpainting with Efficient Diffusion Transformer

Authors

  • Junsheng Luan, Zhejiang University
  • Lei Zhao, Zhejiang University
  • Wei Xing, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v40i9.37706

Abstract

Subject-driven generation, which aims to synthesize visual content for a given identity V* with specific attributes, has garnered increasing attention in recent years. While existing methods demonstrate impressive identity consistency for both single and multiple identities, they often lack user-specified spatial control. Recent approaches, such as OminiControl-2 and EasyControl, enable inpainting conditioned on a single identity but fall short in multi-identity scenarios. In this paper, we introduce BoundID, a dataset synthesis pipeline for generating multi-identity images with bounding-box annotations, and Inpaint-Anywhere, a diffusion transformer framework for multi-identity inpainting. Given multiple identity references and their corresponding masks, our method generates all desired identities simultaneously at the specified locations while achieving high fidelity to both the identities and the prompt. Extensive experiments show that Inpaint-Anywhere achieves state-of-the-art performance in multi-identity inpainting.

Published

2026-03-14

How to Cite

Luan, J., Zhao, L., & Xing, W. (2026). Inpaint-Anywhere: Zero-Shot Multi-Identity Inpainting with Efficient Diffusion Transformer. Proceedings of the AAAI Conference on Artificial Intelligence, 40(9), 7644–7652. https://doi.org/10.1609/aaai.v40i9.37706

Section

AAAI Technical Track on Computer Vision VI