Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation

Authors

  • Xie Tianyidan School of Intelligence Science and Technology, Nanjing University, Suzhou, China
  • Rui Ma Jilin University, Changchun, China
  • Qian Wang China Mobile Research Institute, Beijing, China
  • Xiaoqian Ye China Mobile Research Institute, Beijing, China
  • Feixuan Liu Beijing Shuzhimei Technology Co., Ltd, Beijing, China
  • Ying Tai State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Intelligence Science and Technology, Nanjing University, Suzhou, China
  • Zhenyu Zhang State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Intelligence Science and Technology, Nanjing University, Suzhou, China
  • Lanjun Wang School of New Media and Communication, Tianjin University, Tianjin, China
  • Zili Yi State Key Laboratory of Novel Software Technology, Nanjing University, Nanjing, China; School of Intelligence Science and Technology, Nanjing University, Suzhou, China

DOI:

https://doi.org/10.1609/aaai.v39i7.32797

Abstract

Image-conditioned image generation has recently made substantial progress. However, foreground-conditioned image generation remains underexplored, encountering challenges such as compromised object integrity, foreground-background inconsistencies, limited diversity, and reduced control flexibility. These challenges arise from current end-to-end inpainting models, which suffer from inaccurate training masks, limited foreground semantic understanding, data distribution biases, and inherent interference between visual and textual prompts. To overcome these limitations, we present Anywhere, a multi-agent framework that departs from the traditional end-to-end approach. In this framework, each agent is specialized in a distinct aspect, such as foreground understanding, diversity enhancement, object integrity protection, and textual prompt consistency. Our framework is further enhanced with the ability to incorporate optional user textual inputs, perform automated quality assessments, and initiate re-generation as needed. Comprehensive experiments demonstrate that this modular design effectively overcomes the limitations of existing end-to-end models, resulting in higher fidelity, quality, diversity, and controllability in foreground-conditioned image generation. Additionally, the Anywhere framework is extensible, allowing it to benefit from future advancements in each individual agent.
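The control flow the abstract describes — specialized agents handling distinct aspects, an automated quality assessment, and re-generation when the check fails — can be sketched as a simple orchestration loop. This is a hypothetical illustration, not the authors' implementation: all agent functions, names, and the toy scoring heuristic are placeholders standing in for the real foreground-understanding, generation, and assessment agents.

```python
# Hypothetical sketch of an Anywhere-style multi-agent loop: agents run in
# sequence, a quality-assessment agent gates the output, and a failed check
# triggers re-generation. All names and logic below are illustrative only.
from dataclasses import dataclass

@dataclass
class Result:
    image: str       # placeholder for generated image data
    score: float     # quality score from the assessment agent
    attempts: int    # how many generation rounds were needed

def understand_foreground(fg):
    # stand-in for the foreground-understanding agent
    return f"semantics({fg})"

def generate_background(semantics, prompt, seed):
    # stand-in for the diversity-enhancing generation agent;
    # the optional textual prompt is folded into the conditioning
    return f"image[{semantics}|{prompt}|seed={seed}]"

def assess_quality(image):
    # stand-in for the automated quality-assessment agent;
    # toy heuristic: later seeds score higher, forcing one re-generation
    seed = int(image.rsplit("seed=", 1)[1].rstrip("]"))
    return 0.5 + 0.2 * seed

def run_pipeline(fg, prompt="", threshold=0.8, max_attempts=3):
    semantics = understand_foreground(fg)
    for attempt in range(1, max_attempts + 1):
        image = generate_background(semantics, prompt, seed=attempt)
        score = assess_quality(image)
        if score >= threshold:   # quality gate; otherwise re-generate
            return Result(image, score, attempt)
    return Result(image, score, max_attempts)  # best effort after budget

result = run_pipeline("cat-cutout", prompt="a sunny beach")
```

The key design point the abstract emphasizes is modularity: because each stage is a separate agent behind a narrow interface, any one of them can be swapped for a stronger model without retraining the rest of the pipeline.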

Published

2025-04-11

How to Cite

Tianyidan, X., Ma, R., Wang, Q., Ye, X., Liu, F., Tai, Y., … Yi, Z. (2025). Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 7410–7418. https://doi.org/10.1609/aaai.v39i7.32797

Section

AAAI Technical Track on Computer Vision VI