Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing

Authors

  • Shichao Ma, University of Science and Technology of China
  • Yunhe Guo, University of Science and Technology of China
  • Jiahao Su, University of Science and Technology of China
  • Qihe Huang, University of Science and Technology of China
  • Zhengyang Zhou, University of Science and Technology of China
  • Yang Wang, University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i38.40519

Abstract

Text-to-image generation has driven remarkable advances in diverse media applications, yet most existing methods focus on single-turn scenarios and struggle with iterative, multi-turn creative tasks. Recent dialogue-based systems attempt to bridge this gap, but their single-agent, sequential paradigm often causes intention drift and incoherent edits. To address these limitations, we present Talk2Image, a novel multi-agent system for interactive image generation and editing in multi-turn dialogue scenarios. Our approach integrates three key components: intention parsing from dialogue history, task decomposition and collaborative execution across specialized agents, and feedback-driven refinement based on a multi-view evaluation mechanism. Together, these components enable step-by-step alignment with user intention and consistent image editing. Experiments demonstrate that Talk2Image outperforms existing baselines in controllability, coherence, and user satisfaction across iterative image generation and editing tasks.

Published

2026-03-14

How to Cite

Ma, S., Guo, Y., Su, J., Huang, Q., Zhou, Z., & Wang, Y. (2026). Talk2Image: A Multi-Agent System for Multi-Turn Image Generation and Editing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(38), 32437–32445. https://doi.org/10.1609/aaai.v40i38.40519

Section

AAAI Technical Track on Natural Language Processing III