GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
DOI:
https://doi.org/10.1609/aaai.v39i6.32636
Abstract
Posters serve an essential function in marketing and advertising by improving visual communication and brand visibility, thus significantly contributing to industrial design. With the latest developments in controllable T2I diffusion models, research interest has surged in text rendering within synthesized images. Although text rendering accuracy has seen advancements, automatic poster generation remains a relatively untapped area. This paper presents an automatic poster generation framework featuring text rendering capabilities through the use of LLMs. Our framework employs a triple-cross attention mechanism based on alignment learning to achieve precise text placement within detailed contextual backgrounds. Moreover, it supports adjustable fonts, varying image resolutions, and poster rendering with textual prompts in both English and Chinese. Additionally, we present a comprehensive bilingual image-text dataset, GlyphDraw-3M, comprising 3 million image-text pairs, each with OCR annotations and resolutions exceeding 1024. Our method utilizes the SDXL architecture, and extensive experiments confirm its ability to generate posters with intricate and context-rich backgrounds.
Published
2025-04-11
How to Cite
Ma, J., Deng, Y., Chen, C., Du, N., Lu, H., & Yang, Z. (2025). GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 39(6), 5955–5963. https://doi.org/10.1609/aaai.v39i6.32636
Issue
Vol. 39 No. 6 (2025)
Section
AAAI Technical Track on Computer Vision V