HDLayout: Hierarchical and Directional Layout Planning for Arbitrary Shaped Visual Text Generation

Authors

  • Tonghui Feng (School of Computer Science and Technology, Xidian University, China)
  • Chunsheng Yan (Guangzhou Institute of Technology, Xidian University, China)
  • Qianru Wang (School of Computer Science and Technology, Xidian University, China)
  • Jiangtao Cui (School of Computer Science and Technology, Xidian University, China)
  • Xiaotian Qiao (School of Computer Science and Technology, Xidian University, China; Guangzhou Institute of Technology, Xidian University, China)

DOI

https://doi.org/10.1609/aaai.v39i3.32307

Abstract

Visual text generation, which aims to generate photo-realistic images in which coherent, well-formed scene text is rendered, has attracted widespread attention. Although recent works have achieved promising performance, their limited flexibility and controllability hinder practical applications. We observe that, unlike natural objects, visual text in real scenes often has an arbitrarily shaped structure at different granularities (i.e., character, word, or line). In this paper, we consider the modality gap between image and text and propose a new separation-and-composition pipeline for flexible and controllable visual text generation from text prompts alone. At the core of our framework is a novel Hierarchical and Directional Layout representation, HDLayout, which can model the sequential and multi-granularity nature of visual text. Under this formulation, we can generate arbitrarily shaped visual text automatically. Extensive experiments demonstrate that our method outperforms several strong baselines in a variety of scenarios, both qualitatively and quantitatively, achieving state-of-the-art performance on arbitrarily shaped visual text generation.
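To make the idea of a hierarchical, directional layout more concrete, here is a minimal Python sketch of one possible nested structure (line, then word, then character) in which the order of characters in a list encodes reading direction along an arbitrarily shaped path. The class names `CharBox`, `WordLayout`, and `LineLayout`, the axis-aligned `(x0, y0, x1, y1)` box format, and the angle-based direction measure are all hypothetical illustrations chosen for this sketch; they are not the HDLayout representation defined in the paper.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # hypothetical format: (x0, y0, x1, y1), y pointing down


@dataclass
class CharBox:
    # Hypothetical character-level entry: one glyph and its bounding box.
    char: str
    box: Box

    def center(self) -> Point:
        x0, y0, x1, y1 = self.box
        return ((x0 + x1) / 2, (y0 + y1) / 2)


@dataclass
class WordLayout:
    # Characters are kept in reading order, so list order encodes direction.
    chars: List[CharBox] = field(default_factory=list)

    def text(self) -> str:
        return "".join(c.char for c in self.chars)

    def bbox(self) -> Box:
        # Word-level box composed from its character boxes.
        xs0, ys0, xs1, ys1 = zip(*(c.box for c in self.chars))
        return (min(xs0), min(ys0), max(xs1), max(ys1))

    def directions(self) -> List[float]:
        # Angle in degrees from each character center to the next one;
        # a curved word yields a varying sequence of angles.
        pts = [c.center() for c in self.chars]
        return [math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))
                for a, b in zip(pts, pts[1:])]


@dataclass
class LineLayout:
    # Line level: words in reading order.
    words: List[WordLayout] = field(default_factory=list)

    def text(self) -> str:
        return " ".join(w.text() for w in self.words)


# Usage: a word whose characters climb diagonally, followed by a one-letter word.
word = WordLayout([CharBox("A", (0, 10, 10, 20)),
                   CharBox("B", (10, 5, 20, 15)),
                   CharBox("C", (20, 0, 30, 10))])
line = LineLayout([word, WordLayout([CharBox("D", (40, 0, 50, 10))])])
print(line.text(), word.bbox(), word.directions())
```

Storing granularities as nested ordered lists means coarser boxes (word, line) can be derived from finer ones, while the ordering carries the sequential, directional information that a plain set of boxes would lose.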

Published

2025-04-11

How to Cite

Feng, T., Yan, C., Wang, Q., Cui, J., & Qiao, X. (2025). HDLayout: Hierarchical and Directional Layout Planning for Arbitrary Shaped Visual Text Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 39(3), 2996–3003. https://doi.org/10.1609/aaai.v39i3.32307

Issue

Vol. 39 No. 3

Section

AAAI Technical Track on Computer Vision II