ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Authors

  • Shuwei Shi, The University of Tokyo
  • Wenbo Li, The Chinese University of Hong Kong
  • Yuechen Zhang, The Chinese University of Hong Kong
  • Jingwen He, The Chinese University of Hong Kong
  • Biao Gong, Ant Group
  • Yinqiang Zheng, The University of Tokyo

DOI:

https://doi.org/10.1609/aaai.v39i7.32739

Abstract

Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in structural distortions and repetitive patterns. To address these issues, we introduce ResMaster, a novel, training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond resolution restrictions. Specifically, ResMaster leverages a low-resolution reference image created by a pre-trained diffusion model to provide structural and fine-grained guidance for crafting high-resolution images on a patch-by-patch basis. To ensure a coherent structure, ResMaster meticulously aligns the low-frequency components of high-resolution patches with the low-resolution reference at each denoising step. For fine-grained guidance, tailored image prompts based on the low-resolution reference and enriched textual prompts produced by a vision-language model are incorporated. This approach could significantly mitigate local pattern distortions and improve detail refinement. Extensive experiments validate that ResMaster sets a new benchmark for high-resolution image generation.
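
The low-frequency alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the filter, the cutoff, or whether the operation runs in pixel or latent space. The sketch assumes a single-channel patch, a reference crop already upsampled to the same size, and an ideal FFT low-pass filter; the names `low_pass`, `align_low_frequency`, and `cutoff` are hypothetical.

```python
import numpy as np


def low_pass(img: np.ndarray, cutoff: float = 0.1) -> np.ndarray:
    """Ideal low-pass filter: keep frequencies with radius <= cutoff (cycles/pixel)."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in [-0.5, 0.5)
    mask = np.sqrt(fx**2 + fy**2) <= cutoff
    # The mask is symmetric, so the result is real up to rounding error.
    return np.real(np.fft.ifft2(np.fft.fft2(img) * mask))


def align_low_frequency(patch: np.ndarray, ref_patch: np.ndarray,
                        cutoff: float = 0.1) -> np.ndarray:
    """Swap the patch's low-frequency band for the reference's (assumed scheme).

    The high-frequency detail of `patch` is kept unchanged; only the coarse
    structure is taken from `ref_patch`.
    """
    return patch - low_pass(patch, cutoff) + low_pass(ref_patch, cutoff)


# Toy usage: random arrays stand in for a patch and its upsampled reference crop.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
ref_patch = rng.standard_normal((32, 32))
aligned = align_low_frequency(patch, ref_patch)
```

In a patch-wise sampler, a step like this would presumably be applied to each patch's intermediate estimate at every denoising step, so that coarse structure follows the reference while fine detail is left to the diffusion model.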

Published

2025-04-11

How to Cite

Shi, S., Li, W., Zhang, Y., He, J., Gong, B., & Zheng, Y. (2025). ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance. Proceedings of the AAAI Conference on Artificial Intelligence, 39(7), 6887–6895. https://doi.org/10.1609/aaai.v39i7.32739

Section

AAAI Technical Track on Computer Vision VI