ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Shuwei Shi; Wenbo Li; Yuechen Zhang; Jingwen He; Biao Gong; Yinqiang Zheng

doi:10.1609/aaai.v39i7.32739

Authors

Shuwei Shi, The University of Tokyo
Wenbo Li, The Chinese University of Hong Kong
Yuechen Zhang, The Chinese University of Hong Kong
Jingwen He, The Chinese University of Hong Kong
Biao Gong, Ant Group
Yinqiang Zheng, The University of Tokyo

DOI:

https://doi.org/10.1609/aaai.v39i7.32739

Abstract

Diffusion models excel at producing high-quality images; however, scaling them to higher resolutions, such as 4K, often results in structural distortions and repetitive patterns. To address this, we introduce ResMaster, a novel, training-free method that empowers resolution-limited diffusion models to generate high-quality images beyond their resolution restrictions. Specifically, ResMaster leverages a low-resolution reference image created by a pre-trained diffusion model to provide structural and fine-grained guidance for crafting high-resolution images on a patch-by-patch basis. To ensure a coherent structure, ResMaster aligns the low-frequency components of high-resolution patches with the low-resolution reference at each denoising step. For fine-grained guidance, it incorporates tailored image prompts based on the low-resolution reference and enriched textual prompts produced by a vision-language model. This approach can significantly mitigate local pattern distortions and improve detail refinement. Extensive experiments validate that ResMaster sets a new benchmark for high-resolution image generation.
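
The structural guidance step described in the abstract, aligning the low-frequency components of a high-resolution patch with the low-resolution reference, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the filter, so the ideal FFT low-pass, the `cutoff` value, the nearest-neighbor upsampling, and the function names `low_pass` and `align_low_freq` are all assumptions chosen for illustration, and the arrays are random stand-ins for a single-channel patch at one denoising step.

```python
import numpy as np

def low_pass(x, cutoff):
    """Ideal low-pass filter (an assumed choice): keep spatial frequencies
    whose radius, in cycles per pixel, is at most `cutoff`."""
    fy = np.fft.fftfreq(x.shape[0])[:, None]
    fx = np.fft.fftfreq(x.shape[1])[None, :]
    mask = np.sqrt(fx**2 + fy**2) <= cutoff
    return np.real(np.fft.ifft2(np.fft.fft2(x) * mask))

def align_low_freq(patch, ref_patch, cutoff=0.1):
    """Swap the low-frequency band of `patch` for that of `ref_patch`,
    leaving the high-frequency detail of `patch` untouched."""
    return patch - low_pass(patch, cutoff) + low_pass(ref_patch, cutoff)

# Hypothetical data: a 16x16 low-resolution reference and a 32x32 patch.
rng = np.random.default_rng(0)
low_res = rng.random((16, 16))
scale = 4
# Nearest-neighbor upsampling of the reference to 64x64 (assumed for simplicity).
ref_up = np.kron(low_res, np.ones((scale, scale)))
patch = rng.random((32, 32))       # stand-in for a high-resolution patch
ref_patch = ref_up[:32, :32]       # the reference region covering that patch

aligned = align_low_freq(patch, ref_patch)
```

Because the ideal filter is linear and idempotent, the aligned patch shares its low-frequency band with the reference while keeping its own high-frequency content; in a patch-by-patch pipeline, an operation of this kind would be applied to every patch at every denoising step.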
