RealUHR: Harnessing Patch-Cascade Flows for Photorealistic Ultra-High-Resolution Synthesis
DOI:
https://doi.org/10.1609/aaai.v40i14.38211
Abstract
Ultra-high-resolution (UHR) text-to-image synthesis faces significant hurdles, including immense computational costs and a scarcity of training data. To address these, we introduce RealUHR, an efficient and scalable framework for generating photorealistic 4K images. At its core, RealUHR employs a Patch-Cascade Flow Matching pipeline that ensures global coherence without costly patch fusion by initiating generation from a semantically meaningful structure. This enables highly efficient, few-step inference for independent patches. Our key contribution is Guidance-Consistent Adaptation (GCA), a novel two-stage strategy to resolve the fundamental objective mismatch in guidance-distilled models. GCA allows powerful backbones like FLUX to be effectively adapted for patch-aware UHR synthesis. The framework's detail-rendering capabilities are further enhanced by a non-uniform time schedule. Experiments show that RealUHR establishes superior performance in both quality and efficiency, and excels in zero-shot applications such as creative up-sampling and generative artifact suppression.
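The abstract does not give implementation details, so the following is only a toy sketch of the general idea it describes: few-step flow-matching sampling run independently on patches, starting from a partially noised structure and following a non-uniform time schedule. Everything here is an assumption made for illustration and not the authors' method: the power-law schedule and its `rho` parameter, the linear path convention (t = 1 is noise, t = 0 is data), the hypothetical helper names, and the oracle velocity that stands in for a learned network.

```python
import random

def power_schedule(t_start, num_steps, rho=2.0):
    """Time points from t_start down to 0; rho > 1 packs steps near t = 0.
    (Assumed form; the paper's actual schedule is not given in the abstract.)"""
    return [t_start * (1 - i / num_steps) ** rho for i in range(num_steps + 1)]

def split_patches(image, patch):
    """Split an H x W list-of-lists into non-overlapping tiles keyed by top-left corner."""
    h, w = len(image), len(image[0])
    return {(r, c): [row[c:c + patch] for row in image[r:r + patch]]
            for r in range(0, h, patch) for c in range(0, w, patch)}

def merge_patches(patches, h, w):
    """Write tiles back into an H x W image; no blending or fusion step."""
    out = [[0.0] * w for _ in range(h)]
    for (r, c), tile in patches.items():
        for i, row in enumerate(tile):
            out[r + i][c:c + len(row)] = row
    return out

def oracle_velocity(target):
    """Toy stand-in for a learned velocity network. On the linear path
    x_t = (1 - t) * data + t * noise, the velocity is (x_t - data) / t."""
    def v(x, t):
        return [[(xi - ti) / t for xi, ti in zip(xr, tr)] for xr, tr in zip(x, target)]
    return v

def euler_sample(x, velocity_fn, times):
    """Integrate dx/dt = v(x, t) with Euler steps along a decreasing time grid."""
    for t_cur, t_next in zip(times, times[1:]):
        v = velocity_fn(x, t_cur)
        dt = t_next - t_cur  # negative: moving from noise toward data
        x = [[xi + dt * vi for xi, vi in zip(xr, vr)] for xr, vr in zip(x, v)]
    return x

def demo(size=4, patch=2, t_start=0.6, num_steps=4, seed=0):
    """Noise a known target to t_start, denoise each patch on its own, merge."""
    rng = random.Random(seed)
    target = [[float(r * size + c) for c in range(size)] for r in range(size)]
    noisy = [[(1 - t_start) * v + t_start * rng.gauss(0, 1) for v in row]
             for row in target]
    times = power_schedule(t_start, num_steps)
    target_tiles = split_patches(target, patch)
    noisy_tiles = split_patches(noisy, patch)
    # Each patch is sampled independently; no information is shared across tiles.
    out = {k: euler_sample(noisy_tiles[k], oracle_velocity(target_tiles[k]), times)
           for k in noisy_tiles}
    return target, merge_patches(out, size, size)

if __name__ == "__main__":
    target, result = demo()
    err = max(abs(a - b) for tr, rr in zip(target, result) for a, b in zip(tr, rr))
    print("max abs error:", err)
```

In this toy setting the patches agree at their borders only because every tile starts from the same shared structure and the oracle velocity is exact. With a learned model, that shared starting structure is what the abstract credits for global coherence without patch fusion.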
Published
2026-03-14
How to Cite
Yu, Y., Zheng, H., Lin, Z., Barnes, C., Zhou, Y., Zhang, Z., & Luo, J. (2026). RealUHR: Harnessing Patch-Cascade Flows for Photorealistic Ultra-High-Resolution Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 40(14), 12204-12212. https://doi.org/10.1609/aaai.v40i14.38211
Section
AAAI Technical Track on Computer Vision XI