Exploring Position Encoding Mechanism in Diffusion U-Net for Training-free High-resolution Image Generation

Authors

  • Feng Zhou, Beijing University of Posts and Telecommunications
  • Pu Cao, Beijing University of Posts and Telecommunications; Beijing Hydrogen Intelligence Technology Co. Ltd.
  • Yiyang Ma, Beijing University of Posts and Telecommunications
  • Lu Yang, Beijing University of Posts and Telecommunications
  • Yonghao Dang, Beijing University of Posts and Telecommunications
  • Jianqin Yin, Beijing University of Posts and Telecommunications

DOI

https://doi.org/10.1609/aaai.v40i16.38366

Abstract

Denoising higher-resolution latents with a pre-trained U-Net often produces repetitive and disordered image patterns. In this work, we seek to uncover the intrinsic cause of such pattern disruption in high-resolution image generation. Through theoretical analysis and empirical studies, we show that the pre-trained U-Net fails to provide sufficient positional information to tokens at high resolution. Specifically, 1) zero-padding serves as a critical position-encoding mechanism but is not robust across resolutions; and 2) tokens located farther from the feature-map boundaries have increasing difficulty acquiring positional awareness, which leads to pattern disruption. Inspired by these findings, we propose Progressive Boundary Complement (PBC), a novel training-free approach to high-resolution generation. PBC creates dynamic virtual image boundaries inside the feature map to supplement positional information at high resolution, enabling high-quality, content-rich high-resolution image synthesis. Extensive experiments show that our method significantly improves high-resolution image synthesis in both visual quality and content richness, achieving state-of-the-art performance.
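The two findings in the abstract can be illustrated with a toy experiment. The sketch below is not the authors' code and not the PBC algorithm; it assumes that a stack of 3x3 mean filters with zero padding is a fair stand-in for the U-Net's convolutions. It feeds a constant map of ones through `layers` such filters and counts "position-blind" tokens, whose output is identical to what they would receive on an infinite plane and therefore carries no information about where they are. Because the receptive field is fixed by depth, the blind fraction grows with resolution. The optional `boundary_lines` parameter is a hypothetical stand-in for an interior "virtual boundary": it simply zeroes a fixed row and column before every layer, which is far cruder than the dynamic, progressive boundaries the paper describes.

```python
from typing import List, Sequence

Grid = List[List[float]]


def conv3x3_zero_pad(x: Grid) -> Grid:
    """3x3 box filter (mean of the 3x3 neighbourhood) with zero padding.

    Taps that fall outside the map read 0.0, so tokens near the border
    receive a signal that depends on their distance to it.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # out-of-range taps add 0
                        s += x[ii][jj]
            out[i][j] = s / 9.0
    return out


def blind_fraction(size: int, layers: int,
                   boundary_lines: Sequence[int] = ()) -> float:
    """Fraction of tokens whose output equals that of an 'infinite plane' token.

    The input is a constant map of ones, so any deviation from 1.0 can only
    come from zeros: the padded border or the hypothetical interior lines.
    """
    x = [[1.0] * size for _ in range(size)]
    for _ in range(layers):
        # Hypothetical virtual boundary: zero the full row and column at
        # index k before every layer (a toy stand-in, not the PBC procedure).
        for k in boundary_lines:
            for t in range(size):
                x[k][t] = 0.0
                x[t][k] = 0.0
        x = conv3x3_zero_pad(x)
    blind = sum(1 for row in x for v in row if abs(v - 1.0) < 1e-12)
    return blind / (size * size)


if __name__ == "__main__":
    # Same depth (fixed receptive field), growing resolution.
    for n in (16, 32, 64):
        print(n, blind_fraction(n, layers=4))
    # Same 32x32 map with a zeroed cross at index 16.
    print(32, blind_fraction(32, layers=4, boundary_lines=[16]))
```

With four layers, positional signal from the padding reaches only four tokens inward, so the blind fraction is ((n - 8) / n)^2: the script prints 0.25, 0.5625 and 0.765625 for n = 16, 32 and 64. Adding the zeroed cross to the 32x32 map drops the fraction to 0.2197265625 (225/1024), because the interior lines act as extra boundaries that tokens can measure their position against.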

Published

2026-03-14

How to Cite

Zhou, F., Cao, P., Ma, Y., Yang, L., Dang, Y., & Yin, J. (2026). Exploring Position Encoding Mechanism in Diffusion U-Net for Training-free High-resolution Image Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(16), 13602–13610. https://doi.org/10.1609/aaai.v40i16.38366

Issue

Vol. 40 No. 16

Section

AAAI Technical Track on Computer Vision XIII