FreLay: Frequency-aware Energy Function for Training-free Layout-to-Image Generation

Authors

  • Bonan Li University of the Chinese Academy of Sciences; National University of Singapore
  • Yinhan Hu University of the Chinese Academy of Sciences
  • Songhua Liu National University of Singapore
  • Zeyu Xiao National University of Singapore
  • Xinchao Wang National University of Singapore

DOI:

https://doi.org/10.1609/aaai.v40i8.37522

Abstract

Layout-to-Image generation has significantly advanced content creation by enabling the rendering of visual text under predefined spatial layouts. Current approaches achieve training-free layout guidance by constructing attention-based energy functions from which correction gradients are derived. In this paper, we demonstrate that vanilla energy functions suffer from two limitations, which result in imprecise layout control and visually unrealistic artifacts. First, the normalizing factor of the Boltzmann distribution defined by the energy function is non-negligible when calculating correction gradients, yet current energy functions cannot compute this factor exactly. Second, although attention varies over time during the denoising process, existing approaches employ a single fixed formulation for all timesteps. To address these challenges, we introduce FreLay, a novel training-free approach equipped with a frequency-aware energy function. Our method first reformulates the energy function to handle the normalization factor, enabling accurate computation of correction gradients. In addition, leveraging the prior knowledge that low-frequency information deteriorates more slowly during noise addition, we design a time-specific energy function for each timestep from a frequency-domain perspective. Experimental results demonstrate that FreLay consistently outperforms existing state-of-the-art training-free methods by a large margin, both qualitatively and quantitatively, across multiple datasets.
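The prior cited in the abstract, that low-frequency information deteriorates more slowly during noise addition, can be illustrated with a minimal sketch. This is not the FreLay energy function. It assumes a DDPM-style forward step x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps with white Gaussian noise, and a signal whose power spectrum follows a 1/f^2 law, a common approximation for natural images. The function name, the noise schedule, and the spectrum are all hypothetical choices made for illustration.

```python
import numpy as np

def first_noise_dominated_step(power_spectrum, alpha_bars):
    """For each frequency bin, return the index of the first forward step
    at which noise power exceeds signal power (SNR < 1).

    Assumes x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps with
    eps ~ N(0, I). Under an orthonormal Fourier transform, white noise has
    unit expected power in every bin, so the per-bin SNR is
    alpha_bar * S(f) / (1 - alpha_bar).
    """
    power_spectrum = np.asarray(power_spectrum, dtype=float)
    alpha_bars = np.asarray(alpha_bars, dtype=float)
    # snr has shape (num_steps, num_frequencies)
    snr = alpha_bars[:, None] * power_spectrum[None, :] / (1.0 - alpha_bars[:, None])
    drowned = snr < 1.0
    # argmax finds the first True; bins that never drop below 1 get num_steps
    return np.where(drowned.any(axis=0), drowned.argmax(axis=0), len(alpha_bars))

# Hypothetical setup: 64 frequency bins with an assumed 1/f^2 power law,
# and a linearly decreasing alpha_bar schedule over 50 steps.
freqs = np.arange(1, 65)
power = 1.0 / freqs.astype(float) ** 2
alpha_bars = np.linspace(0.999, 0.001, 50)

steps = first_noise_dominated_step(power, alpha_bars)
print("lowest frequency is noise-dominated from step", steps[0])
print("highest frequency is noise-dominated from step", steps[-1])
```

Under these assumptions the lowest-frequency bin stays above the noise floor for many more steps than the highest-frequency bin, which is the kind of behavior that would motivate treating frequency bands differently at different timesteps.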

Published

2026-03-14

How to Cite

Li, B., Hu, Y., Liu, S., Xiao, Z., & Wang, X. (2026). FreLay: Frequency-aware Energy Function for Training-free Layout-to-Image Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(8), 5992–6000. https://doi.org/10.1609/aaai.v40i8.37522

Section

AAAI Technical Track on Computer Vision V