SCORE: Semantic Collage by Optimizing Rendered Elements

Zefan Shao; Jin Zhou; Hongliang Yang; Pengfei Xu

doi:10.1609/aaai.v40i3.37185

Authors

Zefan Shao, Shenzhen University
Jin Zhou, Shenzhen University
Hongliang Yang, Shenzhen University
Pengfei Xu, Shenzhen University


Abstract

Collage is a powerful medium for visual expression, but it traditionally demands significant artistic expertise and manual effort. Existing methods often struggle with a trade-off between semantic expression and the visual fidelity of the constituent images. To address this, we introduce SCORE (Semantic Collage by Optimizing Rendered Elements), a novel text-driven framework that automates the creation of semantically rich and structurally sound collages. Our key innovation is to shift the optimization process entirely into the image space. By employing a differentiable renderer, we can backpropagate gradients from a powerful, pre-trained text-to-image model directly to the spatial parameters (position, rotation, and scale) of each image element. We leverage Variational Score Distillation (VSD) to provide robust semantic guidance from a text prompt, ensuring the final layout aligns with the desired concept. Crucially, our "minimal editing" principle preserves the integrity of the original elements by forgoing any content-level modifications. The layout is refined by a joint loss function that combines the VSD-based semantic loss with structural regularizers that penalize overlap and enforce boundary constraints. The output of SCORE is a parametric, structured representation that allows further editing and downstream use. Our work lowers the barrier to creative expression and provides a new, powerful paradigm for organizing visual content.
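
The abstract names two structural regularizers, one penalizing overlap and one enforcing boundary constraints, without giving their formulas. The sketch below is only an illustration of what such terms could look like, not the authors' formulation. It assumes axis-aligned bounding boxes, ignores rotation, and uses hypothetical names (`Element`, `overlap_penalty`, `boundary_penalty`, `structural_loss`) and hypothetical weights. In a pipeline like the one described, terms of this kind would presumably be computed in an automatic-differentiation framework and added to the VSD-based semantic loss; plain Python is used here only to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical layout parameters for one collage element (rotation omitted)."""
    cx: float           # center x, in canvas units
    cy: float           # center y, in canvas units
    w: float            # unscaled width
    h: float            # unscaled height
    scale: float = 1.0  # uniform scale factor

    def box(self):
        """Axis-aligned bounds (x0, y0, x1, y1) after scaling."""
        hw = 0.5 * self.w * self.scale
        hh = 0.5 * self.h * self.scale
        return self.cx - hw, self.cy - hh, self.cx + hw, self.cy + hh


def overlap_penalty(elements):
    """Sum of pairwise intersection areas between element boxes."""
    total = 0.0
    for i in range(len(elements)):
        ax0, ay0, ax1, ay1 = elements[i].box()
        for j in range(i + 1, len(elements)):
            bx0, by0, bx1, by1 = elements[j].box()
            iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
            ih = max(0.0, min(ay1, by1) - max(ay0, by0))
            total += iw * ih
    return total


def boundary_penalty(elements, canvas_w, canvas_h):
    """Sum of squared distances by which each box extends past the canvas."""
    total = 0.0
    for e in elements:
        x0, y0, x1, y1 = e.box()
        for excess in (-x0, -y0, x1 - canvas_w, y1 - canvas_h):
            total += max(0.0, excess) ** 2
    return total


def structural_loss(elements, canvas_w, canvas_h, w_overlap=1.0, w_boundary=1.0):
    """Weighted sum of the two regularizers (weights are assumptions)."""
    return (w_overlap * overlap_penalty(elements)
            + w_boundary * boundary_penalty(elements, canvas_w, canvas_h))


if __name__ == "__main__":
    layout = [
        Element(cx=2.0, cy=2.0, w=2.0, h=2.0),
        Element(cx=3.0, cy=3.0, w=2.0, h=2.0),  # overlaps the first element
        Element(cx=9.5, cy=5.0, w=2.0, h=2.0),  # extends past the right edge
    ]
    print(structural_loss(layout, canvas_w=10.0, canvas_h=10.0))
```

Both terms are zero for a layout with no overlaps and no element outside the canvas, so they only push on the spatial parameters when a constraint is violated and otherwise leave the semantic term to drive the arrangement.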
