Hierarchical Image Generation via Transformer-Based Sequential Patch Selection


  • Xiaogang Xu The Chinese University of Hong Kong
  • Ning Xu Adobe Research




Computer Vision (CV), Machine Learning (ML)


To synthesize images with preferred objects and interactions, a controllable approach is to generate the image from a scene graph and a large pool of object crops, where the spatial arrangement of the objects is defined by the scene graph while their appearances are determined by crops retrieved from the pool. In this paper, we propose a novel framework with such a semi-parametric generation strategy. First, to encourage the retrieval of mutually compatible crops, we design a sequential selection strategy in which the crop selected for each object depends on the contents and locations of all object crops chosen previously. This process is implemented via a transformer trained with contrastive losses. Second, to generate the final image, our hierarchical generation strategy leverages hierarchical gated convolutions, which synthesize areas not covered by any image crop, and a patch-guided spatially adaptive normalization module, which ensures that the generated images comply with the crop appearances and the scene graph. Evaluated on the challenging Visual Genome and COCO-Stuff datasets, our experimental results demonstrate the superiority of the proposed method over existing state-of-the-art methods.
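The core of the sequential selection strategy can be illustrated with a minimal sketch. The snippet below is a simplified stand-in, not the paper's implementation: it replaces the transformer-based, contrastively trained scorer with a plain dot-product compatibility score between crop embeddings, and greedily picks one crop per object conditioned on all crops chosen so far. The function `sequential_select` and the list-of-floats embedding format are assumptions for illustration.

```python
def dot(u, v):
    # Simple stand-in compatibility score between two crop embeddings;
    # the paper uses a transformer trained with contrastive losses instead.
    return sum(a * b for a, b in zip(u, v))

def sequential_select(candidates_per_object):
    """Greedily select one crop per object in sequence.

    candidates_per_object: for each object (in scene-graph order), a pool
    of candidate crop embeddings, each a list of floats.
    Returns the chosen embedding for each object; each choice is
    conditioned on every crop selected before it.
    """
    chosen = []
    for pool in candidates_per_object:
        if not chosen:
            # No context yet: take the first (e.g. top-ranked) candidate.
            chosen.append(pool[0])
            continue
        # Pick the candidate most compatible, on average, with all
        # previously selected crops.
        best = max(pool, key=lambda c: sum(dot(c, p) for p in chosen) / len(chosen))
        chosen.append(best)
    return chosen
```

For example, with a first object fixed to embedding `[1.0, 0.0]`, a second object whose pool is `[[0.0, 1.0], [1.0, 0.0]]` would resolve to `[1.0, 0.0]`, the candidate most compatible with the earlier choice.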




How to Cite

Xu, X., & Xu, N. (2022). Hierarchical Image Generation via Transformer-Based Sequential Patch Selection. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2938-2945. https://doi.org/10.1609/aaai.v36i3.20199



AAAI Technical Track on Computer Vision III