Scene Graph to Image Synthesis via Knowledge Consensus
DOI: https://doi.org/10.1609/aaai.v37i3.25387
Keywords: CV: Applications, CV: Computational Photography, Image & Video Synthesis, CV: Multi-modal Vision, CV: Visual Reasoning & Symbolic Representations, KRR: Knowledge Engineering
Abstract
In this paper, we study graph-to-image generation conditioned exclusively on scene graphs, seeking to disentangle the veiled semantics shared between knowledge graphs and images. While most existing research relies on laborious auxiliary information such as object layouts or segmentation masks, it is also of interest to probe the generality of a model under limited supervision, thereby avoiding extra cross-modal alignments. To tackle this challenge, we delve into the causality of the adversarial generation process and derive a new principle that realizes semantic disentanglement simultaneously with an alignment of the target and model distributions. This principle, named knowledge consensus, explicitly describes a triangular causal dependency among observed images, graph semantics, and hidden visual representations. The consensus also determines a new graph-to-image generation framework, realized through several adversarial optimization objectives. Extensive experimental results demonstrate that, even when conditioned only on scene graphs, our model achieves superior performance on semantics-aware image generation while retaining the ability to manipulate generation through knowledge graphs.
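The general setting the abstract describes, conditioning an adversarial generator and discriminator on a scene-graph embedding rather than on layouts or masks, can be caricatured in a few lines. The sketch below is a toy NumPy illustration under our own assumptions, not the authors' model: the helper names (`encode_scene_graph`, `generator`, `discriminator`), the single mean-aggregation message-passing step, and all dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_scene_graph(node_feats, edges):
    """Toy graph encoder (hypothetical): one round of neighbor
    aggregation, then a mean over nodes as the graph-level code."""
    h = node_feats.copy()
    deg = np.ones(len(node_feats))
    for src, dst in edges:
        h[dst] += node_feats[src]
        deg[dst] += 1
    h = h / deg[:, None]
    return h.mean(axis=0)  # graph-level conditioning vector

def generator(z, g, W):
    # Hypothetical generator: maps noise + graph code to a flat "image".
    return np.tanh(W @ np.concatenate([z, g]))

def discriminator(x, g, V):
    # Conditional critic: scores an (image, graph) pair jointly, so the
    # adversarial game ties visual realism to graph semantics.
    return float(V @ np.concatenate([x, g]))

# Toy scene graph: 3 objects with 4-dim features, 2 directed relations.
nodes = rng.normal(size=(3, 4))
edges = [(0, 1), (2, 1)]
g = encode_scene_graph(nodes, edges)

z = rng.normal(size=8)                # latent noise
W = rng.normal(size=(16, 12)) * 0.1   # 16-dim flattened "image"
V = rng.normal(size=20) * 0.1         # critic weights over image + graph code
fake = generator(z, g, W)
score = discriminator(fake, g, V)
print(fake.shape, type(score).__name__)
```

The point of the sketch is only the data flow: the sole conditioning signal reaching both generator and discriminator is the graph code `g`, with no layout or mask supervision anywhere in the loop.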
Published: 2023-06-26
How to Cite
Wu, Y., Wei, P., & Lin, L. (2023). Scene Graph to Image Synthesis via Knowledge Consensus. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2856-2865. https://doi.org/10.1609/aaai.v37i3.25387
Section: AAAI Technical Track on Computer Vision III