Scene Graph to Image Synthesis via Knowledge Consensus

Authors

  • Yang Wu, School of Computer Science and Engineering, Sun Yat-sen University
  • Pengxu Wei, School of Computer Science and Engineering, Sun Yat-sen University
  • Liang Lin, School of Computer Science and Engineering, Sun Yat-sen University; GuangDong Province Key Laboratory of Information Security Technology

DOI:

https://doi.org/10.1609/aaai.v37i3.25387

Keywords:

CV: Applications, CV: Computational Photography, Image & Video Synthesis, CV: Multi-modal Vision, CV: Visual Reasoning & Symbolic Representations, KRR: Knowledge Engineering

Abstract

In this paper, we study graph-to-image generation conditioned exclusively on scene graphs, in which we seek to disentangle the veiled semantics between knowledge graphs and images. While most existing research relies on laborious auxiliary information such as object layouts or segmentation masks, it is also of interest to examine the generality of a model trained with limited supervision and without extra cross-modal alignments. To tackle this challenge, we delve into the causality of the adversarial generation process and derive a new principle that realizes semantic disentanglement simultaneously with an alignment of the target and model distributions. This principle, named knowledge consensus, explicitly describes a triangular causal dependency among observed images, graph semantics, and hidden visual representations. The consensus also determines a new graph-to-image generation framework, built on several adversarial optimization objectives. Extensive experimental results demonstrate that, even when conditioned only on scene graphs, our model achieves superior performance on semantics-aware image generation without losing the ability to manipulate the generation through knowledge graphs.

Published

2023-06-26

How to Cite

Wu, Y., Wei, P., & Lin, L. (2023). Scene Graph to Image Synthesis via Knowledge Consensus. Proceedings of the AAAI Conference on Artificial Intelligence, 37(3), 2856-2865. https://doi.org/10.1609/aaai.v37i3.25387

Section

AAAI Technical Track on Computer Vision III