Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis

Authors

  • Lianli Gao, University of Electronic Science and Technology of China
  • Daiyuan Chen, University of Electronic Science and Technology of China
  • Jingkuan Song, University of Electronic Science and Technology of China
  • Xing Xu, University of Electronic Science and Technology of China
  • Dongxiang Zhang, University of Electronic Science and Technology of China
  • Heng Tao Shen, University of Electronic Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v33i01.33018312

Abstract

Generating photo-realistic images conditioned on semantic text descriptions is a challenging task in computer vision. Because CNNs learn hierarchical representations, it is intuitive to exploit richer convolutional features to improve text-to-image synthesis. In this paper, we propose the Perceptual Pyramid Adversarial Network (PPAN), which directly synthesizes multi-scale images conditioned on text in an adversarial way. Specifically, we design one pyramid generator and three independent discriminators to synthesize and regularize multi-scale photo-realistic images in one feed-forward process. At each pyramid level, our method takes coarse-resolution features as input, synthesizes high-resolution images, and uses convolutions for up-sampling to the next finer level. Furthermore, the generator adopts a perceptual loss to enforce semantic similarity between the synthesized image and the ground truth, while a multi-purpose discriminator encourages semantic consistency, image fidelity, and class invariance. Experimental results show that PPAN sets new records for text-to-image synthesis on two benchmark datasets: CUB (4.38 Inception Score and 0.290 Visual-semantic Similarity) and Oxford-102 (3.52 Inception Score and 0.297 Visual-semantic Similarity).
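
The generator objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the loss formulas, so the mean-squared form of the perceptual term, the simple sum over the three discriminators, the `perceptual_weight` parameter, and the doubling resolutions in `pyramid_resolutions` are all assumptions made for illustration, and every function name is hypothetical.

```python
from typing import List, Sequence


def perceptual_loss(feat_fake: Sequence[float], feat_real: Sequence[float]) -> float:
    """Mean squared distance between two feature vectors.

    Assumption: the features come from some pretrained CNN layer; the
    abstract does not say which network or layer is used.
    """
    if len(feat_fake) != len(feat_real):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(feat_fake, feat_real)) / len(feat_fake)


def generator_objective(
    adv_losses: Sequence[float],
    feat_fake: Sequence[float],
    feat_real: Sequence[float],
    perceptual_weight: float = 1.0,
) -> float:
    """Sum of per-scale adversarial losses plus a weighted perceptual term.

    Assumption: one adversarial loss per discriminator (three in PPAN),
    combined by plain summation; the weighting scheme is hypothetical.
    """
    return sum(adv_losses) + perceptual_weight * perceptual_loss(feat_fake, feat_real)


def pyramid_resolutions(base: int, levels: int = 3) -> List[int]:
    """Hypothetical output resolution at each pyramid level, doubling per level."""
    return [base * 2 ** i for i in range(levels)]


if __name__ == "__main__":
    # Toy numbers only; they are not results from the paper.
    total = generator_objective([0.5, 0.5, 0.5], [1.0, 2.0], [1.0, 4.0], 0.5)
    print(total)
    print(pyramid_resolutions(64))
```

In a real training setup the feature vectors would be tensors produced by a fixed feature extractor, and each adversarial term would come from the discriminator attached to the corresponding pyramid level.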

Published

2019-07-17

How to Cite

Gao, L., Chen, D., Song, J., Xu, X., Zhang, D., & Shen, H. T. (2019). Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 8312-8319. https://doi.org/10.1609/aaai.v33i01.33018312

Section

AAAI Technical Track: Vision