Multimodal Structure-Consistent Image-to-Image Translation


  • Che-Tsung Lin, National Tsing Hua University
  • Yen-Yi Wu, National Tsing Hua University
  • Po-Hao Hsu, National Tsing Hua University
  • Shang-Hong Lai, National Tsing Hua University



Unpaired image-to-image translation has proven quite effective at boosting a CNN-based object detector in a different domain: it serves as a data augmentation method that preserves the objects in the translated images. Recently, multimodal GAN (Generative Adversarial Network) models have been proposed and were expected to further improve detector accuracy by generating a diverse collection of target-domain images from a single labelled source-domain image. However, images generated by multimodal GANs yield even worse detection accuracy than those generated by a unimodal GAN with better object preservation. In this work, we introduce cycle-structure consistency for generating diverse, structure-preserving translated images across complex domains, such as between day and night, for object-detector training. Qualitative results show that our model, Multimodal AugGAN, can generate diverse and realistic images for the target domain. For quantitative comparison, we evaluate competing methods and ours by using the generated images to train YOLO, Faster R-CNN and FCN models, and show that our model achieves significant improvement and outperforms the other methods in detection accuracy and FCN score. We also demonstrate, through comparison on a perceptual distance metric, that our model provides more diverse object appearances in the target domain.
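The cycle-structure consistency described above builds on the standard cycle-consistency idea: translating an image to the target domain and back should reconstruct the original, which discourages the generators from destroying image structure. The following is a minimal sketch of a plain cycle-consistency (L1 reconstruction) loss, not the authors' exact formulation; the generators `g_ab` and `g_ba` are hypothetical toy maps standing in for trained day-to-night and night-to-day networks.

```python
import numpy as np

# Hypothetical stand-ins for the two generators in a CycleGAN-style setup:
# g_ab translates a source-domain ("day") image to the target domain ("night"),
# and g_ba translates back. Real models would be trained neural networks;
# these toy linear maps are exact inverses, purely for illustration.
def g_ab(x):
    return 0.5 * x + 0.1   # pretend "day -> night" darkening

def g_ba(x):
    return 2.0 * (x - 0.1)  # pretend "night -> day", inverting g_ab

def cycle_consistency_loss(x, forward, backward):
    """Mean L1 error between x and its round-trip reconstruction."""
    return float(np.mean(np.abs(backward(forward(x)) - x)))

x = np.random.rand(4, 4)  # a toy 4x4 "image"
loss = cycle_consistency_loss(x, g_ab, g_ba)
```

Because the toy backward map exactly inverts the forward map, the loss is near zero here; with imperfect generators the loss is positive and penalizes structural drift during training.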




How to Cite

Lin, C.-T., Wu, Y.-Y., Hsu, P.-H., & Lai, S.-H. (2020). Multimodal Structure-Consistent Image-to-Image Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 34(07), 11490-11498.



AAAI Technical Track: Vision