Multimodal Structure-Consistent Image-to-Image Translation

Che-Tsung Lin; Yen-Yi Wu; Po-Hao Hsu; Shang-Hong Lai

doi:10.1609/aaai.v34i07.6814

Authors

Che-Tsung Lin, National Tsing Hua University
Yen-Yi Wu, National Tsing Hua University
Po-Hao Hsu, National Tsing Hua University
Shang-Hong Lai, National Tsing Hua University

Abstract

Unpaired image-to-image translation has proven quite effective in boosting a CNN-based object detector for a different domain, by means of data augmentation that preserves the objects in the translated images well. Recently, multimodal GAN (Generative Adversarial Network) models have been proposed and were expected to further boost detector accuracy by generating a diverse collection of images in the target domain, given only a single labelled image in the source domain. However, images generated by multimodal GANs achieve even worse detection accuracy than those generated by a unimodal GAN with better object preservation. In this work, we introduce cycle-structure consistency for generating diverse and structure-preserved translated images across complex domains, such as between day and night, for object detector training. Qualitative results show that our model, Multimodal AugGAN, can generate diverse and realistic images for the target domain. For quantitative comparisons, we evaluate competing methods and ours by using the generated images to train YOLO, Faster R-CNN and FCN models, and show that our model achieves significant improvement and outperforms other methods on detection accuracy and FCN scores. We also demonstrate, through comparison on the perceptual distance metric, that our model provides more diverse object appearances in the target domain.
