L-CoDe: Language-Based Colorization Using Color-Object Decoupled Conditions

Authors

  • Shuchen Weng, School of Computer Science, Peking University
  • Hao Wu, School of Software and Microelectronics, Peking University
  • Zheng Chang, School of Artificial Intelligence, Beijing University of Posts and Telecommunications
  • Jiajun Tang, School of Computer Science, Peking University
  • Si Li, School of Artificial Intelligence, Beijing University of Posts and Telecommunications
  • Boxin Shi, School of Computer Science, Peking University; Institute for Artificial Intelligence, Peking University; Beijing Academy of Artificial Intelligence; Peng Cheng Laboratory

DOI:

https://doi.org/10.1609/aaai.v36i3.20170

Keywords:

Computer Vision (CV)

Abstract

Colorizing a grayscale image is inherently an ill-posed problem with multi-modal uncertainty. Language-based colorization offers a natural way of interaction to reduce such uncertainty via a user-provided caption. However, the color-object coupling and mismatch issues make the mapping from word to color difficult. In this paper, we propose L-CoDe, a Language-based Colorization network using color-object Decoupled conditions. A predictor for object-color corresponding matrix (OCCM) and a novel attention transfer module (ATM) are introduced to solve the color-object coupling problem. To deal with color-object mismatch that results in incorrect color-object correspondence, we adopt a soft-gated injection module (SIM). We further present a new dataset containing annotated color-object pairs to provide supervisory signals for resolving the coupling problem. Experimental results show that our approach outperforms state-of-the-art methods conditioned on captions.

Published

2022-06-28

How to Cite

Weng, S., Wu, H., Chang, Z., Tang, J., Li, S., & Shi, B. (2022). L-CoDe: Language-Based Colorization Using Color-Object Decoupled Conditions. Proceedings of the AAAI Conference on Artificial Intelligence, 36(3), 2677-2684. https://doi.org/10.1609/aaai.v36i3.20170

Section

AAAI Technical Track on Computer Vision III