Image Content Generation with Causal Reasoning

Authors

  • Xiaochuan Li, Inspur Electronic Information Industry Co., Ltd.; Shandong Massive Information Technology Research Institute
  • Baoyu Fan, Nankai University; Inspur Electronic Information Industry Co., Ltd.
  • Runze Zhang, Inspur Electronic Information Industry Co., Ltd.
  • Liang Jin, Inspur Electronic Information Industry Co., Ltd.
  • Di Wang, Inspur Electronic Information Industry Co., Ltd.
  • Zhenhua Guo, Inspur Electronic Information Industry Co., Ltd.
  • Yaqian Zhao, Inspur Electronic Information Industry Co., Ltd.
  • Rengang Li, Tsinghua University; Inspur Electronic Information Industry Co., Ltd.

DOI:

https://doi.org/10.1609/aaai.v38i12.29269

Keywords:

ML: Multimodal Learning, CV: Representation Learning for Vision, CV: Visual Reasoning & Symbolic Representations, KRR: Common-Sense Reasoning, NLP: Generation

Abstract

The emergence of ChatGPT has once again sparked research in generative artificial intelligence (GAI). While people have been amazed by the generated results, they have also noticed the reasoning potential reflected in the generated textual content. However, this causal reasoning ability is currently limited primarily to language generation, as in models such as GPT-3; no equivalent research yet exists in the visual modality. Considering causal reasoning in visual content generation is significant because visual information contains infinite granularity. In particular, images can provide more intuitive and specific demonstrations than coarse-grained text for certain reasoning tasks. Hence, we propose a new image generation task called visual question answering with image (VQAI) and establish a dataset of the same name based on the classic Tom and Jerry animated series. Additionally, we develop a new paradigm for image generation to tackle the challenges of this task. Finally, we perform extensive experiments and analyses, including visualizations of the generated content and discussions of its potential and limitations. The code and data are publicly available under the CC BY-NC-SA 4.0 license for academic and non-commercial use at: https://github.com/IEIT-AGI/MIX-Shannon/blob/main/projects/VQAI/lgd_vqai.md.

Published

2024-03-24

How to Cite

Li, X., Fan, B., Zhang, R., Jin, L., Wang, D., Guo, Z., Zhao, Y., & Li, R. (2024). Image Content Generation with Causal Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 38(12), 13646-13654. https://doi.org/10.1609/aaai.v38i12.29269

Issue

Vol. 38 No. 12

Section

AAAI Technical Track on Machine Learning III