Multimodal Event Causality Reasoning with Scene Graph Enhanced Interaction Network

Authors

  • Jintao Liu, University of Chinese Academy of Sciences
  • Kaiwen Wei, University of Chinese Academy of Sciences
  • Chenglong Liu, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v38i8.28724

Keywords:

DMKM: Mining of Visual, Multimedia & Multimodal Data, HAI: Applications, NLP: Language Grounding & Multi-modal NLP, RU: Applications, RU: Causality

Abstract

Multimodal event causality reasoning aims to recognize causal relations based on the given events and their accompanying image pairs, which requires the model to have a comprehensive grasp of both visual and textual information. However, existing studies fail to effectively model the relations among objects within an image or to capture object interactions across the image pair, which leaves the model with an insufficient understanding of the visual information. To address these issues, we propose a Scene Graph Enhanced Interaction Network (SEIN), which leverages the interactions of the generated scene graph for multimodal event causality reasoning. Specifically, the proposed method adopts a graph convolutional network to model the objects and their relations derived from the scene graph structure, enabling the model to fully exploit the rich structural and semantic information in the image. To capture the object interactions between the two images, we design an optimal transport-based alignment strategy that matches objects across the images, which helps the model recognize changes in visual information and facilitates causality reasoning. In addition, we introduce a cross-modal fusion module that combines textual and visual features for causality prediction. Experimental results indicate that the proposed SEIN outperforms state-of-the-art methods on the Vis-Causal dataset.
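
The optimal transport-based alignment mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the cost function, the marginals, or the solver, so the cosine-distance cost, the uniform marginals, the entropy-regularised Sinkhorn iterations, and every name below (cosine_cost, sinkhorn, align, eps, n_iters, the toy feature vectors) are assumptions chosen only to show how objects from two images might be softly matched.

```python
import math

def cosine_cost(xs, ys):
    """Cost matrix of 1 - cosine similarity (an assumed cost, not from the paper)."""
    def norm(v):
        return math.sqrt(sum(a * a for a in v)) or 1.0
    return [[1.0 - sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))
             for y in ys] for x in xs]

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularised optimal transport between uniform marginals."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n              # mass on each object of image 1 (assumed uniform)
    b = [1.0 / m] * m              # mass on each object of image 2 (assumed uniform)
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: plan[i][j] is the mass moved from object i to object j.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def align(plan):
    """Hard matching: for each object in image 1, the index receiving most mass."""
    return [max(range(len(row)), key=row.__getitem__) for row in plan]

# Toy object features for the two images (hypothetical values).
objects_img1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
objects_img2 = [[0.0, 2.0], [2.0, 2.0], [3.0, 0.0]]

plan = sinkhorn(cosine_cost(objects_img1, objects_img2))
matches = align(plan)
print(matches)
```

In a setup like this, the soft plan rather than the hard matching would typically be passed on to later layers, since it stays differentiable and also indicates objects that have no close counterpart in the other image. Whether SEIN uses the plan in this way is not stated in the abstract.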

Published

2024-03-24

How to Cite

Liu, J., Wei, K., & Liu, C. (2024). Multimodal Event Causality Reasoning with Scene Graph Enhanced Interaction Network. Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 8778-8786. https://doi.org/10.1609/aaai.v38i8.28724

Section

AAAI Technical Track on Data Mining & Knowledge Management