Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract)

Authors

  • Ze Fu — School of Software Engineering, South China University of Technology, Guangzhou, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), Ministry of Education, China
  • Junhao Feng — School of Software Engineering, South China University of Technology, Guangzhou, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), Ministry of Education, China
  • Changmeng Zheng — Department of Computing, Hong Kong Polytechnic University, Hong Kong, China
  • Yi Cai — School of Software Engineering, South China University of Technology, Guangzhou, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), Ministry of Education, China

DOI:

https://doi.org/10.1609/aaai.v36i11.21610

Keywords:

Scene Graph Generation, Multimodal Relation Alignment, Knowledge Enhancement

Abstract

Existing scene graph generation methods are limited when an image lacks sufficient visual context. To address this limitation, we propose a knowledge-enhanced scene graph generation model with multimodal relation alignment, which supplements the missing visual context with well-aligned textual knowledge. First, we encode the textual information into contextualized knowledge, guided by the visual objects, to enrich the context. Furthermore, we align the multimodal relation triplets with a co-attention module for better semantic fusion. Experimental results demonstrate the effectiveness of our method.
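
The abstract describes cross-modal fusion via a co-attention module between visual relation triplets and textual knowledge. The following is a minimal sketch of such a co-attention fusion block, not the authors' implementation; the dimensions, module names, and pooling choice are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a co-attention block that fuses
# visual relation-triplet features with contextualized textual knowledge.
# Feature dimensions and the fusion head below are illustrative assumptions.
import torch
import torch.nn as nn


class CoAttentionFusion(nn.Module):
    """Cross-attends visual triplet features and textual knowledge, then fuses them."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        # visual queries attend over textual knowledge, and vice versa
        self.vis_to_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_to_vis = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (batch, n_triplets, dim) visual subject-predicate-object features
        # txt: (batch, n_tokens, dim) contextualized textual knowledge features
        vis_ctx, _ = self.vis_to_txt(query=vis, key=txt, value=txt)  # knowledge context per triplet
        txt_ctx, _ = self.txt_to_vis(query=txt, key=vis, value=vis)  # visual context per token
        # pool the visually grounded textual context and fuse it with the enriched visual features
        pooled_txt = txt_ctx.mean(dim=1, keepdim=True).expand_as(vis_ctx)
        return self.fuse(torch.cat([vis_ctx + vis, pooled_txt], dim=-1))


# toy usage: 4 candidate relation triplets, 12 knowledge tokens per image
fusion = CoAttentionFusion(dim=512)
vis_feats = torch.randn(2, 4, 512)
txt_feats = torch.randn(2, 12, 512)
fused = fusion(vis_feats, txt_feats)
print(fused.shape)  # torch.Size([2, 4, 512])
```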

Published

2022-06-28

How to Cite

Fu, Z., Feng, J., Zheng, C., & Cai, Y. (2022). Knowledge-Enhanced Scene Graph Generation with Multimodal Relation Alignment (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 36(11), 12947-12948. https://doi.org/10.1609/aaai.v36i11.21610