Prototype-Guided Multimodal Relation Extraction based on Entity Attributes

Zefan Zhang; Weiqi Zhang; Yanhui Li; Tian Bai

doi:10.1609/aaai.v39i24.34795

Authors

Zefan Zhang College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University
Weiqi Zhang College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University
Yanhui Li College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University
Tian Bai College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University

DOI:

https://doi.org/10.1609/aaai.v39i24.34795

Abstract

Multimodal Relation Extraction (MRE) aims to predict relations between head and tail entities based on the context of sentence-image pairs. Most existing MRE methods progressively incorporate textual and visual inputs to dominate the learning process, assuming both contribute significantly to the task. However, the diverse visual appearances and text with ambiguous semantics contain less-informative contexts for the corresponding relation. To tackle these challenges, we highlight the importance of semantically invariant entity attributes that encompass fine-grained categories. Towards this, we propose a novel Prototype-Guided Multimodal Relation Extraction (PG-MRE) framework based on Entity Attributes. Specifically, we first generate detailed entity explanations using Large Language Models (LLMs) to supplement the attribute semantics. Then, the Attribute Prototype Module (APM) refines attribute categories and condenses scattered entity attribute features into cluster-level prototypes. Furthermore, prototype-aligned attribute features guide diverse visual appearance features to produce compact and distinctive multimodal representations in the Relation Prototype Module (RPM). Extensive experiments demonstrate that our method gains superior relation classification capability (especially in scenarios involving various unseen entities), achieving new state-of-the-art performances on MNRE dataset.

Prototype-Guided Multimodal Relation Extraction based on Entity Attributes

Authors

DOI:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information