Prototype-Guided Multimodal Relation Extraction based on Entity Attributes

Authors

  • Zefan Zhang College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University
  • Weiqi Zhang College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University
  • Yanhui Li College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University
  • Tian Bai College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, Jilin University

DOI:

https://doi.org/10.1609/aaai.v39i24.34795

Abstract

Multimodal Relation Extraction (MRE) aims to predict relations between head and tail entities based on the context of sentence-image pairs. Most existing MRE methods progressively incorporate textual and visual inputs and let them dominate the learning process, assuming that both contribute significantly to the task. However, diverse visual appearances and semantically ambiguous text provide less informative context for the corresponding relation. To tackle these challenges, we highlight the importance of semantically invariant entity attributes that encompass fine-grained categories. To this end, we propose a novel Prototype-Guided Multimodal Relation Extraction (PG-MRE) framework based on Entity Attributes. Specifically, we first generate detailed entity explanations using Large Language Models (LLMs) to supplement the attribute semantics. Then, the Attribute Prototype Module (APM) refines attribute categories and condenses scattered entity attribute features into cluster-level prototypes. Furthermore, in the Relation Prototype Module (RPM), prototype-aligned attribute features guide diverse visual appearance features to produce compact and distinctive multimodal representations. Extensive experiments demonstrate that our method achieves superior relation classification capability, especially in scenarios involving various unseen entities, and sets a new state of the art on the MNRE dataset.
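The abstract does not specify how the APM condenses scattered attribute features into cluster-level prototypes or how features are aligned to them. As a rough illustration of the general idea only, the sketch below swaps in plain k-means to build prototypes and a softmax over negative squared distances to produce a "prototype-aligned" feature. This is not the authors' APM or RPM. The function names `build_prototypes` and `align_to_prototypes`, the `temperature` parameter, and the toy 2-D features are all hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_prototypes(features: List[Vector], k: int,
                     iters: int = 20, seed: int = 0) -> List[Vector]:
    """Hypothetical stand-in for prototype construction: plain k-means
    that condenses scattered feature vectors into k cluster centroids."""
    rng = random.Random(seed)
    # Initialise prototypes with k distinct feature vectors.
    prototypes = [list(v) for v in rng.sample(features, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        # Assign each feature to its nearest prototype.
        for f in features:
            nearest = min(range(k), key=lambda j: _dist2(f, prototypes[j]))
            clusters[nearest].append(f)
        # Move each prototype to its cluster mean; keep it if the cluster is empty.
        prototypes = [_mean(c) if c else prototypes[j]
                      for j, c in enumerate(clusters)]
    return prototypes


def align_to_prototypes(feature: Vector, prototypes: List[Vector],
                        temperature: float = 0.1) -> Vector:
    """Hypothetical alignment step: softmax weights over negative squared
    distances to each prototype, then a weighted sum of the prototypes."""
    logits = [-_dist2(feature, p) / temperature for p in prototypes]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * p[d] for w, p in zip(weights, prototypes))
            for d in range(len(feature))]


if __name__ == "__main__":
    # Toy "attribute features": two well-separated groups in 2-D.
    feats = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
    protos = build_prototypes(feats, k=2)
    print(protos)
    print(align_to_prototypes([0.2, 0.2], protos))
```

With a low `temperature`, the aligned vector collapses onto the nearest prototype, which is one simple way to make scattered features more compact. A learned model would more plausibly use trainable prototypes and a contrastive or clustering loss instead of offline k-means.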

Published

2025-04-11

How to Cite

Zhang, Z., Zhang, W., Li, Y., & Bai, T. (2025). Prototype-Guided Multimodal Relation Extraction based on Entity Attributes. Proceedings of the AAAI Conference on Artificial Intelligence, 39(24), 26003–26011. https://doi.org/10.1609/aaai.v39i24.34795

Section

AAAI Technical Track on Natural Language Processing III