Multi-modal Graph Fusion for Named Entity Recognition with Targeted Visual Guidance

Dong Zhang; Suzhong Wei; Shoushan Li; Hanqian Wu; Qiaoming Zhu; Guodong Zhou

doi:10.1609/aaai.v35i16.17687

Authors

Dong Zhang, Soochow University
Suzhong Wei, Southeast University
Shoushan Li, Soochow University
Hanqian Wu, Southeast University
Qiaoming Zhu, Soochow University
Guodong Zhou, Soochow University

DOI:

https://doi.org/10.1609/aaai.v35i16.17687

Keywords:

Information Extraction

Abstract

Multi-modal named entity recognition (MNER) aims to discover named entities in free text and classify them into pre-defined types with the help of images. However, dominant MNER models do not fully exploit the fine-grained semantic correspondences between semantic units of different modalities, which have the potential to refine multi-modal representation learning. To address this issue, we propose a unified multi-modal graph fusion (UMGF) approach for MNER. Specifically, we first represent the input sentence and image using a unified multi-modal graph, which captures various semantic relationships between multi-modal semantic units (words and visual objects). Then, we stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations. Finally, we obtain an attention-based multi-modal representation for each word and perform entity labeling with a CRF decoder. Experiments on two benchmark datasets demonstrate the superiority of our MNER model.
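
The graph construction and the iterative fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the edge rules, so this sketch assumes that nodes of the same modality are fully connected and that word-to-object edges come from an external grounding step supplied as index pairs. The names build_multimodal_graph and fuse_once are hypothetical, and plain mean aggregation stands in for the learned fusion layers; the attention step and the CRF decoder are omitted.

```python
from itertools import combinations


def build_multimodal_graph(words, objects, alignments):
    """Build one graph whose nodes are words and visual objects.

    words:      list of sentence tokens.
    objects:    list of labels for detected visual objects.
    alignments: (word_index, object_index) pairs, assumed to come from an
                external word-to-region grounding step (an assumption here).
    Returns (nodes, edges); edges are undirected pairs of node indices,
    with object nodes numbered after the word nodes.
    """
    nodes = [("word", w) for w in words] + [("object", o) for o in objects]
    n_words = len(words)

    edges = set()
    # Intra-modal edges (assumption): connect every pair of words ...
    for a, b in combinations(range(n_words), 2):
        edges.add((a, b))
    # ... and every pair of visual objects.
    for a, b in combinations(range(len(objects)), 2):
        edges.add((n_words + a, n_words + b))
    # Inter-modal edges: connect each aligned word to its visual object.
    for word_idx, obj_idx in alignments:
        edges.add((word_idx, n_words + obj_idx))
    return nodes, edges


def fuse_once(features, edges):
    """One round of mean aggregation over each node and its neighbours.

    A stand-in for a learned graph-based fusion layer, not the paper's layer.
    features: one list of floats per node, all of the same length.
    """
    neighbours = {i: [i] for i in range(len(features))}  # include self
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
         for d in range(dim)]
        for i in range(len(features))
    ]


if __name__ == "__main__":
    words = ["Kevin", "Durant", "enters", "arena"]
    objects = ["person"]
    # Hypothetical grounding: both name tokens point at the detected person.
    nodes, edges = build_multimodal_graph(words, objects, [(0, 0), (1, 0)])
    features = [[1.0], [1.0], [0.0], [0.0], [4.0]]
    for _ in range(2):  # stacking layers corresponds to repeating the step
        features = fuse_once(features, edges)
    print(len(nodes), len(edges), features)
```

Calling fuse_once repeatedly mirrors the stacking of fusion layers: after each round, a word node aligned with a visual object carries more of that object's features, while unaligned words receive visual information only indirectly through other words.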
