RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER

Lin Sun; Jiquan Wang; Kai Zhang; Yindu Su; Fangsheng Weng

doi:10.1609/aaai.v35i15.17633

Authors

Lin Sun, Zhejiang University City College
Jiquan Wang, Zhejiang University; Zhejiang University City College
Kai Zhang, Tsinghua University
Yindu Su, Zhejiang University; Zhejiang University City College
Fangsheng Weng, Zhejiang University City College

Keywords:

Information Extraction

Abstract

Recently, multimodal named entity recognition (MNER) has used images to improve the accuracy of NER in tweets. However, most multimodal methods use attention mechanisms to extract visual clues regardless of whether the text and the image are relevant to each other. In practice, irrelevant text-image pairs account for a large proportion of tweets, and visual clues that are unrelated to the text can have uncertain or even negative effects on multimodal model learning. In this paper, we introduce text-image relation propagation into a multimodal BERT model. We integrate soft or hard gates to select visual clues and propose a multitask algorithm to train the model and validate the effects of relation propagation on the MNER datasets. In the experiments, we analyze in depth how visual attention changes before and after relation propagation is applied. Our model achieves state-of-the-art performance on the MNER datasets.
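The soft and hard gating mentioned in the abstract can be illustrated with a minimal sketch. It assumes (this is not taken from the paper) that a text-image relation classifier outputs two logits, ordered [irrelevant, relevant], that a soft gate scales every visual region feature by P(relevant), and that a hard gate replaces this probability with 0 or 1 using a 0.5 threshold. The function names and the threshold are hypothetical, and visual features are plain lists of floats standing in for region embeddings.

```python
import math
from typing import List


def relevance_probability(logits: List[float]) -> float:
    """Softmax over two hypothetical relation logits [irrelevant, relevant].

    Returns P(relevant). The max is subtracted for numerical stability.
    """
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return exps[1] / sum(exps)


def gate_visual_features(
    visual: List[List[float]],
    p_relevant: float,
    hard: bool = False,
    threshold: float = 0.5,  # hypothetical cut-off for the hard gate
) -> List[List[float]]:
    """Scale each visual region feature by a relation gate.

    Soft gate: multiply by the relevance probability itself.
    Hard gate: multiply by 1 if p_relevant >= threshold, else by 0.
    """
    if hard:
        g = 1.0 if p_relevant >= threshold else 0.0
    else:
        g = p_relevant
    return [[g * v for v in region] for region in visual]


if __name__ == "__main__":
    regions = [[2.0, 4.0], [1.0, -1.0]]  # two toy region embeddings
    p = relevance_probability([1.0, 0.0])  # classifier leans "irrelevant"
    print("P(relevant):", p)
    print("soft:", gate_visual_features(regions, p))
    print("hard:", gate_visual_features(regions, p, hard=True))
```

With a soft gate, weakly relevant images still contribute a damped signal, while the hard gate discards them entirely. A thresholded gate is not differentiable, so training it end to end would need an additional estimator (for example, straight-through or Gumbel-softmax), which this sketch does not implement.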