Hierarchical Aligned Multimodal Learning for NER on Tweet Posts

Authors

  • Peipei Liu, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
  • Hong Li, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
  • Yimo Ren, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
  • Jie Liu, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
  • Shuaizong Si, Institute of Information Engineering, Chinese Academy of Sciences
  • Hongsong Zhu, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
  • Limin Sun, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v38i17.29831

Keywords:

NLP: Information Extraction, NLP: Applications, NLP: Other

Abstract

Mining structured knowledge from tweets with named entity recognition (NER) can benefit many downstream applications such as recommendation and intention understanding. As tweet posts tend to be multimodal, multimodal named entity recognition (MNER) has attracted increasing attention. In this paper, we propose a novel approach that dynamically aligns the image and text sequence and performs multi-level cross-modal learning to augment textual word representations for MNER. Specifically, our framework consists of three main stages: the first focuses on intra-modality representation learning to derive the implicit global and local knowledge of each modality; the second evaluates the relevance between the text and its accompanying image and integrates visual information of different granularities based on that relevance; the third enforces semantic refinement via iterative cross-modal interactions and co-attention. We conduct experiments on two open datasets, and the results and detailed analysis demonstrate the advantage of our model.
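To make the three-stage pipeline concrete, below is a minimal PyTorch sketch of such an architecture. All module names, dimensions, and the exact forms of the relevance gate and co-attention are assumptions for illustration, not the authors' implementation; in practice, the text and image features would come from pretrained encoders (e.g., BERT word embeddings and CNN region features).

```python
import torch
import torch.nn as nn

class HierarchicalAlignedMNER(nn.Module):
    """Illustrative sketch of a three-stage MNER pipeline: intra-modality
    encoding, relevance-gated visual integration, and iterative cross-modal
    co-attention. Hyperparameters and layer choices are assumptions."""

    def __init__(self, d_model=256, n_heads=4, n_refine=2, n_tags=9):
        super().__init__()
        # Stage 1: intra-modality encoders for global/local knowledge.
        self.text_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2)
        self.img_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            num_layers=2)
        # Stage 2: scalar text-image relevance gate (a hypothetical form).
        self.rel_gate = nn.Sequential(nn.Linear(2 * d_model, 1), nn.Sigmoid())
        # Stage 3: iterative cross-modal co-attention blocks.
        self.co_attn = nn.ModuleList([
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_refine)])
        self.tagger = nn.Linear(d_model, n_tags)

    def forward(self, text_feats, img_feats):
        # text_feats: (B, T, d) word features; img_feats: (B, R, d) regions.
        h_t = self.text_enc(text_feats)   # stage 1: textual context
        h_v = self.img_enc(img_feats)     # stage 1: visual context
        g_t = h_t.mean(dim=1)             # global text summary
        g_v = h_v.mean(dim=1)             # global image summary
        # Stage 2: weight visual features by text-image relevance, so an
        # unrelated image contributes little to the word representations.
        rel = self.rel_gate(torch.cat([g_t, g_v], dim=-1)).unsqueeze(1)
        h_v = rel * h_v
        # Stage 3: text attends to gated visual cues, refined iteratively.
        for attn in self.co_attn:
            ctx, _ = attn(h_t, h_v, h_v)
            h_t = h_t + ctx
        return self.tagger(h_t)           # per-token tag logits

# Toy usage with random features standing in for encoder outputs.
model = HierarchicalAlignedMNER()
logits = model(torch.randn(2, 16, 256), torch.randn(2, 49, 256))
print(logits.shape)  # torch.Size([2, 16, 9])
```

The scalar gate in stage 2 is one simple way to realize relevance-based integration: when the sigmoid output is near zero, the co-attention in stage 3 effectively degrades to text-only refinement, which is a common safeguard against noisy or unrelated images in MNER.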

Published

2024-03-24

How to Cite

Liu, P., Li, H., Ren, Y., Liu, J., Si, S., Zhu, H., & Sun, L. (2024). Hierarchical Aligned Multimodal Learning for NER on Tweet Posts. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18680-18688. https://doi.org/10.1609/aaai.v38i17.29831

Issue

Vol. 38 No. 17 (2024)

Section

AAAI Technical Track on Natural Language Processing II