FL-MSRE: A Few-Shot Learning based Approach to Multimodal Social Relation Extraction

Authors

  • Hai Wan (School of Computer Science and Engineering, Sun Yat-sen University)
  • Manrong Zhang (School of Computer Science and Engineering, Sun Yat-sen University)
  • Jianfeng Du (Guangzhou Key Laboratory of Multilingual Intelligent Processing, Guangdong University of Foreign Studies; Pazhou Lab)
  • Ziling Huang (School of Computer Science and Engineering, Sun Yat-sen University)
  • Yufei Yang (School of Computer Science and Engineering, Sun Yat-sen University)
  • Jeff Z. Pan (School of Informatics, The University of Edinburgh)

DOI:

https://doi.org/10.1609/aaai.v35i15.17639

Keywords:

Language Grounding & Multi-modal NLP

Abstract

Social relation extraction (SRE), which aims to infer the social relation between two people in daily life, has been shown to be of great practical value. Existing SRE methods extract social relations from unimodal information only, such as text or images, ignoring the strong coupling between modalities. Moreover, previous studies overlook the severely imbalanced distribution of social relations. To address these issues, this paper proposes FL-MSRE, a few-shot learning based approach to extracting social relations from both texts and face images. Given the lack of multimodal social relation datasets, this paper also presents three multimodal datasets annotated from four classical masterpieces and the corresponding TV series. Inspired by the success of BERT, we propose a strong BERT-based baseline that extracts social relations from text only. FL-MSRE is empirically shown to outperform the baseline significantly, which demonstrates that using face images benefits text-based SRE. Further experiments show that using two faces from different images achieves performance similar to using two faces from the same image. This means that FL-MSRE is suitable for a wide range of SRE applications in which the faces of two people can only be collected from different images.

Published

2021-05-18

How to Cite

Wan, H., Zhang, M., Du, J., Huang, Z., Yang, Y., & Pan, J. Z. (2021). FL-MSRE: A Few-Shot Learning based Approach to Multimodal Social Relation Extraction. Proceedings of the AAAI Conference on Artificial Intelligence, 35(15), 13916-13923. https://doi.org/10.1609/aaai.v35i15.17639

Section

AAAI Technical Track on Speech and Natural Language Processing II