TOT: Topology-Aware Optimal Transport for Multimodal Hate Detection

Authors

  • Linhao Zhang, Aerospace Information Research Institute, Chinese Academy of Sciences
  • Li Jin, Aerospace Information Research Institute, Chinese Academy of Sciences
  • Xian Sun, Aerospace Information Research Institute, Chinese Academy of Sciences
  • Guangluan Xu, Aerospace Information Research Institute, Chinese Academy of Sciences
  • Zequn Zhang, Aerospace Information Research Institute, Chinese Academy of Sciences
  • Xiaoyu Li, Aerospace Information Research Institute, Chinese Academy of Sciences
  • Nayu Liu, Aerospace Information Research Institute, Chinese Academy of Sciences
  • Qing Liu, Aerospace Information Research Institute, Chinese Academy of Sciences
  • Shiyao Yan, Aerospace Information Research Institute, Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v37i4.25614

Keywords:

DMKM: Mining of Visual, Multimedia & Multimodal Data, CMS: Affective Computing, CV: Language and Vision, DMKM: Graph Mining, Social Network Analysis & Community Mining, ML: Multimodal Learning, SNLP: Speech and Multimodality

Abstract

Multimodal hate detection, which aims to identify harmful online content such as memes, is crucial for building a wholesome internet environment. Previous work has made enlightening explorations in detecting explicit hate remarks. However, most of these approaches neglect the analysis of implicit harm, which is particularly challenging because explicit text markers and demographic visual cues are often twisted or missing. The cross-modal attention mechanisms they leverage also suffer from the distributional modality gap and lack logical interpretability. To address these semantic gap issues, we propose TOT: a topology-aware optimal transport framework to decipher implicit harm in the meme scenario, which formulates the cross-modal alignment problem as solving for optimal transportation plans. Specifically, we leverage an optimal transport kernel method to capture complementary information from multiple modalities. The kernel embedding provides a non-linear transformation into a reproducing kernel Hilbert space (RKHS), which is significant for eliminating the distributional modality gap. Moreover, we perceive the topology information based on aligned representations to conduct bipartite graph path reasoning. The newly achieved state-of-the-art performance on two publicly available benchmark datasets, together with further visual analysis, demonstrates the superiority of TOT in capturing implicit cross-modal alignment.
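
To illustrate what framing cross-modal alignment as an optimal transportation plan can look like, the following is a minimal sketch that uses standard entropy-regularized optimal transport (Sinkhorn iterations). It is not the authors' implementation: the OT kernel embedding and the bipartite graph path reasoning described in the abstract are not reproduced. The function names, the cosine cost, the uniform marginals, the regularization strength `eps`, and the random features standing in for text-token and image-region embeddings are all assumptions made for illustration.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cost 1 - cosine similarity between rows of x and rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=200):
    """Entropy-regularized transport plan with row marginal a and column marginal b."""
    K = np.exp(-cost / eps)            # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows toward marginal a
        v = b / (K.T @ u)              # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Hypothetical stand-ins for encoder outputs (assumption: 5 text tokens, 7 image regions).
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(5, 16))
image_feats = rng.normal(size=(7, 16))

# Assumption: uniform mass on every token and every region.
a = np.full(5, 1.0 / 5)
b = np.full(7, 1.0 / 7)

plan = sinkhorn_plan(cosine_cost(text_feats, image_feats), a, b)

# Barycentric projection: each text token expressed as a plan-weighted mix of image regions.
aligned_text = (plan / a[:, None]) @ image_feats
```

Each entry `plan[i, j]` can be read as the amount of mass moved from text token `i` to image region `j`, so large entries indicate soft token-to-region correspondences under the assumed cost.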

Published

2023-06-26

How to Cite

Zhang, L., Jin, L., Sun, X., Xu, G., Zhang, Z., Li, X., Liu, N., Liu, Q., & Yan, S. (2023). TOT: Topology-Aware Optimal Transport for Multimodal Hate Detection. Proceedings of the AAAI Conference on Artificial Intelligence, 37(4), 4884-4892. https://doi.org/10.1609/aaai.v37i4.25614

Section

AAAI Technical Track on Data Mining and Knowledge Management