Beyond Text: Fine-Grained Multi-Modal Fact Verification with Hypergraph Transformers

Authors

  • Hui Pang Beijing University of Posts and Telecommunications
  • Chaozhuo Li Beijing University of Posts and Telecommunications
  • Litian Zhang Beihang University
  • Senzhang Wang Central South University
  • Xi Zhang Beijing University of Posts and Telecommunications

DOI:

https://doi.org/10.1609/aaai.v39i6.32684

Abstract

Fact verification has become increasingly vital in the internet age, driven by the proliferation of false claims and political misinformation. While traditional methods rely predominantly on text-based evidence, multi-modal evidence introduces richer sources of information, offering valuable insights for claim verification. Existing multi-modal verification models often focus on superficial correlations between claims and evidence, neglecting the complex semantic interactions present in fine-grained multi-modal signals. In this paper, we propose a novel framework for multi-modal fact-checking, named Hypergraph Transformer-based Multi-modal Fact-Checking (HGTMFC). Our approach captures high-order relationships between different modalities of evidence and claims by leveraging hypergraphs. HGTMFC models the intricate relationships among evidence across various modalities and enhances information propagation through a transformer-based mechanism embedded within the hypergraph. Moreover, we utilize linegraphs to refine this propagation process, further strengthening the model's reasoning capabilities. Experiments on benchmark datasets demonstrate that our model significantly outperforms existing approaches in multi-modal fact verification.
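
As a rough illustration of the two structures the abstract names, the sketch below builds a toy hypergraph over a claim and its multi-modal evidence, then derives its line graph, in which each hyperedge becomes a node and two hyperedges are linked when they share a member. This is not the HGTMFC implementation: the node names, the hyperedge names, and the helper functions `incidence_matrix` and `line_graph` are all hypothetical, and the transformer-based propagation step is omitted entirely.

```python
from itertools import combinations

# Hypothetical toy hypergraph (an assumption for illustration only):
# nodes are a claim and evidence fragments; each hyperedge groups the
# nodes assumed to share some fine-grained signal.
hyperedges = {
    "e_text": {"claim", "text_ev_1", "text_ev_2"},
    "e_image": {"claim", "image_ev_1"},
    "e_cross": {"text_ev_2", "image_ev_1", "image_ev_2"},
    "e_solo": {"image_ev_3"},
}


def incidence_matrix(hyperedges):
    """Return (nodes, edges, H) where H[i][j] is 1 if node i lies in hyperedge j."""
    nodes = sorted(set().union(*hyperedges.values()))
    edges = sorted(hyperedges)
    matrix = [[1 if n in hyperedges[e] else 0 for e in edges] for n in nodes]
    return nodes, edges, matrix


def line_graph(hyperedges):
    """Link two hyperedges whenever they have at least one node in common."""
    adjacency = {e: set() for e in hyperedges}
    for a, b in combinations(sorted(hyperedges), 2):
        if hyperedges[a] & hyperedges[b]:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


nodes, edges, H = incidence_matrix(hyperedges)
L = line_graph(hyperedges)
```

In this toy example a single hyperedge can connect more than two nodes at once, which is what distinguishes a hypergraph from an ordinary pairwise graph; the line graph then exposes which groups of evidence overlap, while `e_solo` stays isolated because it shares no node with any other hyperedge.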

Published

2025-04-11

How to Cite

Pang, H., Li, C., Zhang, L., Wang, S., & Zhang, X. (2025). Beyond Text: Fine-Grained Multi-Modal Fact Verification with Hypergraph Transformers. Proceedings of the AAAI Conference on Artificial Intelligence, 39(6), 6389–6397. https://doi.org/10.1609/aaai.v39i6.32684

Section

AAAI Technical Track on Computer Vision V