SGoT-R1: Social Graph of Thought Reasoning-Enhanced Multimodal Large Language Model for Harmful Meme Detection

Xiuxian Wang; Yuting Su; Wenhui Li; Xiaowen Wang; Zhuojun Li; Anan Liu

doi:10.1609/aaai.v40i31.39868

Authors

Xiuxian Wang, School of Electrical and Information Engineering, Tianjin University, 300072, China
Yuting Su, School of Electrical and Information Engineering, Tianjin University, 300072, China
Wenhui Li, School of Electrical and Information Engineering, Tianjin University, 300072, China
Xiaowen Wang, School of Electrical and Information Engineering, Tianjin University, 300072, China
Zhuojun Li, School of Electrical and Information Engineering, Tianjin University, 300072, China
Anan Liu, School of Electrical and Information Engineering, Tianjin University, 300072, China

Abstract

Internet memes are widely shared multimodal social content that conveys complex ideas through metaphor. Because they often carry harmful implications, accurate harmful meme detection is an important problem. Reasoning knowledge extracted from large language models plays a crucial role in recent advances on this task. However, existing methods reason about a meme from a single viewpoint, ignoring that memes are essentially products of group consensus whose true meaning depends heavily on how diverse user viewpoints collide and aggregate. To address this problem, we propose a Social Graph of Thought Reasoning Enhancement (SGoTRE) framework for harmful meme detection. SGoTRE has three key steps. First, using multi-agent simulation, we obtain diverse chains of thought that represent how users from different backgrounds interpret a meme, reproducing the diversity of group cognition. Second, we construct a Social Graph of Thought (SGoT) that integrates the multiple reasoning chains and structurally expresses the consensus and diversity of viewpoints among users. Finally, we use the SGoT for cognitive distillation, internalizing multi-opinion reasoning logic into a single multimodal large language model, SGoT-R1, to achieve efficient and interpretable harmful meme detection. Experimental results show that SGoT-R1 significantly improves detection performance on mainstream datasets. In particular, on the most challenging FHM dataset, SGoT-R1 achieves an 8.9% improvement over state-of-the-art models.
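
The second step, merging several chains of thought into one graph that exposes consensus and divergence, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the persona labels, the example steps, the exact-string matching of reasoning steps, and the majority threshold are all assumptions made for illustration. A real system would need some form of semantic matching to decide when two steps express the same idea.

```python
from collections import defaultdict


def build_thought_graph(chains):
    """Merge per-persona reasoning chains into one directed graph.

    chains maps a persona name to an ordered list of reasoning steps.
    Steps are merged by exact string equality (an illustrative assumption).
    Returns (support, edges), where support maps each step to the set of
    personas that used it and edges maps each (step, next_step) pair to
    the set of personas that made that transition.
    """
    support = defaultdict(set)
    edges = defaultdict(set)
    for persona, steps in chains.items():
        for step in steps:
            support[step].add(persona)
        for a, b in zip(steps, steps[1:]):
            edges[(a, b)].add(persona)
    return dict(support), dict(edges)


def split_consensus(support, n_personas, threshold=0.5):
    """Split steps into those shared by more than `threshold` of personas
    (consensus) and the rest (divergent). The threshold is hypothetical."""
    consensus = sorted(
        step for step, who in support.items()
        if len(who) / n_personas > threshold
    )
    divergent = sorted(step for step in support if step not in consensus)
    return consensus, divergent


# Hypothetical chains from three simulated users looking at the same meme.
chains = {
    "persona_a": ["image shows a group", "caption mocks the group", "harmful"],
    "persona_b": ["image shows a group", "caption is a pun", "harmless"],
    "persona_c": ["image shows a group", "caption mocks the group", "harmful"],
}

support, edges = build_thought_graph(chains)
consensus, divergent = split_consensus(support, n_personas=len(chains))
print("consensus:", consensus)
print("divergent:", divergent)
```

In this toy input, the steps shared by at least two of the three personas land in the consensus list, while the minority reading is kept as a divergent branch rather than discarded, which mirrors the idea of expressing both consensus and diversity in one structure.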