MemeMatch: A Large-Scale Dual-Context Multimodal Dataset and Retrieval System for Internet Memes

Do Tri An Le; Donát Ákos Köller; Qixin Deng; Roland Molontay

doi:10.1609/icwsm.v20i1.42785

Authors

Do Tri An Le, Wabash College
Donát Ákos Köller, Budapest University of Technology and Economics
Qixin Deng, Wabash College
Roland Molontay, Budapest University of Technology and Economics

DOI:

https://doi.org/10.1609/icwsm.v20i1.42785

Abstract

We introduce MemeMatch, a large-scale multimodal meme dataset and retrieval system that bridges meme collection, annotation, and analysis in a unified pipeline. The dataset contains nearly one million image-with-text memes from Reddit’s r/Memes (2018–2023) and ImgFlip, with rich metadata. Each meme is decomposed into two semantic contexts: local context, capturing the editable text payload (overlay text and title), and global context, capturing the underlying visual substrate or template semantics. Both are enriched with transformer-based annotations, including 14-dimensional sentiment and emotion vectors, BERTopic-derived topics, and zero-shot usage-intent labels. This structured representation supports exploratory analysis and context-aware retrieval by natural language or image query.
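
To make the dual-context idea concrete, the following is a minimal sketch of how a meme record with separate local and global contexts could be stored and queried. It is not the authors' implementation: the `MemeRecord` fields, the `search` function, the `local_weight` parameter, and the two-dimensional toy vectors are all assumptions made for illustration, and a real system would use learned text and image embeddings in place of the hand-written vectors.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class MemeRecord:
    """Hypothetical record layout; field names are illustrative, not the dataset's schema."""
    meme_id: str
    overlay_text: str   # local context: editable text placed on the image
    title: str          # local context: post title
    template: str       # global context: underlying template
    local_vec: list     # assumed embedding of the local context
    global_vec: list    # assumed embedding of the global context


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, records, local_weight=0.5, top_k=3):
    """Rank records by a weighted sum of local and global cosine similarity."""
    scored = [
        (
            local_weight * cosine(query_vec, r.local_vec)
            + (1.0 - local_weight) * cosine(query_vec, r.global_vec),
            r.meme_id,
        )
        for r in records
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [meme_id for _, meme_id in scored[:top_k]]


# Toy data: the vectors are made up so that each record matches one context only.
records = [
    MemeRecord("m1", "one does not simply", "monday", "Boromir", [1.0, 0.0], [0.0, 1.0]),
    MemeRecord("m2", "this is fine", "exam week", "This Is Fine", [0.0, 1.0], [1.0, 0.0]),
]

text_first = search([1.0, 0.0], records, local_weight=1.0)      # rank by local context only
template_first = search([1.0, 0.0], records, local_weight=0.0)  # rank by global context only
```

Keeping the two contexts as separate vectors lets the same query be ranked by what a meme says, by which template it uses, or by a blend of the two, which is the kind of context-aware retrieval the abstract describes.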

Published

2026-05-25

How to Cite

Le, D. T. A., Köller, D. Á., Deng, Q., & Molontay, R. (2026). MemeMatch: A Large-Scale Dual-Context Multimodal Dataset and Retrieval System for Internet Memes. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 2828–2838. https://doi.org/10.1609/icwsm.v20i1.42785

Issue

Vol. 20 No. 1: Proceedings of the Twentieth International AAAI Conference on Web and Social Media

Section

Dataset Papers
