TeraGram: A Structured Longitudinal Dataset of the Telegram Messenger
DOI:
https://doi.org/10.1609/icwsm.v20i1.42783
Abstract
Here we present a massive longitudinal dataset of public Telegram content, comprising over 5.9 billion messages dating from 2015 to 2025, collected from 712 thousand channels and groups and enriched with metadata on forwards, reactions, and polls. The dataset spans multiple languages, including Russian and Farsi, which represent countries where Telegram shows mainstream adoption, as well as Western languages in which Telegram is used by specific sub-communities. The dataset has several advantages. First, when restricted by language, it provides a versatile example of an algorithm-free platform, unlike many other social media platforms that are strongly influenced by opaque content-curation algorithms. Second, it enables comparative studies across different languages, communities, and user bases under identical platform affordances. The dataset thus offers a foundation for studying engagement patterns, network evolution, and community formation in the absence of algorithmic curation.
Published
2026-05-25
How to Cite
Golovin, A., Mohr, S. B., Gottwald, A. I., Hvid, U., Trivedi, S., Pinheiro Neto, J., … Priesemann, V. (2026). TeraGram: A Structured Longitudinal Dataset of the Telegram Messenger. Proceedings of the International AAAI Conference on Web and Social Media, 20(1), 2794–2816. https://doi.org/10.1609/icwsm.v20i1.42783
Section
Dataset Papers