HAGAR, Nick; BANDY, Jack. Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl. Proceedings of the International AAAI Conference on Web and Social Media, [S. l.], v. 19, n. 1, p. 2454–2464, 2025. DOI: 10.1609/icwsm.v19i1.35948. Available at: https://ojs.aaai.org/index.php/ICWSM/article/view/35948. Accessed: 9 May 2026.