[1] Hagar, N. and Bandy, J. 2025. Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl. Proceedings of the International AAAI Conference on Web and Social Media. 19, 1 (Jun. 2025), 2454–2464. DOI:https://doi.org/10.1609/icwsm.v19i1.35948.