Hagar, N., & Bandy, J. (2025). Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl. Proceedings of the International AAAI Conference on Web and Social Media, 19(1), 2454–2464. https://doi.org/10.1609/icwsm.v19i1.35948