Hagar, Nick, and Jack Bandy. 2025. “Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl.” Proceedings of the International AAAI Conference on Web and Social Media 19 (1): 2454-64. https://doi.org/10.1609/icwsm.v19i1.35948.