Hagar, N. and Bandy, J. (2025) “Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl”, Proceedings of the International AAAI Conference on Web and Social Media, 19(1), pp. 2454–2464. doi: 10.1609/icwsm.v19i1.35948.