[1] N. Hagar and J. Bandy, “Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl”, ICWSM, vol. 19, no. 1, pp. 2454–2464, Jun. 2025.