(1) Hagar, N.; Bandy, J. Practical Datasets for Analyzing LLM Corpora Derived from Common Crawl. ICWSM 2025, 19, 2454–2464.