What Are They Filtering Out? An Experimental Benchmark of Filtering Strategies for Harm Reduction in Pretraining Datasets

Marco Antonio Stranisci; Christian Hardmeier

doi:10.1609/aaai.v40i46.41279

Authors

Marco Antonio Stranisci, University of Turin; aequa-tech
Christian Hardmeier, IT University of Copenhagen

DOI:

https://doi.org/10.1609/aaai.v40i46.41279

Abstract

Data filtering strategies are a crucial component in developing safe Large Language Models (LLMs), since they support the removal of harmful content from pretraining datasets. However, there is a lack of research on the actual impact of these strategies on groups vulnerable to discrimination, and their effectiveness has not yet been systematically addressed. In this paper we present a benchmark study of data filtering strategies for harm reduction, aimed at providing a systematic evaluation of these approaches. We provide an overview of 55 technical reports of English LMs and LLMs to identify the filtering strategies existing in the literature, and implement an experimental setting to test their impact on vulnerable groups. Our results show that the positive impact these strategies have in reducing harmful content in documents has the side effect of increasing the underrepresentation of groups vulnerable to discrimination in datasets.

Published

2026-03-14

How to Cite

Stranisci, M. A., & Hardmeier, C. (2026). What Are They Filtering Out? An Experimental Benchmark of Filtering Strategies for Harm Reduction in Pretraining Datasets. Proceedings of the AAAI Conference on Artificial Intelligence, 40(46), 39303–39313. https://doi.org/10.1609/aaai.v40i46.41279

Issue

Vol. 40 No. 46: AAAI-26 Special Track AI for Social Impact II and Senior Member Presentations

Section

AAAI Special Track on AI for Social Impact II
