YE, Yiran; LE, Thai; LEE, Dongwon. NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models. Proceedings of the International AAAI Conference on Web and Social Media, [S. l.], v. 19, n. 1, p. 2603–2612, 2025. DOI: 10.1609/icwsm.v19i1.35961. Available at: https://ojs.aaai.org/index.php/ICWSM/article/view/35961. Accessed: 29 May 2026.