[1] Y. Ye, T. Le, and D. Lee, “NoisyHate: Mining Online Human-Written Perturbations for Realistic Robustness Benchmarking of Content Moderation Models”, ICWSM, vol. 19, no. 1, pp. 2603–2612, Jun. 2025.