TarPro: Targeted Protection Against Malicious Image Editing
DOI:
https://doi.org/10.1609/aaai.v40i11.37844

Abstract
The rapid advancement of diffusion-based image editing has enabled highly controllable visual content generation, but it has also raised serious concerns about the misuse of generative models to produce Not-Safe-for-Work (NSFW) content. Existing protection strategies inject adversarial perturbations to disrupt editing. However, these methods are untargeted: they often degrade benign edits while still failing to eliminate harmful outputs. In this work, we propose TarPro, a targeted protection framework that blocks malicious edits while preserving benign editing functionality. TarPro introduces Dual-Intent Optimization (DIO), a semantic alignment objective that suppresses the effects of malicious prompts while retaining desirable, benign edits by leveraging prompt compositionality rather than requiring manually annotated preferences. To ensure robustness and generalization, we replace pixel-level optimization with a generator-based perturbation learning strategy that learns to produce structured, imperceptible perturbations in parameter space. Experiments on multiple diffusion backbones show that TarPro effectively blocks NSFW content while maintaining high-quality benign edits, outperforming strong baselines in both qualitative and quantitative evaluations.
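The generator-based perturbation idea from the abstract can be sketched roughly as follows. This is a hypothetical, minimal illustration (not the authors' implementation): a stand-in "generator" maps an image to a perturbation whose magnitude is bounded via a tanh squashing and an epsilon budget, which is one common way to keep perturbations imperceptible.

```python
import numpy as np

def perturbation_generator(image, weights, eps=8 / 255):
    """Hypothetical generator: a single linear map stands in for the
    learned network described in the abstract. tanh keeps the raw
    output in (-1, 1); scaling by eps enforces an L-infinity bound,
    a standard proxy for imperceptibility."""
    flat = image.reshape(-1)
    raw = weights @ flat                          # toy linear "generator"
    delta = eps * np.tanh(raw).reshape(image.shape)
    # Adding the perturbation must still yield a valid image in [0, 1].
    protected = np.clip(image + delta, 0.0, 1.0)
    return protected, delta

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))                       # toy 4x4 RGB image in [0, 1]
W = rng.standard_normal((img.size, img.size)) * 0.1
protected, delta = perturbation_generator(img, W)
```

In the actual framework, the generator's parameters would be optimized against editing objectives (e.g., the DIO objective) rather than drawn at random as here; the point of the sketch is only the bounded, image-conditioned perturbation structure.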
Published
2026-03-14
How to Cite
Shen, K., Quan, R., Miao, J., & Xiao, J. (2026). TarPro: Targeted Protection Against Malicious Image Editing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 8896–8904. https://doi.org/10.1609/aaai.v40i11.37844
Section
AAAI Technical Track on Computer Vision VIII