TarPro: Targeted Protection Against Malicious Image Editing

Authors

  • Kaixin Shen, Zhejiang University
  • Ruijie Quan, Nanyang Technological University
  • Jiaxu Miao, Harbin Institute of Technology, Shenzhen
  • Jun Xiao, Zhejiang University

DOI:

https://doi.org/10.1609/aaai.v40i11.37844

Abstract

The rapid advancement of diffusion-based image editing has enabled highly controllable visual content generation but has also raised serious concerns about the misuse of generative models for producing Not-Safe-for-Work (NSFW) content. Existing protection strategies inject adversarial perturbations to disrupt editing. However, these methods are untargeted, often degrading benign edits while failing to eliminate harmful outputs. In this work, we propose TarPro, a targeted protection framework that blocks malicious edits while preserving benign editing functionality. TarPro introduces Dual-Intent Optimization (DIO), a semantic alignment objective that suppresses malicious prompt effects while retaining desirable, benign edits, by leveraging prompt compositionality rather than requiring manually annotated preferences. To ensure robustness and generalization, we replace pixel-level optimization with a generator-based perturbation learning strategy that learns to produce structured, imperceptible perturbations in parameter space. Experiments on multiple diffusion backbones show that TarPro significantly blocks NSFW content while maintaining high-quality benign edits, outperforming strong baselines in both qualitative and quantitative evaluations.

Published

2026-03-14

How to Cite

Shen, K., Quan, R., Miao, J., & Xiao, J. (2026). TarPro: Targeted Protection Against Malicious Image Editing. Proceedings of the AAAI Conference on Artificial Intelligence, 40(11), 8896–8904. https://doi.org/10.1609/aaai.v40i11.37844

Section

AAAI Technical Track on Computer Vision VIII