SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge

Authors

  • Adeel Yousaf, Center for Research in Computer Vision, University of Central Florida
  • Joseph Fioresi, Center for Research in Computer Vision, University of Central Florida
  • James Beetham, Center for Research in Computer Vision, University of Central Florida
  • Amrit Singh Bedi, SAFERR AI Lab, University of Central Florida
  • Mubarak Shah, Center for Research in Computer Vision, University of Central Florida

DOI:

https://doi.org/10.1609/aaai.v40i42.40917

Abstract

Improving the safety of vision-language models like CLIP via fine-tuning often comes at a steep price, causing significant drops in their generalization performance. We find this trade-off stems from rigid alignment strategies that force unsafe concepts toward single, predefined safe targets, disrupting the model's learned semantic structure. To address this, we propose a proximity-aware approach: redirecting unsafe concepts to their semantically closest safe alternatives to minimize representational change. We introduce SafeR-CLIP, a fine-tuning framework that applies this principle of minimal intervention. SafeR-CLIP successfully reconciles safety and performance, recovering up to 8.0% in zero-shot accuracy over prior methods while maintaining robust safety. To support more rigorous evaluation, we also contribute NSFWCaps, a new benchmark of 1,000 highly aligned pairs for testing safety under distributional shift. Our work shows that respecting the geometry of pretrained representations is key to achieving safety without sacrificing performance.
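
The proximity-aware idea in the abstract, choosing for each unsafe concept the semantically closest safe alternative, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the candidate pool, or the training loss, so cosine similarity over toy 2-D embeddings is an assumption here, and the names `cosine` and `nearest_safe_target` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_safe_target(unsafe_vec, safe_vecs):
    """Return (index, similarity) of the safe embedding closest to unsafe_vec.

    Assumption: "semantically closest" is approximated by cosine similarity.
    """
    sims = [cosine(unsafe_vec, s) for s in safe_vecs]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


# Toy embeddings (illustrative values only, not real CLIP features).
unsafe_embeddings = [[1.0, 0.1], [0.0, 1.0]]
safe_embeddings = [[0.9, 0.0], [0.1, 0.8], [-1.0, 0.0]]

# Each unsafe concept gets its own nearest safe target instead of one
# fixed target shared by all of them.
targets = [nearest_safe_target(u, safe_embeddings)[0] for u in unsafe_embeddings]
```

In this toy setup each unsafe vector is paired with a different safe vector, which is the contrast the abstract draws with strategies that push every unsafe concept toward a single predefined target.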

Published

2026-03-14

How to Cite

Yousaf, A., Fioresi, J., Beetham, J., Bedi, A. S., & Shah, M. (2026). SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 36012–36020. https://doi.org/10.1609/aaai.v40i42.40917

Section

AAAI Technical Track on Philosophy and Ethics of AI