Optimizing Against Safety Representations: Activation-Guided Adversarial Suffixes and the Geometry of Refusal