Steering Representations, Safeguarding Privacy: A Cross-Modal Privacy Protection Method for Generative AI
DOI:
https://doi.org/10.1609/aaai.v40i41.40773
Abstract
Privacy concerns have long been a critical issue in AI models. With the rapid advancement of generative AI, the privacy awareness of models has drawn attention, raising new challenges for privacy protection that is independent of data and tasks. This paper introduces a novel framework for enhancing privacy protection through directional steering in representation space, which seamlessly integrates with both language and vision-language models. Specifically, we first construct a comprehensive privacy-related dataset based on the Solove taxonomy of privacy. Then, we leverage this dataset to enhance model privacy awareness in the representation space, steering the model to protect privacy during inference. Experiments on 12 models validate the effectiveness and generalization of our method. Moreover, we demonstrate the transferability of privacy-enhanced representations between same-source large language models (LLMs) and vision-language models (VLMs), offering a scalable solution for privacy protection in frontier AI models.
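The abstract does not say how the steering direction is obtained or where in the model it is applied. As a rough illustration of what "directional steering in representation space" can look like in general, the sketch below uses a difference-of-means direction, a common technique in the activation-steering literature that may differ from the authors' method. The activations are synthetic NumPy arrays rather than real model states, and the names steering_direction, steer, and alpha are hypothetical.

```python
import numpy as np


def steering_direction(privacy_acts, neutral_acts):
    """Unit vector pointing from the neutral mean to the privacy-related mean.

    Difference-of-means is an assumption made for illustration; the
    abstract does not state how the paper derives its direction.
    """
    diff = privacy_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return diff / norm


def steer(hidden, direction, alpha):
    """Shift every hidden state by alpha along the steering direction."""
    return hidden + alpha * direction


# Synthetic stand-ins for hidden states collected at one layer.
rng = np.random.default_rng(0)
dim = 16
offset = np.zeros(dim)
offset[0] = 3.0  # privacy-related prompts differ along the first axis
neutral_acts = rng.normal(size=(200, dim))
privacy_acts = rng.normal(size=(200, dim)) + offset

direction = steering_direction(privacy_acts, neutral_acts)

# At inference time, the same shift would be applied to new hidden states.
hidden = rng.normal(size=(4, dim))
steered = steer(hidden, direction, alpha=2.0)
```

In a real model the two activation sets would come from running privacy-related and neutral prompts through the network, and the shift would be applied inside the forward pass; the scale alpha trades off the strength of the intervention against how far the states move from their original values.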
Published
2026-03-14
How to Cite
Zhang, J., Niu, C., Nan, Z., Xu, Y., & Weng, J. (2026). Steering Representations, Safeguarding Privacy: A Cross-Modal Privacy Protection Method for Generative AI. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34719–34727. https://doi.org/10.1609/aaai.v40i41.40773
Issue
Section
AAAI Technical Track on Natural Language Processing VI