Steering Representations, Safeguarding Privacy: A Cross-Modal Privacy Protection Method for Generative AI

Jie Zhang; Chenxu Niu; Zhefeng Nan; Yangyan Xu; Jinta Weng

doi:10.1609/aaai.v40i41.40773

Authors

Jie Zhang, Shanghai Artificial Intelligence Laboratory
Chenxu Niu, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Zhefeng Nan, Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Yangyan Xu, HiThink Research
Jinta Weng, School of Cyber Security, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v40i41.40773

Abstract

Privacy concerns have long been a critical issue in AI models. With the rapid advancement of generative AI, the privacy awareness of models has drawn attention, raising new challenges for privacy protection that is independent of data and tasks. This paper introduces a novel framework for enhancing privacy protection through directional steering in representation space, which seamlessly integrates with both language and vision-language models. Specifically, we first construct a comprehensive privacy-related dataset based on the Solove taxonomy of privacy. Then, we leverage this dataset to enhance model privacy awareness in the representation space, steering the model to protect privacy during inference. Experiments on 12 models validate the effectiveness and generalization of our method. Moreover, we demonstrate the transferability of privacy-enhanced representations between same-source large language models (LLMs) and vision-language models (VLMs), offering a scalable solution for privacy protection in frontier AI models.
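To make the idea of "directional steering in representation space" concrete, the following is a minimal sketch of one common form of activation steering: a difference-of-means direction added to a hidden state at inference time. It is not taken from the paper, whose exact procedure is not described in the abstract. The function names, the toy two-dimensional activations, and the strength parameter `alpha` are all illustrative assumptions.

```python
from math import sqrt

# Hypothetical helpers for illustration only; the paper's actual method may differ.

def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_direction(private_acts, neutral_acts):
    """Unit vector pointing from the neutral mean to the privacy-related mean."""
    mu_p = mean_vector(private_acts)
    mu_n = mean_vector(neutral_acts)
    diff = [p - n for p, n in zip(mu_p, mu_n)]
    norm = sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("the two activation sets have identical means")
    return [d / norm for d in diff]

def steer(hidden, direction, alpha):
    """Shift a hidden state along the direction by strength alpha (assumed knob)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

def projection(hidden, direction):
    """Scalar component of a hidden state along the steering direction."""
    return sum(h * d for h, d in zip(hidden, direction))

# Toy activations standing in for hidden states collected on
# privacy-related and neutral prompts (made-up numbers).
private_acts = [[2.0, 0.0], [4.0, 0.0]]
neutral_acts = [[0.0, 0.0], [0.0, 0.0]]

direction = steering_direction(private_acts, neutral_acts)
hidden = [1.0, 1.0]
steered = steer(hidden, direction, alpha=2.0)
```

In this toy setting, steering increases the hidden state's component along the privacy direction and leaves the orthogonal component unchanged. In a real model the same addition would typically be applied to a chosen layer's activations during the forward pass, with the layer and strength selected empirically.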
