Eguard: Defending LLM Embeddings Against Inversion Attacks via Text Mutual Information Optimization

Authors

  • Tiantian Liu — State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security; School of Informatics, Xiamen University
  • Hongwei Yao — State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
  • Feng Lin — State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
  • Tong Wu — State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
  • Zhan Qin — State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security
  • Kui Ren — State Key Laboratory of Blockchain and Data Security, Zhejiang University; Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security

DOI:

https://doi.org/10.1609/aaai.v40i42.40876

Abstract

While text embeddings enable efficient semantic processing in LLMs, they remain vulnerable to inversion attacks that reconstruct the sensitive original text. Current defense methods, however, typically treat text embeddings independently at the feature level, overlooking the mutual relations that arise along the embedding construction pipeline. To address this limitation, we propose Eguard, a framework that disrupts the chain of relationships between the original semantic space and the defended functional space. Our improvements operate at two levels of mutual information: global and local. At the global level, we minimize the statistical dependency between protected embeddings and their original inputs, effectively decoupling sensitive content from the semantic space accessible to adversaries. At the local level, we apply keyword-antonym contrastive learning to enforce semantic discriminability within the space of downstream utility. This synergy of global privacy control and local semantic alignment allows Eguard to achieve a superior privacy-utility trade-off compared with traditional defenses. Our approach significantly reduces privacy risk, protecting over 95 percent of tokens from inversion while maintaining high performance across downstream tasks, consistent with the original embeddings.
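The abstract describes a two-part objective: a global term that reduces the statistical dependency between protected embeddings and their source text, and a local term that uses keyword-antonym contrastive learning to preserve task-relevant distinctions. The paper's exact losses are not given on this page, so the sketch below is only an illustrative stand-in: cosine-similarity suppression serves as a crude proxy for mutual-information minimization, and a hinge term plays the role of the keyword-antonym contrast. All function names and parameters (`eguard_style_loss`, `lam`, `margin`) are hypothetical.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def eguard_style_loss(protected, original, keyword, antonym, lam=1.0, margin=0.5):
    """Illustrative combined objective (hypothetical form, not the paper's losses).

    Global term: penalize squared similarity between the protected embedding and
    the original embedding -- a rough stand-in for minimizing their mutual
    information, so inversion from the protected embedding becomes harder.

    Local term: hinge loss that keeps the protected embedding closer to a
    keyword embedding than to its antonym embedding by at least `margin`,
    mimicking keyword-antonym contrastive learning for downstream utility.
    """
    global_term = cosine(protected, original) ** 2       # drive toward 0 (decoupling)
    pos = cosine(protected, keyword)                     # similarity to keyword
    neg = cosine(protected, antonym)                     # similarity to antonym
    local_term = max(0.0, margin - (pos - neg))          # want pos to exceed neg
    return global_term + lam * local_term

# A "well-defended" configuration: protected is orthogonal to the original,
# close to the keyword, and far from the antonym -> near-zero loss.
good = eguard_style_loss(
    protected=np.array([1.0, 0.0]),
    original=np.array([0.0, 1.0]),
    keyword=np.array([1.0, 0.1]),
    antonym=np.array([-1.0, 0.0]),
)

# A "leaky" configuration: protected equals the original and prefers the
# antonym over the keyword -> large loss.
bad = eguard_style_loss(
    protected=np.array([1.0, 0.0]),
    original=np.array([1.0, 0.0]),
    keyword=np.array([-1.0, 0.0]),
    antonym=np.array([1.0, 0.0]),
)
```

In a real training loop both terms would be differentiable (e.g., a variational mutual-information bound such as CLUB for the global term and an InfoNCE-style loss for the local term) and optimized jointly over batches; the scalar version here only illustrates how the two levels trade off through the weight `lam`.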

Published

2026-03-14

How to Cite

Liu, T., Yao, H., Lin, F., Wu, T., Qin, Z., & Ren, K. (2026). Eguard: Defending LLM Embeddings Against Inversion Attacks via Text Mutual Information Optimization. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 35644–35652. https://doi.org/10.1609/aaai.v40i42.40876

Section

AAAI Technical Track on Philosophy and Ethics of AI