Optimization and Robustness-Informed Membership Inference Attacks for LLMs
DOI:
https://doi.org/10.1609/aaai.v40i39.40587

Abstract
The proliferation of Large Language Models (LLMs) has raised concerns over training data privacy. Membership Inference Attacks (MIAs), which aim to identify whether a specific data point was used during training, pose significant privacy risks. However, existing MIA methods struggle with the scale and complexity of modern LLMs. This paper introduces OR-MIA, a novel MIA framework informed by model optimization and input robustness. First, training data points are expected to exhibit smaller gradient norms as a consequence of optimization dynamics. Second, member samples show greater stability: their gradient norms are less sensitive to controlled input perturbations. OR-MIA leverages these principles by perturbing inputs, computing gradient norms, and using the norms as features for a robust classifier that distinguishes members from non-members. Evaluations on LLMs ranging from 70M to 6B parameters across various datasets demonstrate that OR-MIA outperforms existing methods, achieving over 90% accuracy. Our findings highlight a critical vulnerability in LLMs and underscore the need for improved privacy-preserving training paradigms.

Published
2026-03-14
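The abstract's recipe — perturb an input, measure gradient norms, and use the norm and its stability as classifier features — can be illustrated with a toy stand-in. The sketch below uses a small linear model with an analytic gradient in place of an LLM; the function names, perturbation scale, and feature choice are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an LLM: a fixed linear model with squared loss.
# For a sample (x, y), the gradient of the loss w.r.t. the weights
# is 2 * (w @ x - y) * x.
w = rng.normal(size=8)

def grad_norm(x, y):
    """L2 norm of the per-sample loss gradient w.r.t. the parameters."""
    g = 2.0 * (w @ x - y) * x
    return np.linalg.norm(g)

def ormia_features(x, y, n_perturb=16, eps=0.05):
    """Hypothetical OR-MIA-style features for one sample:
    the base gradient norm, plus the spread of gradient norms
    under small random input perturbations (a stability proxy)."""
    base = grad_norm(x, y)
    perturbed = [grad_norm(x + eps * rng.normal(size=x.shape), y)
                 for _ in range(n_perturb)]
    return np.array([base, np.std(perturbed)])

# "Member" sample: the model fits it almost exactly, so the residual
# (and hence the gradient norm) is small.
x_member = rng.normal(size=8)
y_member = w @ x_member + 0.01
# "Non-member" sample: large residual, large gradient norm.
x_non = rng.normal(size=8)
y_non = w @ x_non + 5.0

f_member = ormia_features(x_member, y_member)
f_non = ormia_features(x_non, y_non)
print("member features:", f_member)
print("non-member features:", f_non)
```

In this toy setting the member's base gradient norm comes out far smaller than the non-member's, mirroring the first principle in the abstract; in the actual attack these features would be computed on an LLM's loss gradients and fed to a trained classifier.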
How to Cite
Song, Z., Zhang, Q., Li, M., & Shu, Y. (2026). Optimization and Robustness-Informed Membership Inference Attacks for LLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 33047–33055. https://doi.org/10.1609/aaai.v40i39.40587
Section
AAAI Technical Track on Natural Language Processing IV