ShadeEdit: A Utility-Preserving and Defense-Evasive Knowledge Manipulation Attack in Federated LLMs

Xu Zhang; Hangcheng Liu; Shangwei Guo; Shudong Zhang; Tianwei Zhang; Tao Xiang

doi:10.1609/aaai.v40i41.40787

Authors

Xu Zhang, Chongqing University
Hangcheng Liu, Nanyang Technological University
Shangwei Guo, Chongqing University
Shudong Zhang, Xidian University
Tianwei Zhang, Nanyang Technological University
Tao Xiang, Chongqing University

Abstract

Recent studies reveal that adversaries can use model editing to manipulate the internal knowledge of large language models (LLMs) on selected topics, so that the model produces attacker-specified harmful or biased outputs when queried about the edited content. Once distributed, such tampered LLMs can mislead users on the targeted topics, potentially propagating misinformation or reinforcing stereotypes. However, existing knowledge manipulation attacks rely on the ability to redistribute compromised models, which is infeasible in constrained settings such as Federated Instruction Tuning (FedIT), where a central server controls the LLM's training and distribution. In this work, we introduce ShadeEdit, the first attack framework that leverages strengthened model editing to enable knowledge manipulation in FedIT scenarios. ShadeEdit introduces two key components, each addressing a challenge posed by the FedIT training process: (1) a paraphrase-based editing dataset selection strategy that constructs a high-quality editing dataset to mitigate the dilution of malicious updates by benign ones, and (2) an adaptive manipulation mechanism that evades aggregation-based defenses via an adaptive clipping strategy. ShadeEdit achieves an average attack success rate of 99.5% across eight robust aggregation algorithms while preserving instruction-following accuracy, demonstrating both strong attack effectiveness and preservation of model utility.
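
The two challenges named in the abstract can be made concrete with a small numerical sketch. The code below is not ShadeEdit and does not reproduce the paper's method; it only illustrates, under assumed toy values, (a) how plain federated averaging dilutes a single divergent client update and (b) how a server-side norm-clipping rule bounds such an update further. Norm clipping is used here purely as a generic, well-known example of robust aggregation; whether it is among the eight algorithms evaluated is not stated in the abstract. The function names `fedavg`, `l2_norm`, and `clip_to_norm`, the client count, and the threshold are all hypothetical.

```python
import math
from typing import List

Vector = List[float]


def fedavg(updates: List[Vector]) -> Vector:
    """Unweighted federated averaging: coordinate-wise mean of client updates."""
    n = len(updates)
    dim = len(updates[0])
    return [sum(u[i] for u in updates) / n for i in range(dim)]


def l2_norm(v: Vector) -> float:
    """Euclidean norm of an update vector."""
    return math.sqrt(sum(x * x for x in v))


def clip_to_norm(v: Vector, max_norm: float) -> Vector:
    """Rescale v so that its L2 norm does not exceed max_norm (assumed defense)."""
    norm = l2_norm(v)
    if norm == 0.0 or norm <= max_norm:
        return list(v)
    scale = max_norm / norm
    return [x * scale for x in v]


if __name__ == "__main__":
    # Toy setting (assumed): nine benign clients push along the first axis,
    # one divergent client pushes along the second axis.
    benign = [[0.1, 0.0] for _ in range(9)]
    divergent = [0.0, 1.0]

    # (a) Dilution: the divergent direction is scaled by 1/10 after averaging.
    print("plain average:  ", fedavg(benign + [divergent]))

    # (b) Norm clipping: the server bounds every update before averaging,
    # which shrinks the large divergent update but leaves small ones intact.
    threshold = 0.5  # hypothetical clipping threshold
    clipped = [clip_to_norm(u, threshold) for u in benign + [divergent]]
    print("clipped average:", fedavg(clipped))
```

With ten clients, the divergent client's contribution along its own axis is reduced by a factor of ten under plain averaging, and by a further factor of two once its norm is clipped from 1.0 to 0.5. This is the dilution and defense pressure that the abstract says the two components are designed to address.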
