ShadeEdit: A Utility-Preserving and Defense-Evasive Knowledge Manipulation Attack in Federated LLMs

Authors

  • Xu Zhang, Chongqing University
  • Hangcheng Liu, Nanyang Technological University
  • Shangwei Guo, Chongqing University
  • Shudong Zhang, Xidian University
  • Tianwei Zhang, Nanyang Technological University
  • Tao Xiang, Chongqing University

DOI:

https://doi.org/10.1609/aaai.v40i41.40787

Abstract

Recent studies reveal that adversaries can use model editing to manipulate the internal knowledge of large language models (LLMs) on selected topics, so that queries about the edited content produce attacker-specified harmful or biased outputs. Once distributed, such tampered LLMs can mislead users on the targeted topics, potentially propagating misinformation or reinforcing stereotypes. However, existing knowledge manipulation attacks rely on the ability to redistribute compromised models, which is infeasible in constrained settings such as Federated Instruction Tuning (FedIT), where a central server controls the LLM's training and distribution. In this work, we introduce ShadeEdit, the first attack framework that leverages strengthened model editing to enable knowledge manipulation in FedIT scenarios. ShadeEdit introduces two key components, each addressing a challenge posed by the FedIT training process: (1) a paraphrase-based editing dataset selection strategy, which constructs a high-quality editing dataset to mitigate the dilution of malicious updates by benign ones, and (2) an adaptive manipulation mechanism, which uses an adaptive clipping strategy to evade aggregation-based defenses. ShadeEdit achieves an average attack success rate of 99.5% across eight robust aggregation algorithms while preserving instruction-following accuracy, demonstrating both strong attack effectiveness and preservation of model utility.

Published

2026-03-14

How to Cite

Zhang, X., Liu, H., Guo, S., Zhang, S., Zhang, T., & Xiang, T. (2026). ShadeEdit: A Utility-Preserving and Defense-Evasive Knowledge Manipulation Attack in Federated LLMs. Proceedings of the AAAI Conference on Artificial Intelligence, 40(41), 34845–34853. https://doi.org/10.1609/aaai.v40i41.40787

Issue

Vol. 40 No. 41 (2026)

Section

AAAI Technical Track on Natural Language Processing VI