Adversarial Attack on Black-Box Multi-Agent by Adaptive Perturbation

Authors

  • Jianming Chen: Institute of Software, Chinese Academy of Sciences, Beijing, China; Science & Technology on Integrated Information System Laboratory, Beijing, China; State Key Laboratory of Complex System Modeling and Simulation Technology, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Yawen Wang: Institute of Software, Chinese Academy of Sciences, Beijing, China; Science & Technology on Integrated Information System Laboratory, Beijing, China; State Key Laboratory of Complex System Modeling and Simulation Technology, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Junjie Wang: Institute of Software, Chinese Academy of Sciences, Beijing, China; Science & Technology on Integrated Information System Laboratory, Beijing, China; State Key Laboratory of Complex System Modeling and Simulation Technology, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Xiaofei Xie: Singapore Management University, Singapore
  • Yuanzhe Hu: Institute of Software, Chinese Academy of Sciences, Beijing, China; Science & Technology on Integrated Information System Laboratory, Beijing, China; State Key Laboratory of Complex System Modeling and Simulation Technology, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Qing Wang: Institute of Software, Chinese Academy of Sciences, Beijing, China; Science & Technology on Integrated Information System Laboratory, Beijing, China; State Key Laboratory of Complex System Modeling and Simulation Technology, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
  • Fanjiang Xu: Institute of Software, Chinese Academy of Sciences, Beijing, China; Science & Technology on Integrated Information System Laboratory, Beijing, China; State Key Laboratory of Complex System Modeling and Simulation Technology, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v40i35.40176

Abstract

Evaluating the security and reliability of multi-agent systems (MAS) is urgent as they become increasingly prevalent across applications. Existing adversarial attack frameworks, one technique for such evaluation, have notable limitations: they are impractical because they require white-box information or high control authority, and they lack stealthiness or effectiveness because they often target all agents or specific fixed agents. To address these issues, we propose AdapAM, a novel framework for adversarial attacks on black-box MAS. AdapAM incorporates two key components: (1) Adaptive Selection Policy, which simultaneously selects the victim and determines the anticipated malicious action (the action that would have the worst impact on the MAS), balancing effectiveness and stealthiness; and (2) Proxy-based Perturbation to Induce Malicious Action, which uses generative adversarial imitation learning to approximate the target MAS, allowing AdapAM to generate perturbed observations using white-box information and thus induce victims to execute the malicious action in black-box settings. We evaluate AdapAM across eight multi-agent environments and compare it with four state-of-the-art and commonly used baselines. Results demonstrate that AdapAM achieves the best attack performance across different perturbation rates. In addition, AdapAM-generated perturbations are the least noisy and the hardest to detect, which underscores their stealthiness.
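
To make the second component more concrete, the following is a minimal sketch of a targeted observation perturbation computed on a white-box proxy policy. It is not the AdapAM implementation: the abstract does not specify the attack procedure, so this sketch swaps in a generic iterative signed-gradient attack, and it assumes a hypothetical linear-softmax proxy (weights `W`, bias `b`) standing in for a proxy learned by imitation. The names `targeted_perturbation`, `eps`, and `steps` are illustrative only.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def targeted_perturbation(W, b, obs, target_action, eps=0.1, steps=20):
    """Perturb `obs` within an L-infinity ball of radius `eps` so that a
    hypothetical linear-softmax proxy policy (an assumption of this sketch)
    assigns higher probability to `target_action`.

    W has shape (n_actions, obs_dim); b has shape (n_actions,).
    Returns the best perturbed observation found, or `obs` itself if no
    iterate improves on it.
    """
    step_size = eps / 4.0
    adv = obs.astype(float).copy()
    best = adv.copy()
    best_prob = softmax(W @ adv + b)[target_action]
    for _ in range(steps):
        probs = softmax(W @ adv + b)
        # Gradient of log p(target | obs) with respect to obs for a
        # linear-softmax policy: W[target] - sum_a p_a * W[a].
        grad = W[target_action] - probs @ W
        adv = adv + step_size * np.sign(grad)
        # Project back into the eps-ball around the clean observation.
        adv = np.clip(adv, obs - eps, obs + eps)
        prob = softmax(W @ adv + b)[target_action]
        if prob > best_prob:
            best, best_prob = adv.copy(), prob
    return best


# Usage: three actions, three observation features, induce action 2.
W = np.eye(3)
b = np.zeros(3)
obs = np.zeros(3)
adv_obs = targeted_perturbation(W, b, obs, target_action=2, eps=0.1)
```

In a black-box setting, the perturbed observation computed on the proxy would then be fed to the real victim agent, relying on the proxy approximating the victim closely enough for the perturbation to transfer. The `eps` bound is one simple way to keep the perturbation small; the paper's own stealthiness criteria may differ.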

Published

2026-03-14

How to Cite

Chen, J., Wang, Y., Wang, J., Xie, X., Hu, Y., Wang, Q., & Xu, F. (2026). Adversarial Attack on Black-Box Multi-Agent by Adaptive Perturbation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(35), 29359–29367. https://doi.org/10.1609/aaai.v40i35.40176

Section

AAAI Technical Track on Multiagent Systems