MCPTox: A Benchmark for Tool Poisoning on Real-World MCP Servers

Authors

  • Zhiqiang Wang University of Science and Technology of China
  • Yichao Gao University of Science and Technology of China
  • Yanting Wang Beihang university
  • Suyuan Liu University of Science and Technology of China
  • Haifeng Sun University of Science and Technology of China
  • Haoran Cheng University of Science and Technology of China
  • Guanquan Shi Beihang university
  • Haohua Du Beihang university
  • Xiangyang Li University of Science and Technology of China

DOI:

https://doi.org/10.1609/aaai.v40i42.40895

Abstract

By providing a standardized interface for LLM agents to interact with external tools, the Model Context Protocol (MCP) is quickly becoming a cornerstone of the modern autonomous agent ecosystem. However, it creates novel attack surfaces due to untrusted external tools. While prior work has focused on attacks injected through external tool outputs, we investigate a more fundamental vulnerability: Tool Poisoning, where malicious instructions are embedded within a tool's metadata at the registration stage. To date, this threat has been primarily demonstrated through isolated cases, lacking a systematic, large-scale evaluation. We introduce MCPTox, the first benchmark to systematically evaluate agent robustness against Tool Poisoning in realistic MCP settings. MCPTox is constructed upon 45 live, real-world MCP servers and 353 authentic tools. To achieve this, we design three distinct attack templates to generate a comprehensive suite of 1348 malicious test cases by few-shot learning, covering 10 categories of potential risks. Our evaluation on 20 prominent LLM agents setting reveals a widespread vulnerability to Tool Poisoning, with GPT-o1-mini, achieving an attack success rate of 72.8%. We find that more capable models are often more susceptible, as the attack exploits their superior instruction-following abilities. Finally, the failure case analysis reveals that agents rarely refuse these attacks, with the highest refused rate (Claude-3.7-Sonnet) less than 3%, demonstrating that existing safety alignment is ineffective against malicious actions that use legitimate tools for unauthorized operation. Our findings create a crucial empirical baseline for understanding and mitigating this widespread threat, and we release MCPTox for the development of verifiably safer AI agents.

Published

2026-03-14

How to Cite

Wang, Z., Gao, Y., Wang, Y., Liu, S., Sun, H., Cheng, H., … Li, X. (2026). MCPTox: A Benchmark for Tool Poisoning on Real-World MCP Servers. Proceedings of the AAAI Conference on Artificial Intelligence, 40(42), 35811–35819. https://doi.org/10.1609/aaai.v40i42.40895

Issue

Section

AAAI Technical Track on Philosophy and Ethics of AI