Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

Authors

  • Haorui He Department of Interactive Media, Hong Kong Baptist University; School of Computing and Data Science, The University of Hong Kong
  • Yupeng Li Department of Interactive Media, Hong Kong Baptist University
  • Bin Benjamin Zhu Microsoft Corporation
  • Dacheng Wen Department of Interactive Media, Hong Kong Baptist University; School of Computing and Data Science, The University of Hong Kong
  • Reynold Cheng School of Computing and Data Science, The University of Hong Kong
  • Francis C. M. Lau School of Computing and Data Science, The University of Hong Kong

DOI:

https://doi.org/10.1609/aaai.v40i37.40353

Abstract

State-of-the-art (SOTA) fact-checking systems combat misinformation by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanations for the verdicts). The security of these systems is crucial, as compromised fact-checkers can amplify misinformation, yet it remains largely underexplored. To bridge this gap, this work introduces a novel threat model against such fact-checking systems and presents Fact2Fiction, the first poisoning attack framework targeting SOTA agentic fact-checking systems. Fact2Fiction employs LLMs to mimic the decomposition strategy and exploit system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves 8.9% to 21.2% higher attack success rates than SOTA attacks across various poisoning budgets and exposes security weaknesses in existing fact-checking systems, highlighting the need for defensive countermeasures.
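
The decompose, verify, and aggregate pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the string-splitting `decompose`, the majority-vote `verify`, the all-must-hold `aggregate`, the `Evidence` type, and the in-memory corpus are all hypothetical stand-ins for LLM agents and a retrieval system. The sketch only shows why evidence aimed at a single sub-claim may be enough to change the final verdict.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    text: str
    supports: bool  # hypothetical label: does the passage support the sub-claim?


def decompose(claim: str) -> list[str]:
    # Stand-in for LLM decomposition: split the claim on " and ".
    return [part.strip() for part in claim.split(" and ") if part.strip()]


def verify(sub_claim: str, corpus: dict[str, list[Evidence]]) -> bool:
    # Stand-in for sub-claim verification: majority vote over stored evidence.
    # A sub-claim with no evidence, or a tied vote, counts as not supported.
    votes = sum(1 if e.supports else -1 for e in corpus.get(sub_claim, []))
    return votes > 0


def aggregate(results: list[bool]) -> str:
    # Assumed aggregation rule: the claim holds only if every sub-claim holds.
    return "supported" if results and all(results) else "refuted"


def fact_check(claim: str, corpus: dict[str, list[Evidence]]) -> str:
    return aggregate([verify(s, corpus) for s in decompose(claim)])


if __name__ == "__main__":
    claim = "the bridge opened in 1990 and it spans two rivers"
    corpus = {
        "the bridge opened in 1990": [Evidence("archive record", True)],
        "it spans two rivers": [Evidence("survey map", True)],
    }
    print(fact_check(claim, corpus))

    # Two injected passages that contradict one sub-claim outvote the
    # single genuine passage, which changes the aggregated verdict.
    corpus["it spans two rivers"] += [
        Evidence("injected passage 1", False),
        Evidence("injected passage 2", False),
    ]
    print(fact_check(claim, corpus))
```

In this toy setting, only one of the two sub-claims needs to be outvoted for the aggregated verdict to change, which is the intuition behind targeting sub-claim verification rather than the claim as a whole.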

Published

2026-03-14

How to Cite

He, H., Li, Y., Zhu, B. B., Wen, D., Cheng, R., & Lau, F. C. M. (2026). Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System. Proceedings of the AAAI Conference on Artificial Intelligence, 40(37), 30943-30950. https://doi.org/10.1609/aaai.v40i37.40353

Issue

Vol. 40 No. 37 (2026)

Section

AAAI Technical Track on Natural Language Processing II