Can LLMs Identify Tax Abuse?

Authors

  • Andrew Blair-Stanek, University of Maryland Carey School of Law; Johns Hopkins University
  • Nils Holzenberger, Télécom Paris
  • Benjamin Van Durme, Johns Hopkins University

DOI:

https://doi.org/10.1609/aaai.v40i45.41165

Abstract

We investigate whether large language models can discover and analyze U.S. tax-minimization strategies. This real-world domain challenges even seasoned human experts, and progress can reduce tax revenue lost from well-advised, wealthy taxpayers. We evaluate the most advanced LLMs on their ability to (1) interpret and verify tax strategies, (2) fill in gaps in partially specified strategies, and (3) generate complete, end-to-end strategies from scratch. This domain should be of particular interest to the LLM reasoning community: unlike synthetic challenge problems or scientific reasoning tasks, U.S. tax law involves navigating hundreds of thousands of pages of statutes, case law, and administrative guidance, all updated regularly. Notably, an LLM identified an apparently novel tax strategy, highlighting these models' potential to revolutionize tax agencies' fight against tax abuse.

Published

2026-03-14

How to Cite

Blair-Stanek, A., Holzenberger, N., & Van Durme, B. (2026). Can LLMs Identify Tax Abuse? Proceedings of the AAAI Conference on Artificial Intelligence, 40(45), 38261–38269. https://doi.org/10.1609/aaai.v40i45.41165

Issue

Vol. 40 No. 45 (2026)

Section

AAAI Special Track on AI for Social Impact I