The Tatort Test of Intelligence: Towards Narrative Comprehension as a Benchmark for AI
DOI:
https://doi.org/10.1609/aaai.v40i46.41327
Abstract
We propose, somewhat tongue-in-cheek yet with serious implications, a new test for artificial intelligence: the ability to watch a 90-minute episode of the long-running German crime drama Tatort and to explain every relevant detail. This involves reconstructing the evolving social network of characters, identifying their beliefs, desires, and intentions, and, crucially, determining who committed the crime. We argue that this task integrates narrative understanding, common-sense reasoning, social cognition, and theory of mind, and thus provides a uniquely challenging benchmark for AI.
Published
2026-03-14
How to Cite
Kramer, S., Baur, L., & Reinhardt, L. (2026). The Tatort Test of Intelligence: Towards Narrative Comprehension as a Benchmark for AI. Proceedings of the AAAI Conference on Artificial Intelligence, 40(46), 39722–39727. https://doi.org/10.1609/aaai.v40i46.41327
Issue
Vol. 40 No. 46 (2026)
Section
Senior Member Presentation