Theory-of-Mind in Partially Observed, Mixed-Motive Games

Authors

  • Nitay Alon, The Hebrew University of Jerusalem, Jerusalem, Israel; Max Planck Institute for Biological Cybernetics, Tuebingen, Germany

DOI:

https://doi.org/10.1609/aaai.v40i48.42141

Abstract

Theory of Mind (ToM) enables agents to model others' mental states, but in mixed-motive games this capacity can lead to deceptive behaviour and alignment risks. My research investigates how ToM affects strategic behaviour in partially observed games, contributing: (1) a formal model of ToM-driven manipulation in a preference elicitation task, (2) evidence that excessive ToM leads to paranoid-like overmentalisation, and (3) the Aleph-IPOMDP model, a framework for multi-agent systems that balances ToM reasoning with game-theoretic principles to prevent manipulation, deterring capable agents from deceiving. My work contributes to the understanding of deceptive AI, to overcoming deception in multi-agent systems, and to computational models of human cognition.

Published

2026-03-14

How to Cite

Alon, N. (2026). Theory-of-Mind in Partially Observed, Mixed-Motive Games. Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41030–41031. https://doi.org/10.1609/aaai.v40i48.42141