EvoEmpirBench: Dynamic Spatial Reasoning with Agent-ExpVer

Authors

  • Pukun Zhao Guangdong University of Finance & Economics
  • Longxiang Wang Chongqing University
  • Miaowei Wang University of Edinburgh
  • Chen Chen Guangdong University of Finance & Economics
  • Fanqing Zhou Guangdong University of Finance & Economics
  • Haojian Huang Hong Kong University of Science and Technology (Guangzhou); The University of Hong Kong

DOI:

https://doi.org/10.1609/aaai.v40i43.40979

Abstract

Most existing spatial reasoning benchmarks focus on static or globally observable environments, failing to capture the challenges of long-horizon reasoning and memory utilization under partial observability and dynamic changes. We introduce two dynamic spatial benchmarks, locally observable maze navigation and match-2 elimination, that systematically evaluate models' abilities in spatial understanding and adaptive planning when local perception, environment feedback, and global objectives are tightly coupled. Each action triggers structural changes in the environment, requiring models to continuously update their cognition and strategy. We further propose a subjective experience-based memory mechanism for cross-task experience transfer and validation. Experiments show that our benchmarks reveal key limitations of mainstream models in dynamic spatial reasoning and long-term memory, providing a comprehensive platform for future methodological advances.

Published

2026-03-14

How to Cite

Zhao, P., Wang, L., Wang, M., Chen, C., Zhou, F., & Huang, H. (2026). EvoEmpirBench: Dynamic Spatial Reasoning with Agent-ExpVer. Proceedings of the AAAI Conference on Artificial Intelligence, 40(43), 36564–36572. https://doi.org/10.1609/aaai.v40i43.40979

Section

AAAI Technical Track on Planning, Routing, and Scheduling