XU, X.; BU, P.; WANG, Y.; KARLSSON, B. F.; WANG, Z.; SONG, T.; ZHU, Q.; SONG, J.; DING, Z.; ZHENG, B. DeepPhy: Benchmarking Agentic VLMs on Physical Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 40, n. 40, p. 34160-34168, 2026. DOI: 10.1609/aaai.v40i40.40711. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/40711. Accessed: 22 Apr. 2026.