[1] X. Xu, “DeepPhy: Benchmarking Agentic VLMs on Physical Reasoning”, AAAI, vol. 40, no. 40, pp. 34160-34168, Mar. 2026.