JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics

Authors

  • Simindokht Jahangard, Monash University
  • Mehrzad Mohammadi, Sharif University of Technology
  • Yi Shen, Monash University
  • Zhixi Cai, Monash University
  • Hamid Rezatofighi, Monash University

DOI:

https://doi.org/10.1609/aaai.v40i7.37443

Abstract

Recent advances in vision-language models (VLMs) and large language models (LLMs) have greatly enhanced visual reasoning, a key capability for embodied AI agents such as robots. However, existing visual reasoning benchmarks suffer from several limitations: they lack a clear definition of reasoning complexity, offer no control over question difficulty or task customization, and fail to provide structured, step-by-step reasoning annotations (workflows). To bridge these gaps, we formalize reasoning complexity, introduce an adaptive query engine that generates customizable questions of varying complexity with detailed intermediate annotations, and extend the JRDB dataset with human-object interaction and geometric relationship annotations to create JRDB-Reasoning, a benchmark tailored for visual reasoning in human-crowded environments. Our engine and benchmark enable fine-grained evaluation of visual reasoning frameworks and dynamic assessment of VLMs across reasoning levels.

Published

2026-03-14

How to Cite

Jahangard, S., Mohammadi, M., Shen, Y., Cai, Z., & Rezatofighi, H. (2026). JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics. Proceedings of the AAAI Conference on Artificial Intelligence, 40(7), 5276–5286. https://doi.org/10.1609/aaai.v40i7.37443

Section

AAAI Technical Track on Computer Vision IV