Zero-Shot Vision Language Reasoning via Dual-layer Scene Graph Chain of Thoughts (Student Abstract)

Yash Bansal; Parshiv Kapoor; Agam Pandey

doi:10.1609/aaai.v40i48.42188

Zero-Shot Vision Language Reasoning via Dual-layer Scene Graph Chain of Thoughts (Student Abstract)

Authors

Yash Bansal Indian Institute Of Technology, Roorkee
Parshiv Kapoor Indian Institute Of Technology, Roorkee
Agam Pandey Indian Institute Of Technology, Roorkee

DOI:

https://doi.org/10.1609/aaai.v40i48.42188

Abstract

Large Multimodal Models (LMMs) often hallucinate objects and struggle with compositional reasoning in complex visual scenes. Structured Scene Graph (SG) representations explicitly encoding objects, attributes, and relations can mitigate these issues, however finetuning risks catastrophic forgetting. Recent zero-shot approaches prompt LMMs with scene graphs, yet typically rely on a single SG generated in one step, limiting capture of holistic context and question-specific details. We introduce a Dual-Layer Scene Graph Chain-of-Thought DLSG-CoT framework that enriches reasoning by combining two structured SGs: a Global Scene Graph (G-SG) that offers comprehensive image context, and a Query-Specific Scene Graph (Q-SG) produced through a two-step process targeting information relevant to the input query. Extensive experiments demonstrate that DLSG-CoT substantially improves LMM performance on compositional and context-sensitive tasks.

AAAI-26 / IAAI-26 / EAAI-26 Proceedings Cover

Downloads

Published

2026-03-14

How to Cite

Bansal, Y., Kapoor, P., & Pandey, A. (2026). Zero-Shot Vision Language Reasoning via Dual-layer Scene Graph Chain of Thoughts (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41132–41133. https://doi.org/10.1609/aaai.v40i48.42188

Download Citation

Issue

Vol. 40 No. 48: EAAI-26 AI for Education, Model AI Assignments, AAAI-26 Emerging Trends, Doctoral Consortium, Student Abstracts, Undergraduate Consortium and Demonstrations

Section

AAAI Student Abstract and Poster Program

Zero-Shot Vision Language Reasoning via Dual-layer Scene Graph Chain of Thoughts (Student Abstract)

Authors

DOI:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information