Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics

Sadaf Ghaffari; Nikhil Krishnaswamy

doi:10.1609/aaaiss.v3i1.31189

Authors

Sadaf Ghaffari, Colorado State University, Fort Collins, CO
Nikhil Krishnaswamy, Colorado State University, Fort Collins, CO

DOI:

https://doi.org/10.1609/aaaiss.v3i1.31189

Keywords:

Physical Dynamics, Multimodal Reasoning, LLMs

Abstract

In this paper, we explore the ability of LLMs to solve problems that require physical reasoning in situated environments. We construct a simple simulated environment and demonstrate examples in which, in a zero-shot setting, both text and multimodal LLMs display atomic world knowledge about various objects but fail to compose this knowledge into correct solutions for an object manipulation and placement task. We also use BLIP, a vision-language model trained with more sophisticated cross-modal attention, to identify cases relevant to the physical properties of objects that the model fails to ground. Finally, we present a procedure for discovering the relevant properties of objects in the environment and propose a method to distill this knowledge back into the LLM.

Published

2024-05-20

How to Cite

Ghaffari, S., & Krishnaswamy, N. (2024). Exploring Failure Cases in Multimodal Reasoning About Physical Dynamics. Proceedings of the AAAI Symposium Series, 3(1), 105–114. https://doi.org/10.1609/aaaiss.v3i1.31189

Issue

Vol. 3 No. 1: Proceedings of the 2024 AAAI Spring Symposium Series

Section

Empowering Machine Learning and Large Language Models with Domain and Commonsense Knowledge
