Hypothesis-Driven Reasoning for Large Language Models
DOI: https://doi.org/10.1609/aaai.v40i3.37146
Abstract
This paper tackles a fundamental failure of Large Language Models (LLMs): they fail to solve new tasks when prompted with a sufficient, yet overly complex, set of multi-modal episodes. This failure stems from the model's inability to distill underlying patterns from noisy experiences. We propose Hypothesis-Driven Reasoning (HDR), a framework that enhances LLM reasoning by building an explicit semantic memory, that is, a set of hypotheses induced from the multi-modal episodes. HDR employs a two-stage pipeline: it first extracts potential factors from the episodes and then iteratively refines hypotheses over these factors through a generate-verify loop. We first empirically demonstrate this failure and the potential of semantic memory, showing that oracle hypotheses can boost accuracy from 35.3% to 92.0% on a novel task we designed. We then evaluate HDR, which achieves near-oracle performance and significantly outperforms baselines, especially on smaller models. This paper validates a shift from unstructured in-context recall to explicit knowledge abstraction for robust reasoning.
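The two-stage pipeline summarized above can be illustrated with a toy sketch. In HDR, factor extraction and hypothesis generation and verification are carried out with LLMs over multi-modal episodes. Here, purely as an assumption for illustration, both stages are replaced by deterministic stand-ins: an episode is a dictionary of attributes plus a label, a factor is an attribute-value pair, a hypothesis is a conjunction of factors, "generate" enumerates conjunctions of increasing size, and "verify" checks a candidate against every episode. The names `extract_factors`, `verify`, and `generate_verify` and the episode format are hypothetical and do not reflect the authors' implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical toy data model (NOT the paper's format):
# an episode is (observed attributes, outcome label).
Episode = Tuple[Dict[str, str], bool]
Factor = Tuple[str, str]  # (attribute, value)
Hypothesis = FrozenSet[Factor]


def extract_factors(episodes: List[Episode]) -> List[Factor]:
    """Stage 1 stand-in for LLM factor extraction:
    collect every attribute-value pair seen in a positive episode."""
    factors = set()
    for attrs, label in episodes:
        if label:
            factors.update(attrs.items())
    return sorted(factors)


def predicts(hypothesis: Hypothesis, attrs: Dict[str, str]) -> bool:
    """A toy hypothesis is a conjunction: it fires iff every factor holds."""
    return all(attrs.get(key) == value for key, value in hypothesis)


def verify(hypothesis: Hypothesis, episodes: List[Episode]) -> List[Episode]:
    """Verification stand-in: return the episodes the hypothesis gets wrong.
    An empty list means the hypothesis explains every episode."""
    return [(a, y) for a, y in episodes if predicts(hypothesis, a) != y]


def generate_verify(episodes: List[Episode], max_size: int = 3) -> Optional[Hypothesis]:
    """Stage 2 stand-in for the LLM generate-verify loop:
    propose increasingly complex conjunctions over the extracted factors
    and return the first one that passes verification (None if none does)."""
    factors = extract_factors(episodes)
    for size in range(1, max_size + 1):
        for combo in combinations(factors, size):
            candidate = frozenset(combo)
            if not verify(candidate, episodes):
                return candidate
    return None


if __name__ == "__main__":
    demo: List[Episode] = [
        ({"color": "red", "shape": "circle"}, True),
        ({"color": "red", "shape": "square"}, True),
        ({"color": "blue", "shape": "circle"}, False),
        ({"color": "blue", "shape": "square"}, False),
    ]
    print(generate_verify(demo))  # frozenset({('color', 'red')})
```

Enumerating from small to large conjunctions mirrors, in a very crude way, the preference for simple hypotheses that explain all episodes; an LLM-based loop would instead propose candidates in natural language and could use the failing episodes returned by verification to guide the next proposal.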
Published
2026-03-14
How to Cite
Agarwal, A. K., & Yamada, M. (2026). Hypothesis-Driven Reasoning for Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence, 40(3), 1686–1693. https://doi.org/10.1609/aaai.v40i3.37146
Section
AAAI Technical Track on Cognitive Modeling & Cognitive Systems