Do Large Language Models (LLMs) Understand Chronology? (Student Abstract)

Authors

  • Pattaraphon Kenny Wongchamcharoen University of California, Berkeley
  • Paul Glasserman Columbia Business School

DOI:

https://doi.org/10.1609/aaai.v40i48.42295

Abstract

Large language models have shown great potential as forecasting tools in finance and economics, but backtesting performance is subject to look-ahead bias if the period overlaps with an LLM’s training window. Prompt-based attempts to avoid look-ahead bias require that LLMs understand chronology. We test LLMs’ ability to understand and enforce chronological order in three types of tasks: sorting randomly shuffled historical events; conditional sorting of events defined by some conditions; and anachronism detection based on intersections of multiple timelines. Our experiments use events that we first confirm are known to the LLM; this ensures that we test chronological understanding on an LLM’s pretrained internal knowledge. We evaluate three LLM families: GPT-4.1 (standard), GPT-5 (hybrid-reasoning), and Claude 3.7 Sonnet (large-reasoning, with and without Extended Thinking). We find that performance degrades rapidly with problem complexity but improves greatly for reasoning models with test-time extended reasoning. These patterns are important for the real-time application of LLMs in finance.
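As an illustration of the first task type (sorting randomly shuffled historical events), the following is a minimal sketch of how such a task could be built and scored. It is not the authors' code: the event list, the function names (`make_sorting_task`, `exact_match`, `pairwise_accuracy`), and both scoring rules are assumptions chosen for illustration, since the abstract does not state which metrics were used.

```python
import random

# Hypothetical event list (name, year); illustrative only, not from the paper.
EVENTS = [
    ("Wall Street Crash", 1929),
    ("Bretton Woods Conference", 1944),
    ("End of US dollar gold convertibility", 1971),
    ("Black Monday stock market crash", 1987),
    ("Lehman Brothers bankruptcy", 2008),
]


def make_sorting_task(events, seed=0):
    """Return the event names in a reproducibly shuffled order."""
    rng = random.Random(seed)
    names = [name for name, _ in events]
    rng.shuffle(names)
    return names


def exact_match(predicted, events):
    """True only if the predicted order equals the full chronological order."""
    truth = [name for name, _ in sorted(events, key=lambda e: e[1])]
    return predicted == truth


def pairwise_accuracy(predicted, events):
    """Fraction of event pairs that the predicted order places correctly.

    Assumes `predicted` holds at least two names, all present in `events`.
    """
    year = dict(events)
    n = len(predicted)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    correct = sum(year[predicted[i]] < year[predicted[j]] for i, j in pairs)
    return correct / len(pairs)
```

In this sketch, the shuffled names would be placed in a prompt and the model's returned ordering passed to the two scoring functions. Exact match is strict and becomes harder to satisfy as the list grows, while the pairwise score gives partial credit for orderings that are mostly right.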

Published

2026-03-14

How to Cite

Wongchamcharoen, P. K., & Glasserman, P. (2026). Do Large Language Models (LLMs) Understand Chronology? (Student Abstract). Proceedings of the AAAI Conference on Artificial Intelligence, 40(48), 41430–41432. https://doi.org/10.1609/aaai.v40i48.42295