LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning

Authors

  • Liutao, Zhengzhou University
  • Xutao Mao, Vanderbilt University
  • Dixuan Zhang, Zhengzhou University
  • Yifan Li, Zhengzhou University
  • LiuHaixin, Zhengzhou University
  • KongLulu, Zhengzhou University
  • Jiaming Hou, Zhengzhou University
  • Rui Li, Zhengzhou University
  • YunLong Li, Zhengzhou University
  • Aoze Zheng, Zhengzhou University
  • Zhiqiang Zhang, Zhengzhou University
  • Luo Zhewei, Zhengzhou University
  • Hongying Zan, Zhengzhou University
  • Kunli Zhang, Zhengzhou University
  • Min Peng, Wuhan University

DOI:

https://doi.org/10.1609/aaai.v40i36.40243

Abstract

Text-to-SQL is a critical task in natural language processing that aims to transform natural language questions into accurate and executable SQL queries. In real-world scenarios, this task often involves complex mathematical computations, domain knowledge, and hypothetical reasoning. However, existing large-scale Text-to-SQL datasets typically focus on business logic and task logic, neglecting critical factors such as vertical domain knowledge, complex mathematical reasoning, and hypothetical reasoning, which are essential for realistically reflecting the reasoning demands of practical data querying and analysis. To bridge this gap, we introduce LogicCat, the first Text-to-SQL benchmark dataset specifically designed for complex reasoning and chain-of-thought parsing, encompassing physics, arithmetic, commonsense, and hypothetical reasoning scenarios. LogicCat comprises 4,038 English questions paired with 12,114 detailed chain-of-thought reasoning steps, spanning 45 databases across diverse domains, and significantly surpasses existing datasets in complexity. Experimental results show that LogicCat is substantially more difficult for current state-of-the-art models, which reach at most 33.20% execution accuracy, indicating that this task remains exceptionally challenging. LogicCat represents a crucial step toward developing systems suitable for real-world enterprise data analysis and autonomous query generation.
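The abstract reports results as execution accuracy without defining it on this page. As a rough illustration only, the sketch below uses a common reading of the metric: a predicted query counts as correct when it runs and returns the same rows as the gold query. The toy `objects` table, the example queries, and the helper names `run_query` and `execution_accuracy` are all hypothetical and are not taken from LogicCat or its evaluation code.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose results match.

    Assumption: row order is ignored and duplicates are kept, which is
    one common convention; the benchmark may define the metric differently.
    """
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database with a physics-flavoured question:
# "What is the kinetic energy of each object?"
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (name TEXT, mass_kg REAL, velocity_ms REAL);
    INSERT INTO objects VALUES ('cart', 2.0, 3.0), ('ball', 0.5, 10.0);
""")

gold = "SELECT name, 0.5 * mass_kg * velocity_ms * velocity_ms FROM objects"
pairs = [
    # Equivalent formula written differently: counted as correct.
    ("SELECT name, mass_kg * velocity_ms * velocity_ms / 2 FROM objects", gold),
    # Wrong formula (momentum instead of kinetic energy): incorrect.
    ("SELECT name, mass_kg * velocity_ms FROM objects", gold),
    # Misspelled column, so the query fails to execute: incorrect.
    ("SELECT nme, 0.5 * mass_kg FROM objects", gold),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2%}")
```

Comparing result sets rather than SQL strings lets differently written but equivalent queries count as correct, which matters when the reasoning (here, a physics formula) can be expressed in several ways.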

Published

2026-03-14

How to Cite

Liutao, Mao, X., Zhang, D., Li, Y., LiuHaixin, KongLulu, … Peng, M. (2026). LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(36), 29958–29966. https://doi.org/10.1609/aaai.v40i36.40243

Issue

Vol. 40 No. 36 (2026)

Section

AAAI Technical Track on Natural Language Processing I