LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning
DOI:
https://doi.org/10.1609/aaai.v40i36.40243
Abstract
Text-to-SQL is a critical task in natural language processing that aims to transform natural language questions into accurate and executable SQL queries. In real-world scenarios, these reasoning tasks are often accompanied by complex mathematical computations, domain knowledge, and hypothetical reasoning scenarios. However, existing large-scale Text-to-SQL datasets typically focus on business logic and task logic, neglecting critical factors such as vertical domain knowledge, complex mathematical reasoning, and hypothetical reasoning, which are essential for realistically reflecting the reasoning demands in practical applications and completing data querying and analysis. To bridge this gap, we introduce LogicCat, the first Text-to-SQL benchmark dataset specifically designed for complex reasoning and chain-of-thought parsing, encompassing physics, arithmetic, commonsense, and hypothetical reasoning scenarios. LogicCat comprises 4,038 English questions paired with 12,114 detailed chain-of-thought reasoning steps, spanning 45 databases across diverse domains, significantly surpassing existing datasets in complexity. Experimental results demonstrate that LogicCat is substantially more difficult for current state-of-the-art models, which reach at most 33.20% execution accuracy, indicating that this task remains exceptionally challenging. The advancement of LogicCat represents a crucial step toward developing systems suitable for real-world enterprise data analysis and autonomous query generation.
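The abstract reports results as execution accuracy. As a rough illustration of what that metric usually means in Text-to-SQL work, the sketch below runs a predicted query and a reference query against the same database and counts a prediction as correct when both return the same rows. This is a minimal sketch and not the LogicCat evaluation code: the `trips` table, the example queries, and the helper names `run_query` and `execution_accuracy` are all hypothetical, and the order-insensitive row comparison is an assumption about how results are matched.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows in a canonical order, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # non-executable SQL can never match the reference result
    # Sort by repr so that row order does not affect the comparison (assumption).
    return sorted(rows, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result sets match."""
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold_rows = run_query(conn, gold_sql)
        predicted_rows = run_query(conn, predicted_sql)
        if gold_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database with a physics-flavoured question: average speed per trip.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trips (id INTEGER PRIMARY KEY, distance_km REAL, duration_h REAL);
    INSERT INTO trips VALUES (1, 120.0, 2.0), (2, 90.0, 1.5), (3, 300.0, 4.0);
    """
)

pairs = [
    # Same computation, different column alias: result sets match.
    ("SELECT id, distance_km / duration_h FROM trips",
     "SELECT id, distance_km / duration_h AS speed FROM trips"),
    # Wrong formula (multiplication instead of division): result sets differ.
    ("SELECT id, distance_km * duration_h FROM trips",
     "SELECT id, distance_km / duration_h FROM trips"),
    # References a column that does not exist: the query fails to execute.
    ("SELECT speed FROM trips",
     "SELECT COUNT(*) FROM trips"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2%}")
```

Comparing result sets rather than SQL strings means that differently written but equivalent queries count as correct, while a query that executes but applies the wrong formula does not.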
Published
2026-03-14
How to Cite
, L., Mao, X., Zhang, D., Li, Y., , L., , K., … Peng, M. (2026). LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning. Proceedings of the AAAI Conference on Artificial Intelligence, 40(36), 29958–29966. https://doi.org/10.1609/aaai.v40i36.40243
Issue
Vol. 40 No. 36 (2026)
Section
AAAI Technical Track on Natural Language Processing I