LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Complex Reasoning

Liutao; Xutao Mao; Dixuan Zhang; Yifan Li; LiuHaixin; KongLulu; Jiaming Hou; Rui Li; YunLong Li; Aoze Zheng; Zhiqiang Zhang; Luo Zhewei; Hongying Zan; Kunli Zhang; Min Peng

doi:10.1609/aaai.v40i36.40243

Authors

Liutao Zhengzhou University
Xutao Mao Vanderbilt University
Dixuan Zhang Zhengzhou University
Yifan Li Zhengzhou University
LiuHaixin Zhengzhou University
KongLulu Zhengzhou University
Jiaming Hou Zhengzhou University
Rui Li Zhengzhou University
YunLong Li Zhengzhou University
Aoze Zheng Zhengzhou University
Zhiqiang Zhang Zhengzhou University
Luo Zhewei Zhengzhou University
Hongying Zan Zhengzhou University
Kunli Zhang Zhengzhou University
Min Peng Wuhan University

Abstract

Text-to-SQL is a critical task in natural language processing that aims to transform natural language questions into accurate and executable SQL queries. In real-world scenarios, these queries often require complex mathematical computation, domain knowledge, and hypothetical reasoning. However, existing large-scale Text-to-SQL datasets typically focus on business logic and task logic, neglecting critical factors such as vertical domain knowledge, complex mathematical reasoning, and hypothetical reasoning, which are essential for realistically reflecting the reasoning demands of practical data querying and analysis. To bridge this gap, we introduce LogicCat, the first Text-to-SQL benchmark dataset specifically designed for complex reasoning and chain-of-thought parsing, encompassing physics, arithmetic, commonsense, and hypothetical reasoning scenarios. LogicCat comprises 4,038 English questions paired with 12,114 detailed chain-of-thought reasoning steps, spanning 45 databases across diverse domains, and significantly surpasses existing datasets in complexity. Experimental results show that LogicCat is substantially harder for current state-of-the-art models, which reach at most 33.20% execution accuracy, indicating that this task remains exceptionally challenging. LogicCat represents a crucial step toward developing systems suitable for real-world enterprise data analysis and autonomous query generation.
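The abstract reports results as execution accuracy but does not describe the evaluation protocol. As a rough illustration only, the sketch below follows the common definition of the metric in Text-to-SQL work: a predicted query counts as correct when running it returns the same rows as the gold query. The function names, the toy `trips` table, and the order-insensitive comparison are all assumptions made for this example, not details taken from LogicCat.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run a query and return its rows as a multiset, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Counter ignores row order but keeps duplicate rows (an assumption here).
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs) if pairs else 0.0


# Hypothetical toy database with a small arithmetic flavour (speed = distance / time).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, distance_km REAL, hours REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [(1, 120.0, 2.0), (2, 90.0, 1.5)],
)

pairs = [
    # Same rows, different column alias: counted as correct.
    ("SELECT id, distance_km / hours FROM trips",
     "SELECT id, distance_km / hours AS speed FROM trips"),
    # Prediction filters on distance instead of speed: different rows.
    ("SELECT id FROM trips WHERE distance_km > 100",
     "SELECT id FROM trips WHERE distance_km / hours > 100"),
    # Prediction references a column that does not exist: fails to execute.
    ("SELECT speed FROM trips",
     "SELECT distance_km / hours FROM trips"),
]

score = execution_accuracy(conn, pairs)
print(f"execution accuracy: {score:.2%}")
```

Comparing result rows rather than SQL strings means that queries written differently but returning the same rows are scored as equivalent, which is why the first pair above counts as correct.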
