AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale

Authors

  • Ziyang Wang, School of Software Engineering, Huazhong University of Science and Technology
  • Yuanlei Zheng, School of Software Engineering, Huazhong University of Science and Technology
  • Zhenbiao Cao, School of Software Engineering, Huazhong University of Science and Technology
  • Xiaojin Zhang, School of Computer Science and Technology, Huazhong University of Science and Technology
  • Zhongyu Wei, School of Data Science, Fudan University
  • Pei Fu, MiLM Plus, Xiaomi Inc.
  • Zhenbo Luo, MiLM Plus, Xiaomi Inc.
  • Wei Chen, School of Software Engineering, Huazhong University of Science and Technology
  • Xiang Bai, School of Software Engineering, Huazhong University of Science and Technology

DOI: https://doi.org/10.1609/aaai.v40i40.40672

Abstract

For industrial-scale text-to-SQL, supplying the entire database schema to Large Language Models (LLMs) is impractical because of context-window limits and the noise introduced by irrelevant tables and columns. Schema linking, which filters the schema down to a relevant subset, is therefore critical. However, existing methods incur prohibitive costs, struggle to balance recall against noise, and scale poorly to large databases. We present AutoLink, an autonomous agent framework that reformulates schema linking as an iterative, agent-driven process. Guided by an LLM, AutoLink dynamically explores and expands the linked schema subset, progressively identifying the necessary schema components without ever feeding the full database schema to the model. Our experiments demonstrate AutoLink's superior performance: it achieves state-of-the-art strict schema linking recall of 97.4% on Bird-Dev and 91.2% on Spider 2.0-Lite, together with competitive execution accuracy (EX) of 68.7% on Bird-Dev (better than CHESS) and 34.9% on Spider 2.0-Lite (ranking 2nd on the official leaderboard). Crucially, AutoLink scales well: on large schemas (e.g., over 3,000 columns), where existing methods degrade severely, it maintains high recall, efficient token consumption, and robust execution accuracy. This makes it a highly scalable, high-recall schema-linking solution for industrial text-to-SQL systems.
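The abstract describes schema linking as an iterative explore-and-expand loop in which the model only ever sees a growing subset of the schema. The Python sketch below illustrates that general idea under loudly stated assumptions; it is not AutoLink's implementation. The token-overlap retriever (`retrieve`), the rule-based `toy_propose` standing in for the LLM's decision, the `_id` join-key expansion heuristic, and the toy `SCHEMA` are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

Column = Tuple[str, str]  # (table, column)
Schema = Dict[str, List[str]]
Proposer = Callable[[str, Set[Column], List[Column]], Set[Column]]


def tokens(text: str) -> Set[str]:
    """Lower-case word pieces; underscores, spaces and punctuation act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(schema: Schema, terms: Set[str], exclude: Set[Column], k: int) -> List[Column]:
    """Hypothetical retriever: rank not-yet-seen columns by token overlap with the
    current search terms and return at most k of them, never the whole schema."""
    scored = []
    for table, cols in schema.items():
        for col in cols:
            if (table, col) in exclude:
                continue
            score = len((tokens(table) | tokens(col)) & terms)
            if score > 0:
                scored.append((-score, table, col))
    scored.sort()  # highest score first, then alphabetical for determinism
    return [(table, col) for _, table, col in scored[:k]]


def toy_propose(question: str, linked: Set[Column], candidates: List[Column]) -> Set[Column]:
    """Hypothetical stand-in for the LLM decision. A real agent would reason over
    `linked` and `candidates`; this rule keeps columns whose name shares a word with
    the question, plus '_id' key columns that may be needed for joins."""
    q = tokens(question)
    return {(t, c) for t, c in candidates if tokens(c) & q or c.endswith("_id")}


def explore_and_expand(schema: Schema, question: str, propose: Proposer,
                       k: int = 3, max_rounds: int = 5) -> Tuple[Set[Column], Set[Column]]:
    terms = tokens(question)
    linked: Set[Column] = set()
    seen: Set[Column] = set()  # every column ever shown to the proposer
    for _ in range(max_rounds):
        # Exploration: show the proposer only a small batch of unseen candidates.
        candidates = retrieve(schema, terms, seen, k)
        if not candidates:
            break  # nothing new matches the current search terms
        seen.update(candidates)
        chosen = propose(question, linked, candidates)
        linked |= chosen
        # Expansion (hypothetical heuristic): a linked key such as 'customer_id'
        # adds 'customer' as a search term so the next round can reach that table.
        terms |= {col[: -len("_id")] for _, col in chosen if col.endswith("_id")}
    return linked, seen


SCHEMA: Schema = {
    "order": ["order_id", "customer_id", "amount", "created_at"],
    "customer": ["customer_id", "name", "city"],
    "product": ["product_id", "title", "price"],
}

linked, seen = explore_and_expand(SCHEMA, "What is the total order amount per city?", toy_propose)
print(sorted(linked))
# [('customer', 'city'), ('customer', 'customer_id'), ('order', 'amount'), ('order', 'customer_id'), ('order', 'order_id')]
print(f"{len(seen)} of {sum(len(cols) for cols in SCHEMA.values())} columns shown")
# 7 of 10 columns shown
```

In this toy run the join key `customer.customer_id` is only reached in the third round, after linking `order.customer_id` expands the search terms, and the irrelevant `product` table is never shown to the proposer. Tracking `seen` rather than only `linked` keeps rejected candidates (such as `created_at`) from crowding out new ones in later batches.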


Published

2026-03-14

How to Cite

Wang, Z., Zheng, Y., Cao, Z., Zhang, X., Wei, Z., Fu, P., Luo, Z., Chen, W., & Bai, X. (2026). AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 33809-33817. https://doi.org/10.1609/aaai.v40i40.40672

Section

AAAI Technical Track on Natural Language Processing V