RetrySQL: Text-to-SQL Training with Retry Data for Self-Correcting Query Generation
DOI:
https://doi.org/10.1609/aaai.v40i39.40556
Abstract
The text-to-SQL task is an active challenge in Natural Language Processing. Many existing solutions focus on using black-box language models extended with specialized components within customized end-to-end text-to-SQL pipelines. While these solutions use both closed-source proprietary language models and coding-oriented open-source models, there is a lack of research regarding SQL-specific small generative models. At the same time, recent advancements in self-correcting generation strategies show promise for improving the capabilities of existing architectures. The application of these concepts to the text-to-SQL task remains unexplored. In this paper, we introduce RetrySQL, a new approach to training text-to-SQL generation models. We prepare reasoning steps for reference SQL queries and then corrupt them to create retry data that contains both incorrect and corrected steps, separated by a special token. We continuously pre-train open-source coding models with this data and demonstrate that retry steps yield improvements of up to 4 and 9 percentage points for overall and challenging execution metrics, respectively, compared to pre-training without retry data. We show that the self-correcting behavior is learned by the model and that the increase in downstream accuracy metrics is a result of this additional skill. Finally, we incorporate RetrySQL-trained models into the full text-to-SQL pipeline and show that they are competitive in terms of execution accuracy with proprietary models that contain orders of magnitude more parameters. RetrySQL demonstrates that self-correction can be learned in the text-to-SQL task and provides a novel way of improving generation accuracy for small SQL-oriented language models.
Published
2026-03-14
How to Cite
Rączkowska, A., Belluzzo, R., Zieliński, P., Baran, J., & Olszewski, P. (2026). RetrySQL: Text-to-SQL Training with Retry Data for Self-Correcting Query Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(39), 32773–32781. https://doi.org/10.1609/aaai.v40i39.40556
Issue
Section
AAAI Technical Track on Natural Language Processing IV
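
Illustrative sketch
The abstract describes retry data as reasoning steps in which an incorrect step is followed by a special token and then the corrected step. The Python sketch below is one possible reading of that idea, not the authors' implementation. The token name `[BACK]`, the corruption rule (substituting another step from the same solution), the `error_prob` parameter, the helper names, and the example steps are all assumptions made for illustration.

```python
import random

# Hypothetical marker; the abstract only says "a special token".
BACK_TOKEN = "[BACK]"


def make_retry_sequence(steps, error_prob, rng):
    """Return the steps with wrong steps and retry markers interleaved.

    With probability `error_prob`, a wrong step (assumed here to be any
    other step of the same solution) is emitted before the correct one,
    followed by BACK_TOKEN.
    """
    out = []
    for i, step in enumerate(steps):
        wrong_pool = [s for j, s in enumerate(steps) if j != i]
        if wrong_pool and rng.random() < error_prob:
            out.append(rng.choice(wrong_pool))  # incorrect step
            out.append(BACK_TOKEN)              # signals "retry"
        out.append(step)                        # corrected step
    return out


def strip_retries(sequence):
    """Recover the clean step list by dropping each wrong step and its marker."""
    clean = []
    for item in sequence:
        if item == BACK_TOKEN:
            clean.pop()  # discard the step that preceded the marker
        else:
            clean.append(item)
    return clean


# Made-up reasoning steps for a simple counting query.
steps = [
    "Select table: orders",
    "Filter rows: status = 'shipped'",
    "Aggregate: COUNT(*)",
]

retry = make_retry_sequence(steps, error_prob=0.5, rng=random.Random(0))
print(" ".join(retry))
```

Under these assumptions, a model trained on such sequences sees both the mistake and its correction, while `strip_retries` shows that the reference steps remain recoverable from the corrupted sequence.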