MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Longxu Dou; Yan Gao; Mingyang Pan; Dingzirui Wang; Wanxiang Che; Dechen Zhan; Jian-Guang Lou

doi:10.1609/aaai.v37i11.26499

Authors

Longxu Dou, Harbin Institute of Technology
Yan Gao, Microsoft Research Asia
Mingyang Pan, Harbin Institute of Technology
Dingzirui Wang, Harbin Institute of Technology
Wanxiang Che, Harbin Institute of Technology
Dechen Zhan, Harbin Institute of Technology
Jian-Guang Lou, Microsoft Research Asia

DOI:

https://doi.org/10.1609/aaai.v37i11.26499

Keywords:

SNLP: Lexical & Frame Semantics, Semantic Parsing, SNLP: Question Answering, SNLP: Sentence-Level Semantics and Textual Inference, SNLP: Syntax -- Tagging, Chunking & Parsing

Abstract

Text-to-SQL semantic parsing is an important NLP task that facilitates interaction between users and databases. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MultiSpider, the largest multilingual text-to-SQL semantic parsing dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Using MultiSpider, we further identify the lexical and structural challenges of text-to-SQL (caused by language-specific properties and dialectal expressions) and their intensity across different languages. Experimental results under various settings (zero-shot, monolingual, and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reasons for the performance drop in each language. Besides the dataset, we also propose a simple schema augmentation framework, SAVe (Schema-Augmentation-with-Verification), which significantly boosts overall performance by about 1.8% and closes the 29.5% performance gap across languages.
