SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation
DOI:
https://doi.org/10.1609/aaai.v40i40.40733
Abstract
This work proposes a grammar-based chunking strategy that segments input streams into semantically complete units by parsing dependency relations (e.g., noun phrase boundaries, verb-object structures) and punctuation features. The method ensures chunk coherence and minimizes semantic fragmentation. Building on this mechanism, we present SASST (Syntax-Aware Simultaneous Translation), an end-to-end framework integrating a frozen Whisper encoder with a decoder-only LLM. The unified architecture dynamically outputs translation tokens or symbols to jointly optimize translation timing and content, with target-side reordering addressing word-order divergence. Experiments on the CoVoST2 multilingual corpus (En to De/Zh/Ja) demonstrate significant translation quality improvements across languages, validating the effectiveness of syntactic structures in LLM-driven SimulST systems.
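The chunking idea from the abstract can be illustrated with a minimal sketch. This is a hypothetical toy example, not the authors' implementation: it assumes the input stream arrives as pre-parsed (word, dependency-label) pairs, and the `BOUNDARY_DEPS` set (object-head labels that close a verb-object unit) is an illustrative assumption.

```python
# Hypothetical sketch of dependency-based chunking for a token stream.
# Assumption: tokens arrive pre-parsed as (word, dep_label) pairs; a chunk
# boundary is emitted at punctuation or when an object head completes a
# verb-object structure. Not the SASST implementation.

BOUNDARY_DEPS = {"dobj", "obj", "pobj"}  # illustrative object-head labels


def chunk_stream(tagged_tokens):
    """Group a stream of (word, dep) pairs into candidate chunks."""
    chunks, current = [], []
    for word, dep in tagged_tokens:
        current.append(word)
        # Close the chunk at punctuation or after a verb-object unit completes.
        if dep == "punct" or dep in BOUNDARY_DEPS:
            chunks.append(" ".join(current))
            current = []
    if current:  # flush any trailing partial chunk
        chunks.append(" ".join(current))
    return chunks


example = [("I", "nsubj"), ("read", "ROOT"), ("the", "det"),
           ("book", "dobj"), ("yesterday", "advmod"), (".", "punct")]
print(chunk_stream(example))  # → ['I read the book', 'yesterday .']
```

In a real system the dependency labels would come from an incremental parser over the ASR output, and the boundary policy would be tuned per language pair to balance latency against chunk completeness.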
Published
2026-03-14
How to Cite
Yang, Z., Wei, L., Koshkin, R., Chen, X., & Nakamura, S. (2026). SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 34358–34367. https://doi.org/10.1609/aaai.v40i40.40733
Section
AAAI Technical Track on Natural Language Processing V