SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation

Authors

  • Zeyu Yang — The Chinese University of Hong Kong, Shenzhen; Shenzhen Loop Area Institute
  • Lai Wei — The Chinese University of Hong Kong, Shenzhen
  • Roman Koshkin — Okinawa Institute of Science and Technology (OIST)
  • Xi Chen — The Chinese University of Hong Kong, Shenzhen
  • Satoshi Nakamura — The Chinese University of Hong Kong, Shenzhen; Shenzhen Loop Area Institute; Nara Institute of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i40.40733

Abstract

This work proposes a grammar-based chunking strategy that segments input streams into semantically complete units by parsing dependency relations (e.g., noun-phrase boundaries, verb-object structures) and punctuation features. The method ensures chunk coherence and minimizes semantic fragmentation. Building on this mechanism, we present SASST (Syntax-Aware Simultaneous Translation), an end-to-end framework integrating a frozen Whisper encoder with a decoder-only LLM. The unified architecture dynamically outputs translation tokens or symbols, jointly optimizing translation timing and content, with target-side reordering addressing word-order divergence. Experiments on the CoVoST2 multilingual corpus (En→De/Zh/Ja) demonstrate significant translation-quality improvements across languages, validating the effectiveness of syntactic structures in LLM-driven SimulST systems.
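To illustrate the chunking idea at a high level, the following is a minimal, hypothetical sketch (not the authors' implementation): it segments a streaming token sequence into chunks, closing a chunk at punctuation or at a toy stand-in for a syntactic boundary. The real system parses full dependency relations; here `CLAUSE_CLOSERS` and `is_boundary` are invented placeholders for that parser.

```python
# Toy sketch of syntax-aware chunking for a streaming token sequence.
# The punctuation rule mirrors the paper's punctuation features; the
# CLAUSE_CLOSERS lexicon is a hypothetical stand-in for a real
# dependency-based verb-object boundary check.

PUNCT = {",", ".", ";", "!", "?"}
CLAUSE_CLOSERS = {"translation"}  # invented mini-lexicon of object heads

def is_boundary(token: str, chunk: list) -> bool:
    """Close the chunk at punctuation, or after a toy verb-object pattern."""
    if token in PUNCT:
        return True
    # Stand-in for a dependency check: a short clause ending in an object head.
    return len(chunk) >= 2 and token in CLAUSE_CLOSERS

def chunk_stream(tokens):
    """Yield semantically complete chunks, flushing at each boundary."""
    chunk = []
    for tok in tokens:
        chunk.append(tok)
        if is_boundary(tok, chunk):
            yield chunk
            chunk = []
    if chunk:  # flush any trailing partial chunk at end of stream
        yield chunk

stream = "the model reads tokens , then emits translation .".split()
chunks = list(chunk_stream(stream))
```

In a full system, each yielded chunk would be handed to the translation model as soon as it closes, which is what lets the framework trade off latency (smaller chunks) against coherence (complete syntactic units).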

Published

2026-03-14

How to Cite

Yang, Z., Wei, L., Koshkin, R., Chen, X., & Nakamura, S. (2026). SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 34358–34367. https://doi.org/10.1609/aaai.v40i40.40733

Section

AAAI Technical Track on Natural Language Processing V