SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation

Authors

  • Zeyu Yang — The Chinese University of Hong Kong, Shenzhen; Shenzhen Loop Area Institute
  • Lai Wei — The Chinese University of Hong Kong, Shenzhen
  • Roman Koshkin — Okinawa Institute of Science and Technology (OIST)
  • Xi Chen — The Chinese University of Hong Kong, Shenzhen
  • Satoshi Nakamura — The Chinese University of Hong Kong, Shenzhen; Shenzhen Loop Area Institute; Nara Institute of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v40i40.40733

Abstract

This work proposes a grammar-based chunking strategy that segments input streams into semantically complete units by parsing dependency relations (e.g., noun-phrase boundaries, verb-object structures) and punctuation features. The method ensures chunk coherence and minimizes semantic fragmentation. Building on this mechanism, we present SASST (Syntax-Aware Simultaneous Translation), an end-to-end framework integrating a frozen Whisper encoder with a decoder-only LLM. The unified architecture dynamically outputs translation tokens or symbols, jointly optimizing translation timing and content, with target-side reordering addressing word-order divergence. Experiments on the CoVoST2 multilingual corpus (En→De/Zh/Ja) demonstrate significant translation-quality improvements across languages, validating the effectiveness of syntactic structures in LLM-driven SimulST systems.
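To illustrate the chunking idea at a high level, the following is a minimal, hypothetical sketch (not the authors' implementation): it segments a streaming token sequence into chunks, closing a chunk at punctuation or at a toy stand-in for a syntactic boundary. The real system parses full dependency relations; here `CLAUSE_CLOSERS` and `is_boundary` are invented placeholders for that parser.

```python
# Toy sketch of syntax-aware chunking for a streaming token sequence.
# The punctuation rule mirrors the paper's punctuation features; the
# CLAUSE_CLOSERS lexicon is a hypothetical stand-in for a real
# dependency-based verb-object boundary check.

PUNCT = {",", ".", ";", "!", "?"}
CLAUSE_CLOSERS = {"translation"}  # invented mini-lexicon of object heads

def is_boundary(token: str, chunk: list) -> bool:
    """Close the chunk at punctuation, or after a toy verb-object pattern."""
    if token in PUNCT:
        return True
    # Stand-in for a dependency check: a short clause ending in an object head.
    return len(chunk) >= 2 and token in CLAUSE_CLOSERS

def chunk_stream(tokens):
    """Yield semantically complete chunks, flushing at each boundary."""
    chunk = []
    for tok in tokens:
        chunk.append(tok)
        if is_boundary(tok, chunk):
            yield chunk
            chunk = []
    if chunk:  # flush any trailing partial chunk at end of stream
        yield chunk

stream = "the model reads tokens , then emits translation .".split()
chunks = list(chunk_stream(stream))
```

In a full system, each yielded chunk would be handed to the translation model as soon as it closes, which is what lets the framework trade off latency (smaller chunks) against coherence (complete syntactic units).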

Published

2026-03-14

How to Cite

Yang, Z., Wei, L., Koshkin, R., Chen, X., & Nakamura, S. (2026). SASST: Leveraging Syntax-Aware Chunking and LLMs for Simultaneous Speech Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 40(40), 34358–34367. https://doi.org/10.1609/aaai.v40i40.40733

Section

AAAI Technical Track on Natural Language Processing V