DONG, Qianqian; YE, Rong; WANG, Mingxuan; ZHOU, Hao; XU, Shuang; XU, Bo; LI, Lei. Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation. Proceedings of the AAAI Conference on Artificial Intelligence, [S. l.], v. 35, n. 14, p. 12749–12759, 2021. DOI: 10.1609/aaai.v35i14.17509. Available at: https://ojs.aaai.org/index.php/AAAI/article/view/17509. Accessed: 25 May 2026.