Dong Q, Ye R, Wang M, Zhou H, Xu S, Xu B, Li L. Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation. AAAI [Internet]. 2021 May 18 [cited 2024 Apr 26];35(14):12749-5. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/17509