Dong, Qianqian, Rong Ye, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, and Lei Li. “Listen, Understand and Translate: Triple Supervision Decouples End-to-End Speech-to-Text Translation.” Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (May 18, 2021): 12749-12759. Accessed May 14, 2024. https://ojs.aaai.org/index.php/AAAI/article/view/17509.