Dong, Q., R. Ye, M. Wang, H. Zhou, S. Xu, B. Xu, and L. Li. “Listen, Understand and Translate: Triple Supervision Decouples End-to-End Speech-to-Text Translation.” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14, May 2021, pp. 12749-12759, https://ojs.aaai.org/index.php/AAAI/article/view/17509.