[1]
Dong, Q., Ye, R., Wang, M., Zhou, H., Xu, S., Xu, B. and Li, L. 2021. Listen, Understand and Translate: Triple Supervision Decouples End-to-end Speech-to-text Translation. Proceedings of the AAAI Conference on Artificial Intelligence. 35, 14 (May 2021), 12749-12759.