(1) Dong, Q.; Ye, R.; Wang, M.; Zhou, H.; Xu, S.; Xu, B.; Li, L. Listen, Understand and Translate: Triple Supervision Decouples End-to-End Speech-to-Text Translation. AAAI 2021, 35, 12749-12759.