Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation

Chengyi Wang; Yu Wu; Shujie Liu; Zhenglu Yang; Ming Zhou

doi:10.1609/aaai.v34i05.6452

Authors

Chengyi Wang Nankai University
Yu Wu Microsoft Research Asia
Shujie Liu Microsoft Research Asia
Zhenglu Yang Nankai University
Ming Zhou Microsoft Research Asia

Abstract

End-to-end speech translation, a hot topic in recent years, aims to translate audio in one language into text in another language with a single end-to-end model. Conventional approaches employ multi-task learning and pre-training methods for this task, but they suffer from a large gap between pre-training and fine-tuning. To address this issue, we propose a Tandem Connectionist Encoding Network (TCEN), which bridges the gap by reusing all subnets in fine-tuning, keeping the roles of the subnets consistent, and pre-training the attention module. Furthermore, we propose two simple but effective methods to ensure that the speech encoder outputs and the MT encoder inputs are consistent in both semantic representation and sequence length. Experimental results show that our model yields significant improvements on En-De and En-Fr translation regardless of the backbone architecture.
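The abstract describes TCEN only at a high level. As a rough, hypothetical illustration of two ideas it names (chaining pre-trained subnets so each keeps the role it had in pre-training, and the sequence-length mismatch between speech encoder outputs and MT encoder inputs), the Python sketch below composes a toy speech encoder with a toy text encoder. It is not the authors' implementation: compose_encoders, toy_speech_encoder, toy_text_encoder, length_gap and the stride-averaging downsampler are assumptions made for illustration, and the decoder and attention module are omitted. Speech encoders typically emit one vector per (possibly downsampled) acoustic frame, which is usually many more positions than the source sentence has tokens; length_gap simply measures that difference.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[Vector]], List[Vector]]


def compose_encoders(speech_encoder: Encoder, text_encoder: Encoder) -> Encoder:
    """Hypothetical tandem: feed speech encoder outputs directly into the MT text
    encoder, so the text encoder keeps its pre-training role of encoding a
    source-side sequence."""
    def encode(frames: Sequence[Vector]) -> List[Vector]:
        return text_encoder(speech_encoder(frames))
    return encode


def toy_speech_encoder(frames: Sequence[Vector], stride: int = 4) -> List[Vector]:
    # Hypothetical stand-in (NOT the paper's method): average non-overlapping
    # windows of `stride` frames, which shortens the sequence.
    out: List[Vector] = []
    for i in range(0, len(frames), stride):
        window = frames[i:i + stride]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def toy_text_encoder(inputs: Sequence[Vector]) -> List[Vector]:
    # Hypothetical stand-in: a position-wise map that preserves sequence length.
    return [[2.0 * x for x in vec] for vec in inputs]


def length_gap(encoder_outputs: Sequence[Vector], num_source_tokens: int) -> int:
    """Difference between the length handed to the text encoder and the
    token-level length it would see in MT pre-training (0 means they agree)."""
    return len(encoder_outputs) - num_source_tokens


if __name__ == "__main__":
    frames = [[float(t), 1.0] for t in range(12)]  # 12 toy 2-dim "frames"
    encode = compose_encoders(toy_speech_encoder, toy_text_encoder)
    print(encode(frames))                                # [[3.0, 2.0], [11.0, 2.0], [19.0, 2.0]]
    print(length_gap(toy_speech_encoder(frames), 3))     # 0
    print(length_gap(frames, 3))                         # 9
```

Running the script prints the three encoded vectors, a gap of 0 after downsampling, and a gap of 9 if the 12 raw frames were fed to the text encoder directly. The stride of 4 was chosen only so the toy numbers line up; it says nothing about how the paper itself reconciles sequence lengths.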

Published

2020-04-03

How to Cite

Wang, C., Wu, Y., Liu, S., Yang, Z., & Zhou, M. (2020). Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 9161–9168. https://doi.org/10.1609/aaai.v34i05.6452

Issue

Vol. 34 No. 05: AAAI-20 Technical Tracks 5

Section

AAAI Technical Track: Natural Language Processing
