Alternating Language Modeling for Cross-Lingual Pre-Training

Jian Yang; Shuming Ma; Dongdong Zhang; ShuangZhi Wu; Zhoujun Li; Ming Zhou

doi:10.1609/aaai.v34i05.6480

Authors

Jian Yang, Beihang University
Shuming Ma, Microsoft Research Asia
Dongdong Zhang, Microsoft Research Asia
ShuangZhi Wu, SPPD of Tencent Inc.
Zhoujun Li, Beihang University
Ming Zhou, Microsoft Research Asia

DOI:

https://doi.org/10.1609/aaai.v34i05.6480

Abstract

Language model pre-training has achieved success in many natural language processing tasks. Existing methods for cross-lingual pre-training adopt the Translation Language Model objective, which predicts masked words given the concatenation of a source sentence and its target equivalent. In this work, we introduce a novel cross-lingual pre-training method called Alternating Language Modeling (ALM). Rather than simply concatenating sentences of different languages, it code-switches them, aiming to capture the rich cross-lingual context of words and phrases. More specifically, we randomly substitute source phrases with their target translations to create code-switched sentences. We then use these code-switched data to train the ALM model to predict words of different languages. We evaluate the pre-trained ALM on the downstream tasks of machine translation and cross-lingual classification. Experiments show that ALM outperforms previous pre-training methods on three benchmarks.
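
The phrase-substitution step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how phrases are aligned or how often they are replaced, so the function name `code_switch`, the `phrase_pairs` input format (half-open source spans paired with target-language tokens), and the `swap_prob` parameter are all assumptions made for illustration.

```python
import random


def code_switch(source_tokens, phrase_pairs, swap_prob=0.5, rng=None):
    """Randomly replace aligned source phrases with target translations.

    source_tokens: list of source-language tokens.
    phrase_pairs:  list of ((start, end), target_tokens), where (start, end)
                   is a half-open, non-overlapping span in source_tokens.
                   (Hypothetical format; alignments are assumed to be given.)
    swap_prob:     probability of substituting each aligned phrase
                   (assumed hyperparameter, not taken from the paper).
    """
    rng = rng or random.Random()
    switched = []
    cursor = 0
    # Walk the spans left to right, copying untouched tokens in between.
    for (start, end), target_tokens in sorted(phrase_pairs, key=lambda p: p[0][0]):
        switched.extend(source_tokens[cursor:start])
        if rng.random() < swap_prob:
            switched.extend(target_tokens)              # use the translation
        else:
            switched.extend(source_tokens[start:end])   # keep the source phrase
        cursor = end
    switched.extend(source_tokens[cursor:])
    return switched


source = ["I", "like", "red", "wine"]
pairs = [((2, 4), ["vin", "rouge"])]

# With swap_prob=1.0 every aligned phrase is replaced.
print(code_switch(source, pairs, swap_prob=1.0))  # ['I', 'like', 'vin', 'rouge']
```

In a pre-training pipeline of the kind the abstract outlines, sentences produced this way would then be masked and fed to the model, so that it has to predict words of both languages from mixed-language context.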
