Machine-Created Universal Language for Cross-Lingual Transfer

Authors

  • Yaobo Liang, Microsoft Research Asia
  • Quanzhi Zhu, Microsoft Research Asia
  • Junhe Zhao, Microsoft Research Asia
  • Nan Duan, Microsoft Research

DOI:

https://doi.org/10.1609/aaai.v38i17.29824

Keywords:

NLP: Machine Translation, Multilinguality, Cross-Lingual NLP

Abstract

There are two primary approaches to cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of different languages, and translate-test, which explicitly translates the input into an intermediate language, such as English. Translate-test offers better interpretability than multilingual pre-training, but it achieves lower performance and struggles with word-level tasks because translation alters word order. We therefore propose a new Machine-created Universal Language (MUL) as an alternative intermediate language. MUL comprises a set of discrete symbols forming a universal vocabulary, together with a natural-language-to-MUL translator that converts multiple natural languages into MUL. MUL unifies shared concepts from different languages into a single universal word, enhancing cross-lingual transfer. At the same time, MUL retains language-specific words and the original word order, so the model can be easily applied to word-level tasks. Our experiments demonstrate that translating into MUL yields better performance than multilingual pre-training, and our analysis indicates that MUL possesses strong interpretability. The code is at: https://github.com/microsoft/Unicoder/tree/master/MCUL.
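To make the core idea concrete, the following is a minimal, purely illustrative sketch (not the paper's implementation) of how words from several languages can collapse onto shared discrete MUL symbols while language-specific words and the original word order are preserved. The toy lexicon, the symbol IDs, and the `to_mul` function are all invented for demonstration; the actual translator in the paper is learned, not a lookup table.

```python
# Toy universal vocabulary: words from different languages that express
# the same concept share one discrete MUL symbol. All entries below are
# hypothetical examples, not the paper's real vocabulary.
MUL_LEXICON = {
    "cat":   "MUL_0017",  # English
    "gato":  "MUL_0017",  # Spanish
    "katze": "MUL_0017",  # German
    "runs":  "MUL_0042",
    "corre": "MUL_0042",
}

def to_mul(tokens):
    """Map a token sequence into MUL.

    Shared concepts collapse to one universal symbol; words without a
    universal counterpart are kept unchanged, so language-specific words
    and the original word order survive translation, which is what keeps
    word-level tasks tractable."""
    return [MUL_LEXICON.get(tok.lower(), tok) for tok in tokens]

print(to_mul(["The", "cat", "runs"]))   # → ['The', 'MUL_0017', 'MUL_0042']
print(to_mul(["El", "gato", "corre"]))  # → ['El', 'MUL_0017', 'MUL_0042']
```

Note how the English and Spanish sentences map to the same MUL symbols for the shared concepts, which is the alignment that supports cross-lingual transfer in the paper's framing.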

Published

2024-03-24

How to Cite

Liang, Y., Zhu, Q., Zhao, J., & Duan, N. (2024). Machine-Created Universal Language for Cross-Lingual Transfer. Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 18617-18625. https://doi.org/10.1609/aaai.v38i17.29824

Section

AAAI Technical Track on Natural Language Processing II