Pre-Trained Multi-View Word Embedding Using Two-Side Neural Network

Yong Luo; Jian Tang; Jun Yan; Chao Xu; Zheng Chen

doi:10.1609/aaai.v28i1.8956

Authors

Yong Luo, Peking University
Jian Tang, Peking University
Jun Yan, Microsoft Research Asia
Chao Xu, Peking University
Zheng Chen, Microsoft Research Asia

DOI:

https://doi.org/10.1609/aaai.v28i1.8956

Keywords:

pre-train, word embedding, multiple sources, neural network

Abstract

Word embedding aims to learn a continuous representation for each word. It has attracted increasing attention because of its effectiveness in tasks such as named entity recognition and language modeling. Most existing word embeddings are trained on a single data source, such as news pages or Wikipedia articles. However, when they are applied to other tasks such as web search, performance suffers. To obtain a word embedding that is robust across applications, multiple data sources can be leveraged. In this paper, we propose a two-side multimodal neural network that learns a robust word embedding from multiple data sources, including free text, user search queries, and search click-through data. The framework takes the word embeddings learned from the different data sources as pre-trained inputs and then uses a two-side neural network to unify them. The pre-trained embeddings are obtained by adapting the recently proposed CBOW algorithm. Because the proposed neural network does not need to re-train word embeddings for a new task, it scales well to real-world problems. In addition, the network allows the sources to be weighted differently for different application tasks. Experiments on two real-world applications, web search ranking and word similarity measurement, show that our neural network with multiple sources outperforms a state-of-the-art word embedding algorithm trained on any individual source. It also outperforms other competitive baselines that use multiple sources.
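The idea of unifying per-source pre-trained embeddings with task-specific source weights can be illustrated with a minimal NumPy sketch. It is not the paper's two-side architecture, which the abstract does not specify. The function name `unify_embeddings`, the tanh projection, the weighted sum, and all dimensions and weights are assumptions made for illustration only, and the random vectors stand in for embeddings that would in practice come from a CBOW-style pre-training step.

```python
import numpy as np

def unify_embeddings(source_vectors, source_weights, projections):
    """Combine one word's per-source embeddings into a single vector.

    Hypothetical illustration: each source vector is projected into a
    shared space and the projections are mixed with normalized weights.
    """
    weights = np.asarray(source_weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Project each source embedding into the shared space.
    hidden = [np.tanh(P @ v) for P, v in zip(projections, source_vectors)]
    # Weighted sum of the projected vectors.
    return sum(w * h for w, h in zip(weights, hidden))

# Stand-in data: three sources with different embedding sizes (assumed).
rng = np.random.default_rng(0)
source_dims = {"free_text": 5, "queries": 4, "click_through": 3}
unified_dim = 6

vectors = [rng.normal(size=d) for d in source_dims.values()]
projections = [rng.normal(size=(unified_dim, d)) for d in source_dims.values()]

# The pre-trained vectors stay fixed; only the mixing weights change per task.
unified_search = unify_embeddings(vectors, [0.2, 0.3, 0.5], projections)
unified_similarity = unify_embeddings(vectors, [0.6, 0.3, 0.1], projections)
```

In this sketch the per-source vectors are never re-trained, so changing the application only changes the mixing weights. In a trained model the projection matrices would be learned rather than drawn at random.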
