Spectral Word Embedding with Negative Sampling
Keywords: Word Embedding, Natural Language Processing, Unsupervised Learning, Matrix Factorization, Spectral Algorithms, Singular Value Decomposition
In this work, we investigate word embedding algorithms in the context of natural language processing. In particular, we examine the notion of ``negative examples'', the unobserved or insignificant word-context co-occurrences, in spectral methods. We provide a new formulation of the word embedding problem by proposing an intuitive objective function that justifies the use of negative examples. Our algorithm learns not only from the important word-context co-occurrences but also from the abundance of unobserved or insignificant co-occurrences, which improves the distribution of words in the latent embedded space. We analyze the algorithm theoretically and derive an optimal solution to the problem using spectral analysis. We train various word embedding algorithms on Wikipedia articles comprising 2.1 billion tokens and show that negative sampling can boost the quality of spectral methods. Our algorithm produces results as good as the state of the art, but much faster and more efficiently.
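To make the idea concrete, the following is a minimal sketch of one way negative examples could enter a spectral embedding pipeline. It is an illustration under stated assumptions, not the objective or algorithm proposed in this work: the function name `spectral_embed`, the parameter `neg_weight`, the rule that treats co-occurrences with non-positive PMI as negative examples, and the toy count matrix are all hypothetical choices made for this example.

```python
import numpy as np

def spectral_embed(counts, dim, neg_weight=1.0):
    """Embed words by truncated SVD of a PMI-style target matrix.

    Assumption (not from the paper): pairs with PMI > 0 are positive
    examples and keep their PMI value; all other pairs (unobserved or
    insignificant) are negative examples and get the value -neg_weight.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_word = counts.sum(axis=1, keepdims=True) / total
    p_ctx = counts.sum(axis=0, keepdims=True) / total
    # Zero counts give log(0) = -inf (or nan); both fail the `> 0` test below.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / total) - np.log(p_word) - np.log(p_ctx)
    positive = pmi > 0
    target = np.where(positive, pmi, -neg_weight)
    # Rank-`dim` factorization: word vectors are U * sqrt(S).
    u, s, _ = np.linalg.svd(target, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])

# Hypothetical toy word-context counts (4 words, 4 contexts).
toy_counts = np.array([
    [10, 0, 3, 0],
    [0, 8, 0, 2],
    [3, 0, 9, 0],
    [0, 2, 0, 7],
])
emb = spectral_embed(toy_counts, dim=2, neg_weight=1.0)
print(emb.shape)
```

Setting `neg_weight=0.0` in this sketch reduces the target to a plain positive-PMI matrix, so the parameter isolates the contribution of the negative examples to the factorization.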