TY - JOUR AU - Haji Soleimani, Behrouz AU - Matwin, Stan PY - 2018/04/27 Y2 - 2024/03/29 TI - Spectral Word Embedding with Negative Sampling JF - Proceedings of the AAAI Conference on Artificial Intelligence JA - AAAI VL - 32 IS - 1 SE - Main Track: NLP and Machine Learning DO - 10.1609/aaai.v32i1.12015 UR - https://ojs.aaai.org/index.php/AAAI/article/view/12015 SP - AB - <p> In this work, we investigate word embedding algorithms in the context of natural language processing. In particular, we examine the notion of ``negative examples'', the unobserved or insignificant word-context co-occurrences, in spectral methods. we provide a new formulation for the word embedding problem by proposing a new intuitive objective function that perfectly justifies the use of negative examples. In fact, our algorithm not only learns from the important word-context co-occurrences, but also it learns from the abundance of unobserved or insignificant co-occurrences to improve the distribution of words in the latent embedded space. We analyze the algorithm theoretically and provide an optimal solution for the problem using spectral analysis. We have trained various word embedding algorithms on articles of Wikipedia with 2.1 billion tokens and show that negative sampling can boost the quality of spectral methods. Our algorithm provides results as good as the state-of-the-art but in a much faster and efficient way. </p> ER -