A Generative Model of Words and Relationships from Multiple Sources

Authors

  • Stephanie Hyland, Weill Cornell Graduate School of Medical Sciences / Memorial Sloan Kettering Cancer Center
  • Theofanis Karaletsos, Memorial Sloan Kettering Cancer Center
  • Gunnar Rätsch, Memorial Sloan Kettering Cancer Center

DOI:

https://doi.org/10.1609/aaai.v30i1.10335

Keywords:

word embeddings, generative model, natural language processing, relational data

Abstract

Neural language models are a powerful tool to embed words into semantic vector spaces. However, learning such models generally relies on the availability of abundant and diverse training examples. In highly specialised domains this requirement may not be met due to difficulties in obtaining a large corpus, or the limited range of expression in everyday use. Such domains may encode prior knowledge about entities in a knowledge base or ontology. We propose a generative model which integrates evidence from diverse data sources, enabling the sharing of semantic information. We achieve this by generalising the concept of co-occurrence from distributional semantics to include other relationships between entities or words, which we model as affine transformations on the embedding space. We demonstrate the effectiveness of this approach by outperforming recent models on a link prediction task and by showing its ability to profit from partially or fully unobserved training labels. We further show the usefulness of learning from different data sources with overlapping vocabularies.
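To make the "relationships as affine transformations" idea concrete, here is a minimal sketch of how a (subject, relation, object) triple might be scored when each relation acts as an affine map on the embedding space. This is an illustration under assumed conventions, not the paper's actual likelihood: the inner-product score and all names (embed_dim, A_r, b_r, score) are placeholders chosen for this example.

```python
import numpy as np

# Sketch: a relationship r acts on embeddings as the affine map
# x -> A_r @ x + b_r, and the plausibility of (subject, r, object) is
# the inner product of the transformed subject with the object vector.
# Plain co-occurrence is recovered when A_r is the identity and b_r = 0.
# This scoring choice is an assumption for illustration only.

rng = np.random.default_rng(0)
embed_dim = 50

# Toy embeddings for one subject and one object word/entity.
v_subj = rng.normal(size=embed_dim)
v_obj = rng.normal(size=embed_dim)

# Parameters of one relationship (e.g. "is-a"): an affine transformation.
A_r = np.eye(embed_dim) + 0.01 * rng.normal(size=(embed_dim, embed_dim))
b_r = rng.normal(size=embed_dim)

def score(subj, obj, A, b):
    """Plausibility of a triple under the affine-relation view."""
    return np.dot(A @ subj + b, obj)

print(score(v_subj, v_obj, A_r, b_r))
```

In this view, evidence from a text corpus (co-occurrence) and from a knowledge base (typed relations) can share the same embedding space, since both reduce to scoring pairs under some relation-specific transformation.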

Published

2016-03-05

How to Cite

Hyland, S., Karaletsos, T., & Rätsch, G. (2016). A Generative Model of Words and Relationships from Multiple Sources. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). https://doi.org/10.1609/aaai.v30i1.10335

Section

Technical Papers: NLP and Knowledge Representation