Using Semantics and Statistics to Turn Data into Knowledge


  • Jay Pujara University of Maryland, College Park
  • Hui Miao
  • Lise Getoor University of California, Santa Cruz
  • William W. Cohen Carnegie Mellon University



Many information extraction and knowledge base construction systems address the challenge of deriving knowledge from text. A key problem in constructing these knowledge bases from sources like the web is overcoming the erroneous and incomplete information found in millions of candidate extractions. To solve this problem, we turn to semantics, using ontological constraints between candidate facts to eliminate errors. In this article, we represent the desired knowledge base as a knowledge graph and introduce the problem of knowledge graph identification: collectively resolving the entities, labels, and relations present in the knowledge graph. Knowledge graph identification requires reasoning jointly over millions of extractions, which poses a scalability challenge to many approaches. We use probabilistic soft logic (PSL), a recently introduced statistical relational learning framework, to implement an efficient solution to knowledge graph identification, and we present state-of-the-art results for knowledge graph construction while performing an order of magnitude faster than competing methods.
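
To make the idea of an ontological constraint between candidate facts concrete, the following minimal Python sketch scores a single mutual-exclusion rule using the Lukasiewicz relaxation of logic on which PSL is based. It is an illustration under stated assumptions, not the article's implementation: the rule form, the function names, and the confidence values are all hypothetical, and a real PSL model would minimize a weighted sum of such penalties over all ground rules rather than evaluate one in isolation.

```python
# Illustrative sketch only: function names and truth values are hypothetical,
# not taken from the article or from the PSL software.

def luk_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction of two soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def luk_not(a: float) -> float:
    """Lukasiewicz negation of a soft truth value in [0, 1]."""
    return 1.0 - a


def distance_to_satisfaction(body: float, head: float) -> float:
    """How far the rule body -> head is from being satisfied (0 = satisfied)."""
    return max(0.0, body - head)


def mutex_penalty(mut: float, label_a: float, label_b: float) -> float:
    """Penalty for the assumed rule: Mut(A, B) AND Lbl(E, A) -> NOT Lbl(E, B).

    mut      -- truth value of the ontological fact that labels A and B are exclusive
    label_a  -- confidence that entity E has label A
    label_b  -- confidence that entity E has label B
    """
    body = luk_and(mut, label_a)
    head = luk_not(label_b)
    return distance_to_satisfaction(body, head)


if __name__ == "__main__":
    # Two conflicting candidate labels for one entity, both extracted with
    # high confidence: the constraint is strongly violated.
    print(mutex_penalty(1.0, 0.9, 0.8))
    # Lowering the confidence in one of the labels satisfies the constraint.
    print(mutex_penalty(1.0, 0.9, 0.1))
```

Because the penalty is a hinge function of the truth values, it is convex in the unknown labels. That convexity is the property that lets joint inference over many such rules scale.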




How to Cite

Pujara, J., Miao, H., Getoor, L., & Cohen, W. W. (2015). Using Semantics and Statistics to Turn Data into Knowledge. AI Magazine, 36(1), 65-74.