State of the Union: A Data Consumer's Perspective on Wikidata and Its Properties for the Classification and Resolution of Entities

Authors

  • Andreas Spitz Heidelberg University
  • Vaibhav Dixit Heidelberg University
  • Ludwig Richter Heidelberg University
  • Michael Gertz Heidelberg University
  • Johanna Geiss Heidelberg University

DOI:

https://doi.org/10.1609/icwsm.v10i2.14832

Abstract

Wikipedia is one of the most popular sources of free data on the Internet and subject to extensive use in numerous areas of research. Wikidata, on the other hand, the knowledge base behind Wikipedia, is less popular as a source of data, despite having the "data" already in its name, and despite the fact that many applications in Natural Language Processing in general and Information Extraction in particular benefit immensely from the integration of knowledge bases. In part, this imbalance is owed to the younger age of Wikidata, which launched over a decade after Wikipedia. However, it is also owed to challenges posed by the still-evolving properties of Wikidata that make its content more difficult to consume for third parties than is desirable. In this article, we analyze the causes of these challenges from the viewpoint of a data consumer and discuss possible avenues of research and advancement that both the scientific and the Wikidata community can collaborate on to turn the knowledge base into the invaluable asset that it is uniquely positioned to become.

Published

2021-08-04

How to Cite

Spitz, A., Dixit, V., Richter, L., Gertz, M., & Geiss, J. (2021). State of the Union: A Data Consumer’s Perspective on Wikidata and Its Properties for the Classification and Resolution of Entities. Proceedings of the International AAAI Conference on Web and Social Media, 10(2), 88-95. https://doi.org/10.1609/icwsm.v10i2.14832