Using Crowdsourcing to Generate an Evaluation Dataset for Name Matching Technologies

Authors

  • Alya Asarina, MIT Lincoln Laboratory
  • Olga Simek, MIT Lincoln Laboratory

DOI:

https://doi.org/10.1609/hcomp.v1i1.13122

Keywords:

crowdsourcing, work quality, ranking, name matching

Abstract

Crowdsourcing can be a fast, flexible, and cost-effective approach to obtaining data for training and evaluating machine learning algorithms. In this paper, we discuss a novel crowdsourcing application: creating a dataset for evaluating name matchers. Name matching is the challenging and subjective task of identifying which names refer to the same person; it is crucial for effective entity disambiguation and search. We have developed an effective question interface and a work quality analysis algorithm for our task, both of which can be applied to other ranking tasks (e.g., search result ranking and recommendation system evaluation). We have demonstrated that our crowdsourced dataset can successfully be used to evaluate automatic name-matching algorithms.

Published

2013-11-03

How to Cite

Asarina, A., & Simek, O. (2013). Using Crowdsourcing to Generate an Evaluation Dataset for Name Matching Technologies. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 1(1), 6-7. https://doi.org/10.1609/hcomp.v1i1.13122