Using Crowdsourcing to Generate an Evaluation Dataset for Name Matching Technologies

Authors

  • Alya Asarina, MIT Lincoln Laboratory
  • Olga Simek, MIT Lincoln Laboratory

DOI:

https://doi.org/10.1609/hcomp.v1i1.13122

Keywords:

crowdsourcing, work quality, ranking, name matching

Abstract

Crowdsourcing can be a fast, flexible, and cost-effective approach to obtaining data for training and evaluating machine learning algorithms. In this paper, we discuss a novel crowdsourcing application: creating a dataset for evaluating name matchers. Name matching is the challenging and subjective task of identifying which names refer to the same person; it is crucial for effective entity disambiguation and search. We have developed an effective question interface and a work quality analysis algorithm for our task, both of which can be applied to other ranking tasks (e.g., search result ranking and recommendation system evaluation). We have demonstrated that our crowdsourced dataset can successfully be used to evaluate automatic name-matching algorithms.

Published

2013-11-03

How to Cite

Asarina, A., & Simek, O. (2013). Using Crowdsourcing to Generate an Evaluation Dataset for Name Matching Technologies. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 1(1), 6-7. https://doi.org/10.1609/hcomp.v1i1.13122