Low Resource Sequence Tagging with Weak Labels

Edwin Simpson; Jonas Pfeiffer; Iryna Gurevych

doi:10.1609/aaai.v34i05.6415

Authors

Edwin Simpson Technische Universität Darmstadt
Jonas Pfeiffer Technische Universität Darmstadt
Iryna Gurevych Technische Universität Darmstadt

Abstract

Current methods for sequence tagging depend on large quantities of domain-specific training data, limiting their use in new, user-defined tasks with few or no annotations. While crowdsourcing can be a cheap source of labels, it often introduces errors that degrade the performance of models trained on such crowdsourced data. Another solution is to use transfer learning to tackle low resource sequence labelling, but current approaches rely heavily on similar high resource datasets in different languages. In this paper, we propose a domain adaptation method that uses Bayesian sequence combination to exploit pre-trained models and unreliable crowdsourced data, and that does not require high resource data in a different language. Our method boosts performance by learning the relationship between each labeller and the target task, and it trains a sequence labeller on the target domain with little or no gold-standard data. We apply our approach to labelling diagnostic classes in medical and educational case studies, showing that the model achieves strong performance through zero-shot transfer learning and is more effective than alternative ensemble methods. Using NER and information extraction tasks, we show how our approach can train a model directly from crowdsourced labels, outperforming pipeline approaches that first aggregate the crowdsourced data and then train on the aggregated labels.
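To make "learning the relationship between each labeller and the target task" concrete, the snippet below is a minimal, hypothetical sketch of one common way to model labeller reliability: a per-annotator confusion matrix fitted by expectation-maximisation, in the style of the Dawid-Skene model. It is not the authors' Bayesian sequence combination. It treats tokens independently, ignores dependencies between neighbouring labels, and does not incorporate pre-trained models. The function name `fit_annotator_model`, the `smoothing` pseudo-count, the BIO label indices and the toy annotations are all illustrative assumptions.

```python
# Hypothetical, simplified sketch: token-level per-annotator confusion matrices
# fitted by EM (Dawid-Skene style). Unlike the paper's Bayesian sequence
# combination, it ignores label sequences and pre-trained models.


def fit_annotator_model(annotations, num_classes, iters=20, smoothing=0.1):
    """Estimate true-label posteriors and annotator confusion matrices.

    annotations: one dict per token, mapping annotator id -> label index.
    smoothing: pseudo-count added to every prior and confusion-matrix cell.
    Returns (posteriors, confusion), where confusion[a][k][l] approximates
    P(annotator a says l | true label is k).
    """
    annotators = sorted({a for token in annotations for a in token})

    # Initialise each token's posterior with a smoothed majority vote.
    posteriors = []
    for token in annotations:
        counts = [smoothing] * num_classes
        for label in token.values():
            counts[label] += 1.0
        total = sum(counts)
        posteriors.append([c / total for c in counts])

    for _ in range(iters):
        # M-step: class prior and one confusion matrix per annotator,
        # using the current posteriors as soft counts.
        prior = [smoothing] * num_classes
        confusion = {
            a: [[smoothing] * num_classes for _ in range(num_classes)]
            for a in annotators
        }
        for token, post in zip(annotations, posteriors):
            for k in range(num_classes):
                prior[k] += post[k]
                for a, label in token.items():
                    confusion[a][k][label] += post[k]
        prior_total = sum(prior)
        prior = [p / prior_total for p in prior]
        for a in annotators:
            for k in range(num_classes):
                row_total = sum(confusion[a][k])
                confusion[a][k] = [c / row_total for c in confusion[a][k]]

        # E-step: posterior over each token's true label given all annotators.
        # (Products of probabilities are fine here; many annotators would
        # call for log-space arithmetic to avoid underflow.)
        posteriors = []
        for token in annotations:
            scores = list(prior)
            for k in range(num_classes):
                for a, label in token.items():
                    scores[k] *= confusion[a][k][label]
            total = sum(scores)
            posteriors.append([s / total for s in scores])

    return posteriors, confusion


# Usage: labels 0 = O, 1 = B, 2 = I. Annotators "a" and "b" agree with each
# other; annotator "c" always answers O, so its confusion matrix becomes
# nearly uninformative and its votes carry little weight.
tokens = [
    {"a": 1, "b": 1, "c": 0},
    {"a": 2, "b": 2, "c": 0},
    {"a": 0, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 0},
    {"a": 0, "b": 0, "c": 0},
]
posteriors, confusion = fit_annotator_model(tokens, num_classes=3)
predicted = [max(range(3), key=lambda k: p[k]) for p in posteriors]
print(predicted)  # → [1, 2, 0, 1, 0]
```

Used this way, the aggregated labels would then train a separate tagger, which corresponds to the "aggregate first, then train" pipeline the abstract compares against. The paper's approach instead learns the labeller models and the sequence tagger jointly.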
