CrossNER: Evaluating Cross-Domain Named Entity Recognition

Authors

  • Zihan Liu The Hong Kong University of Science and Technology
  • Yan Xu The Hong Kong University of Science and Technology
  • Tiezheng Yu The Hong Kong University of Science and Technology
  • Wenliang Dai The Hong Kong University of Science and Technology
  • Ziwei Ji The Hong Kong University of Science and Technology
  • Samuel Cahyawijaya The Hong Kong University of Science and Technology
  • Andrea Madotto The Hong Kong University of Science and Technology
  • Pascale Fung The Hong Kong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v35i15.17587

Keywords:

Syntax -- Tagging, Chunking & Parsing

Abstract

Cross-domain named entity recognition (NER) models are able to cope with the scarcity of NER samples in target domains. However, most existing NER benchmarks lack domain-specialized entity types or do not focus on a particular domain, which makes cross-domain evaluation less effective. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully labeled collection of NER data spanning five diverse domains, with specialized entity categories for each domain. We also provide a domain-related corpus, since using it to continue pre-training language models (domain-adaptive pre-training) is effective for domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and different pre-training strategies for domain-adaptive pre-training in the cross-domain task. Results show that focusing on the fractional corpus containing domain-specialized entities and using a more challenging pre-training strategy in domain-adaptive pre-training are both beneficial for NER domain adaptation, and our proposed method consistently outperforms existing cross-domain NER baselines. Nevertheless, experiments also illustrate the challenge of this cross-domain NER task. We hope that our dataset and baselines will catalyze research on NER domain adaptation. The code and data are available at https://github.com/zliucr/CrossNER.
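
The idea of restricting pre-training to the part of a domain corpus that mentions domain-specialized entities can be illustrated with a small sketch. This is not the authors' code: the function name `select_entity_sentences`, the toy sentences, and the whole-word, case-insensitive matching rule are all assumptions made for illustration, and the released repository may build its corpus differently.

```python
import re


def select_entity_sentences(sentences, entity_names):
    """Keep only sentences that mention at least one known entity name.

    Hypothetical helper: matching is whole-word and case-insensitive,
    which is an assumption, not a rule taken from the paper.
    """
    # Deduplicate and drop empty names; longer names first so that
    # multi-word names are tried before their shorter substrings.
    names = sorted({n.strip() for n in entity_names if n.strip()},
                   key=len, reverse=True)
    if not names:
        return []
    alternation = "|".join(re.escape(n) for n in names)
    # (?<!\w) and (?!\w) prevent matches inside longer words.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)",
                         re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]


# Toy data, invented for this example.
corpus = [
    "Alan Turing proposed the Turing test in 1950.",
    "The weather was pleasant that day.",
    "Backpropagation trains neural networks.",
    "The testing went on for hours.",
]
entities = ["Turing test", "backpropagation", "test"]

selected = select_entity_sentences(corpus, entities)
```

The resulting subset could then be passed to any masked-language-model pre-training routine in place of the full domain corpus.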

Published

2021-05-18

How to Cite

Liu, Z., Xu, Y., Yu, T., Dai, W., Ji, Z., Cahyawijaya, S., Madotto, A., & Fung, P. (2021). CrossNER: Evaluating Cross-Domain Named Entity Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 35(15), 13452-13460. https://doi.org/10.1609/aaai.v35i15.17587

Section

AAAI Technical Track on Speech and Natural Language Processing II