CrossNER: Evaluating Cross-Domain Named Entity Recognition

Zihan Liu; Yan Xu; Tiezheng Yu; Wenliang Dai; Ziwei Ji; Samuel Cahyawijaya; Andrea Madotto; Pascale Fung

doi:10.1609/aaai.v35i15.17587

Authors

Zihan Liu, The Hong Kong University of Science and Technology
Yan Xu, The Hong Kong University of Science and Technology
Tiezheng Yu, The Hong Kong University of Science and Technology
Wenliang Dai, The Hong Kong University of Science and Technology
Ziwei Ji, The Hong Kong University of Science and Technology
Samuel Cahyawijaya, The Hong Kong University of Science and Technology
Andrea Madotto, The Hong Kong University of Science and Technology
Pascale Fung, The Hong Kong University of Science and Technology

DOI:

https://doi.org/10.1609/aaai.v35i15.17587

Keywords:

Syntax -- Tagging, Chunking & Parsing

Abstract

Cross-domain named entity recognition (NER) models can cope with the scarcity of NER samples in target domains. However, most existing NER benchmarks lack domain-specialized entity types or do not focus on a particular domain, which makes cross-domain evaluation less effective. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully labeled collection of NER data spanning five diverse domains, with specialized entity categories for different domains. We also provide a domain-related corpus, since using it to continue pre-training language models (domain-adaptive pre-training) is effective for domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and different pre-training strategies for domain-adaptive pre-training in the cross-domain task. Results show that focusing on the fraction of the corpus containing domain-specialized entities and using a more challenging pre-training strategy in domain-adaptive pre-training both benefit NER domain adaptation, and our proposed method consistently outperforms existing cross-domain NER baselines. Nevertheless, experiments also illustrate the difficulty of this cross-domain NER task. We hope that our dataset and baselines will catalyze research in NER domain adaptation. The code and data are available at https://github.com/zliucr/CrossNER.
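
As a rough illustration of restricting a domain corpus to the part that mentions domain-specialized entities, the sketch below filters sentences by a list of entity terms. The abstract does not describe the actual selection procedure, so the function name select_entity_sentences, the term list, and the sample sentences are all hypothetical, and simple case-insensitive string matching is an assumption made only for this example.

```python
import re


def select_entity_sentences(sentences, entity_terms):
    """Keep only sentences that mention at least one domain entity term.

    Hypothetical helper: matching is case-insensitive and anchored on
    word boundaries, which is an assumption of this sketch.
    """
    # Try longer terms first so multi-word entities win over substrings.
    terms = sorted((t for t in entity_terms if t), key=len, reverse=True)
    if not terms:
        return []
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]


# Made-up sample data, not taken from the CrossNER corpus.
corpus = [
    "Backpropagation is used to train neural networks.",
    "The weather was pleasant that afternoon.",
    "Support vector machines were popular before deep learning.",
]
entity_terms = ["backpropagation", "support vector machines"]
subset = select_entity_sentences(corpus, entity_terms)
```

The resulting subset could then serve as input to continued pre-training of a language model; how the entity list is built and which pre-training objective is used are left open here.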
