Discrepancy and Uncertainty Aware Denoising Knowledge Distillation for Zero-Shot Cross-Lingual Named Entity Recognition

Authors

  • Ling Ge, School of Computer Science and Engineering, Beihang University, Beijing, China
  • Chunming Hu, School of Computer Science and Engineering, Beihang University, Beijing, China; College of Software, Beihang University, Beijing, China; Zhongguancun Laboratory, Beijing, China
  • Guanghui Ma, School of Computer Science and Engineering, Beihang University, Beijing, China
  • Jihong Liu, School of Mechanical Engineering and Automation, Beihang University, Beijing, China
  • Hong Zhang, National Computer Network Emergency Response Technical Team / Coordination Center of China, Beijing, China

DOI:

https://doi.org/10.1609/aaai.v38i16.29762

Keywords:

NLP: Applications, NLP: Information Extraction

Abstract

Knowledge distillation-based approaches have recently yielded state-of-the-art (SOTA) results on zero-shot cross-lingual NER tasks. These approaches typically employ a teacher network trained on the labelled source (resource-rich) language to infer pseudo-soft labels for the unlabelled target (zero-shot) language, and force a student network to approximate these pseudo labels to achieve knowledge transfer. However, previous works have rarely discussed the issue of pseudo-label noise caused by the source-target language gap, which can mislead the training of the student network and result in negative knowledge transfer. This paper proposes a discrepancy- and uncertainty-aware Denoising Knowledge Distillation model (DenKD) to tackle this issue. Specifically, DenKD uses a discrepancy-aware denoising representation learning method to optimize the class representations of the target language produced by the teacher network, thus enhancing the quality of pseudo labels and reducing noisy predictions. Furthermore, DenKD employs an uncertainty-aware denoising method to quantify the pseudo-label noise and adjust the focus of the student network on different samples during knowledge distillation, thereby mitigating the noise's adverse effects. We conduct extensive experiments on 28 languages, including 4 languages not covered by the pre-trained models, and the results demonstrate the effectiveness of our DenKD.
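
To illustrate the uncertainty-aware weighting idea in general terms, the sketch below shows a token-level knowledge distillation loss in which the teacher's predictive entropy is used to quantify pseudo-label noise and down-weight uncertain tokens. The entropy-based weighting, function name, and tensor shapes are assumptions for illustration only, not the authors' exact DenKD formulation.

```python
# Minimal PyTorch sketch: uncertainty-weighted knowledge distillation for
# token-level sequence labelling (NER). Illustrative assumption, not the
# exact DenKD loss described in the paper.
import math
import torch
import torch.nn.functional as F

def uncertainty_weighted_kd_loss(teacher_logits: torch.Tensor,
                                 student_logits: torch.Tensor,
                                 temperature: float = 1.0) -> torch.Tensor:
    """teacher_logits, student_logits: (batch, seq_len, num_labels)."""
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
        # Quantify per-token pseudo-label noise via predictive entropy;
        # high-entropy (uncertain) teacher predictions are down-weighted.
        entropy = -(teacher_probs * torch.log(teacher_probs + 1e-12)).sum(-1)
        max_entropy = math.log(teacher_logits.size(-1))
        weights = 1.0 - entropy / max_entropy  # in [0, 1]; 1 = confident teacher

    # Token-level KL divergence between student and teacher distributions.
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_student, teacher_probs, reduction="none").sum(-1)
    return (weights * kl).mean()

if __name__ == "__main__":
    # Random tensors stand in for real teacher/student outputs over 9 NER labels.
    t = torch.randn(2, 8, 9)
    s = torch.randn(2, 8, 9, requires_grad=True)
    loss = uncertainty_weighted_kd_loss(t, s)
    loss.backward()
    print(float(loss))
```

In this sketch, confidently labelled tokens dominate the distillation signal while noisy, high-entropy pseudo labels contribute less, which is the general effect the abstract attributes to the uncertainty-aware denoising component.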

Published

2024-03-24

How to Cite

Ge, L., Hu, C., Ma, G., Liu, J., & Zhang, H. (2024). Discrepancy and Uncertainty Aware Denoising Knowledge Distillation for Zero-Shot Cross-Lingual Named Entity Recognition. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 18056-18064. https://doi.org/10.1609/aaai.v38i16.29762

Issue

Vol. 38 No. 16 (2024)

Section

AAAI Technical Track on Natural Language Processing I