Exploring Self-Distillation Based Relational Reasoning Training for Document-Level Relation Extraction

Liang Zhang; Jinsong Su; Zijun Min; Zhongjian Miao; Qingguo Hu; Biao Fu; Xiaodong Shi; Yidong Chen

doi:10.1609/aaai.v37i11.26635

Authors

Liang Zhang Xiamen University
Jinsong Su Xiamen University
Zijun Min Xiamen University
Zhongjian Miao Xiamen University
Qingguo Hu Xiamen University
Biao Fu Xiamen University
Xiaodong Shi Xiamen University
Yidong Chen Xiamen University

DOI:

https://doi.org/10.1609/aaai.v37i11.26635

Keywords:

SNLP: Information Extraction, SNLP: Text Mining

Abstract

Document-level relation extraction (RE) aims to extract relational triples from a document. One of its primary challenges is predicting implicit relations between entities, which are not explicitly expressed in the document but can usually be inferred through relational reasoning. Previous methods mainly model relational reasoning implicitly, through interactions among entities or entity pairs. However, they suffer from two deficiencies: 1) they often consider only one reasoning pattern, whose coverage of relational triples is limited; 2) they do not explicitly model the process of relational reasoning. In this paper, to address the first deficiency, we propose a document-level RE model with a reasoning module whose core unit is the reasoning multi-head self-attention unit. This unit is a variant of conventional multi-head self-attention that uses four attention heads to model four common reasoning patterns, one per head, and can therefore cover more relational triples than previous methods. Then, to address the second deficiency, we propose a self-distillation training framework, which contains two branches sharing parameters. In the first branch, we randomly mask some entity pair feature vectors in the document, and then train our reasoning module to infer their relations by exploiting the feature information of other related entity pairs. By doing so, we explicitly model the process of relational reasoning. However, because the masking operation is not used during testing, it introduces an input gap between training and testing, which would hurt model performance. To reduce this gap, we perform conventional supervised training without the masking operation in the second branch and use a Kullback-Leibler divergence loss to minimize the difference between the predictions of the two branches.
Finally, we conduct comprehensive experiments on three benchmark datasets, and the results demonstrate that our model consistently outperforms all competitive baselines. Our source code is available at https://github.com/DeepLearnXMU/DocRE-SD
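
The two-branch training objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); it only shows the general shape of the idea under several assumptions: predictions are a softmax over relation classes, the masked entity pair vectors are replaced by a fixed mask vector, the KL term is computed from the unmasked branch to the masked branch, and the three loss terms are simply summed. The names `toy_reasoning_model`, `mask_pairs`, `self_distillation_loss`, `mask_rate`, and `kl_weight` are hypothetical and chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mask_pairs(pair_features, mask_rate, mask_vector, rng):
    """Replace a random subset of entity pair vectors with mask_vector."""
    return [
        list(mask_vector) if rng.random() < mask_rate else list(feat)
        for feat in pair_features
    ]


def toy_reasoning_model(pair_features):
    """Hypothetical stand-in for the reasoning module (an assumption).

    Each pair's logits are its own features plus the mean over all pairs,
    so a masked pair can still draw on information from the other pairs.
    """
    n = len(pair_features)
    dim = len(pair_features[0])
    context = [sum(f[d] for f in pair_features) / n for d in range(dim)]
    return [[f[d] + context[d] for d in range(dim)] for f in pair_features]


def self_distillation_loss(model, pair_features, labels, mask_rate,
                           mask_vector, rng, kl_weight=1.0):
    """Combine both branches; the same model (shared parameters) runs twice."""
    # Branch 1: supervised training on inputs with some pairs masked.
    masked = mask_pairs(pair_features, mask_rate, mask_vector, rng)
    probs_masked = [softmax(row) for row in model(masked)]
    # Branch 2: conventional supervised training on the unmasked inputs.
    probs_plain = [softmax(row) for row in model(pair_features)]

    n = len(labels)
    ce_masked = -sum(math.log(p[y]) for p, y in zip(probs_masked, labels)) / n
    ce_plain = -sum(math.log(p[y]) for p, y in zip(probs_plain, labels)) / n
    # KL term pulls the predictions of the two branches together.
    # The direction (plain || masked) is an assumption of this sketch.
    kl = sum(kl_divergence(pp, pm)
             for pp, pm in zip(probs_plain, probs_masked)) / n
    return {
        "ce_masked": ce_masked,
        "ce_plain": ce_plain,
        "kl": kl,
        "total": ce_masked + ce_plain + kl_weight * kl,
    }


if __name__ == "__main__":
    features = [[2.0, 0.0, 0.0], [0.0, 1.5, 0.0], [0.0, 0.0, 1.0]]
    labels = [0, 1, 2]
    losses = self_distillation_loss(
        toy_reasoning_model, features, labels,
        mask_rate=0.5, mask_vector=[0.0, 0.0, 0.0], rng=random.Random(0),
    )
    print(losses)
```

In this sketch, only the unmasked branch matches what the model sees at test time, which is why the KL term is used to keep the masked-branch predictions close to it.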
