Exploring Self-Distillation Based Relational Reasoning Training for Document-Level Relation Extraction


  • Liang Zhang, Xiamen University
  • Jinsong Su, Xiamen University
  • Zijun Min, Xiamen University
  • Zhongjian Miao, Xiamen University
  • Qingguo Hu, Xiamen University
  • Biao Fu, Xiamen University
  • Xiaodong Shi, Xiamen University
  • Yidong Chen, Xiamen University




SNLP: Information Extraction, SNLP: Text Mining


Document-level relation extraction (RE) aims to extract relational triples from a document. One of its primary challenges is predicting implicit relations between entities, which are not explicitly expressed in the document but can usually be inferred through relational reasoning. Previous methods mainly model relational reasoning implicitly, through interactions among entities or entity pairs. However, they suffer from two deficiencies: 1) they often consider only one reasoning pattern, whose coverage of relational triples is limited; 2) they do not explicitly model the process of relational reasoning. In this paper, to address the first deficiency, we propose a document-level RE model with a reasoning module whose core unit is the reasoning multi-head self-attention unit. This unit is a variant of conventional multi-head self-attention that uses four attention heads to model four common reasoning patterns, one per head, and can thus cover more relational triples than previous methods. To address the second deficiency, we propose a self-distillation training framework containing two branches that share parameters. In the first branch, we randomly mask some entity pair feature vectors in the document and then train our reasoning module to infer their relations from the feature information of other related entity pairs. By doing so, we explicitly model the process of relational reasoning. However, because the masking operation is not used during testing, it creates an input gap between training and testing, which would hurt model performance. To reduce this gap, we perform conventional supervised training without masking in the second branch and use a Kullback-Leibler divergence loss to minimize the difference between the predictions of the two branches. Finally, we conduct comprehensive experiments on three benchmark datasets, and the results demonstrate that our model consistently outperforms all competitive baselines. Our source code is available at https://github.com/DeepLearnXMU/DocRE-SD
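
The two-branch objective described in the abstract can be illustrated with a minimal sketch. This is not the released implementation: the function names, the zero-vector mask, the softmax over relation classes, the direction of the KL term, and the weight alpha are all assumptions made here for illustration only. The logits stand in for the outputs a shared-parameter model would produce for the masked and unmasked inputs.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of relation-class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, gold, eps=1e-12):
    """Negative log-likelihood of the gold relation index."""
    return -math.log(probs[gold] + eps)


def mask_pairs(features, mask_rate, rng):
    """Randomly replace entity-pair feature vectors with a mask vector.

    Assumption: the mask vector is all zeros; the abstract does not say
    what replaces a masked feature vector.
    """
    masked, masked_idx = [], []
    for i, vec in enumerate(features):
        if rng.random() < mask_rate:
            masked.append([0.0] * len(vec))
            masked_idx.append(i)
        else:
            masked.append(list(vec))
    return masked, masked_idx


def self_distillation_loss(masked_logits, plain_logits, gold, alpha=1.0):
    """Supervised loss on both branches plus a KL term between them.

    masked_logits: per-pair logits from the branch with masked inputs.
    plain_logits:  per-pair logits from the branch without masking.
    alpha:         hypothetical weight on the KL term.
    """
    total = 0.0
    for zm, zp, y in zip(masked_logits, plain_logits, gold):
        pm, pp = softmax(zm), softmax(zp)
        supervised = cross_entropy(pm, y) + cross_entropy(pp, y)
        # Assumed direction: pull the masked-branch prediction toward
        # the unmasked-branch prediction.
        total += supervised + alpha * kl_divergence(pp, pm)
    return total / len(gold)
```

In this sketch the KL term is zero when both branches make identical predictions, so it only penalizes disagreement caused by masking; a real model would compute both sets of logits with the same parameters and backpropagate through the summed loss.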




How to Cite

Zhang, L., Su, J., Min, Z., Miao, Z., Hu, Q., Fu, B., Shi, X., & Chen, Y. (2023). Exploring Self-Distillation Based Relational Reasoning Training for Document-Level Relation Extraction. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 13967-13975. https://doi.org/10.1609/aaai.v37i11.26635



AAAI Technical Track on Speech & Natural Language Processing