Mind the Gap: Cross-Lingual Information Retrieval with Hierarchical Knowledge Enhancement

Authors

  • Fuwei Zhang: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China
  • Zhao Zhang: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Xiang Ao: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Institute of Intelligent Computing Technology, Suzhou, CAS
  • Dehong Gao: Alibaba Group, Hangzhou, China
  • Fuzhen Zhuang: Institute of Artificial Intelligence, Beihang University, Beijing 100191, China; SKLSDE, School of Computer Science, Beihang University, Beijing 100191, China
  • Yi Wei: Alibaba Group, Hangzhou, China
  • Qing He: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

DOI:

https://doi.org/10.1609/aaai.v36i4.20355

Keywords:

Data Mining & Knowledge Management (DMKM)

Abstract

Cross-Lingual Information Retrieval (CLIR) aims to rank documents written in a language different from that of the user’s query. The intrinsic gap between languages is a central challenge for CLIR. In this paper, we introduce a multilingual knowledge graph (KG) to the CLIR task, because it provides rich information about entities in multiple languages. The KG serves as a “silver bullet” that simultaneously performs explicit alignment between queries and documents and broadens the representations of queries. We propose a model named CLIR with HIerarchical Knowledge Enhancement (HIKE) for this task. The model encodes the textual information in queries, documents, and the KG with multilingual BERT, and incorporates the KG information into the query-document matching process through a hierarchical information fusion mechanism. Specifically, HIKE first integrates the entities and their KG neighborhoods into query representations with a knowledge-level fusion, then combines the knowledge from both the source and target languages with a language-level fusion to further mitigate the linguistic gap. Experimental results demonstrate that HIKE achieves substantial improvements over state-of-the-art competitors.
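
The two-stage fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the fusion operators, so the dot-product attention pooling, the averaging step, and every function name below (attention_pool, knowledge_level_fusion, language_level_fusion, match_score) are assumptions chosen for illustration. Plain Python lists stand in for the multilingual BERT embeddings that the model would actually produce.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query: Vector, candidates: Sequence[Vector]) -> Vector:
    """Weighted average of candidates; weights come from dot products with the query.

    Assumption: a generic attention pooling, not the operator used in the paper.
    """
    weights = softmax([dot(query, c) for c in candidates])
    return [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(query))
    ]


def knowledge_level_fusion(query: Vector, entity: Vector,
                           neighbors: Sequence[Vector]) -> Vector:
    """Hypothetical stage 1: fold an entity and its KG neighbors into the query."""
    knowledge = attention_pool(query, [entity, *neighbors])
    # Assumption: a simple average merges the query with the pooled knowledge.
    return [(q + k) / 2 for q, k in zip(query, knowledge)]


def language_level_fusion(query: Vector, source_fused: Vector,
                          target_fused: Vector) -> Vector:
    """Hypothetical stage 2: combine source- and target-language views."""
    return attention_pool(query, [source_fused, target_fused])


def match_score(query_repr: Vector, document: Vector) -> float:
    """Toy relevance score; a real ranker would use a learned matching layer."""
    return dot(query_repr, document)


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for multilingual BERT outputs.
    query = [0.2, 0.7, 0.1]
    src = knowledge_level_fusion(query, [0.3, 0.6, 0.1], [[0.1, 0.8, 0.1]])
    tgt = knowledge_level_fusion(query, [0.4, 0.5, 0.1], [[0.2, 0.6, 0.2]])
    fused = language_level_fusion(query, src, tgt)
    print(match_score(fused, [0.25, 0.65, 0.10]))
```

The sketch keeps the order stated in the abstract: knowledge is fused into the query separately for each language first, and only then are the two language-specific representations combined before matching against the document.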

Published

2022-06-28

How to Cite

Zhang, F., Zhang, Z., Ao, X., Gao, D., Zhuang, F., Wei, Y., & He, Q. (2022). Mind the Gap: Cross-Lingual Information Retrieval with Hierarchical Knowledge Enhancement. Proceedings of the AAAI Conference on Artificial Intelligence, 36(4), 4345-4353. https://doi.org/10.1609/aaai.v36i4.20355

Section

AAAI Technical Track on Data Mining and Knowledge Management