ConTextual Masked Auto-Encoder for Dense Passage Retrieval

Xing Wu; Guangyuan Ma; Meng Lin; Zijia Lin; Zhongyuan Wang; Songlin Hu

doi:10.1609/aaai.v37i4.25598

Authors

Xing Wu Institute of Information Engineering, Chinese Academy of Sciences School of Cyber Security, University of Chinese Academy of Sciences Kuaishou Technology
Guangyuan Ma Institute of Information Engineering, Chinese Academy of Sciences School of Cyber Security, University of Chinese Academy of Sciences
Meng Lin Institute of Information Engineering, Chinese Academy of Sciences School of Cyber Security, University of Chinese Academy of Sciences
Zijia Lin Kuaishou Technology
Zhongyuan Wang Kuaishou Technology
Songlin Hu Institute of Information Engineering, Chinese Academy of Sciences School of Cyber Security, University of Chinese Academy of Sciences

DOI:

https://doi.org/10.1609/aaai.v37i4.25598

Keywords:

DMKM: Web Search & Information Retrieval, SNLP: Language Models

Abstract

Dense passage retrieval aims to retrieve the relevant passages of a query from a large corpus based on dense representations (i.e., vectors) of the query and the passages. Recent studies have explored improving pre-trained language models to boost dense retrieval performance. This paper proposes CoT-MAE (ConTextual Masked Auto-Encoder), a simple yet effective generative pre-training method for dense passage retrieval. CoT-MAE employs an asymmetric encoder-decoder architecture that learns to compress the sentence semantics into a dense vector through self-supervised and context-supervised masked auto-encoding. Precisely, self-supervised masked auto-encoding learns to model the semantics of the tokens inside a text span, and context-supervised masked auto-encoding learns to model the semantical correlation between the text spans. We conduct experiments on large-scale passage retrieval benchmarks and show considerable improvements over strong baselines, demonstrating the high efficiency of CoT-MAE. Our code is available at https://github.com/caskcsg/ir/tree/main/cotmae.

ConTextual Masked Auto-Encoder for Dense Passage Retrieval

Authors

DOI:

Keywords:

Abstract

Downloads

Published

How to Cite

Issue

Section

Information

Developed By

Subscription