A Unified Pretraining Framework for Passage Ranking and Expansion

Ming Yan; Chenliang Li; Bin Bi; Wei Wang; Songfang Huang

doi:10.1609/aaai.v35i5.16584

Authors

Ming Yan, Alibaba Group
Chenliang Li, Alibaba Group
Bin Bi, Alibaba Group
Wei Wang, Alibaba Group
Songfang Huang, Alibaba Group

DOI:

https://doi.org/10.1609/aaai.v35i5.16584

Keywords:

Web Search & Information Retrieval

Abstract

Pretrained language models have recently advanced a wide range of natural language processing tasks, and their application to information retrieval (IR) tasks has also achieved impressive results. Typical methods either directly apply a pretrained model to improve the re-ranking stage, or use it to conduct passage expansion and term weighting for first-stage retrieval. We observe that the passage ranking and passage expansion tasks are inherently related and can benefit from each other. Therefore, in this paper, we propose a general pretraining framework to enhance both tasks with Unified Encoder-Decoder networks (UED). The overall ranking framework consists of two parts in a cascade: (1) passage expansion with a pretraining-based query generation method; (2) re-ranking of passage candidates from a traditional retrieval method with a pretrained transformer encoder. Both parts are based on the same pretrained UED model, in which we jointly train the passage ranking and query generation tasks to further improve the full ranking pipeline. Extensive experiments on two large-scale passage retrieval datasets demonstrate state-of-the-art results for the proposed framework in both first-stage retrieval and final re-ranking. In addition, we have successfully deployed the framework in our online production system, where it stably serves industrial applications with a request volume of up to 100 QPS in less than 300 ms.
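
The two-part cascade described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' UED implementation: the query generator and the re-ranking scorer are hypothetical toy stand-ins passed in as plain functions, and a simple term-overlap count is assumed in place of the unspecified "traditional retrieval method". All function names below are invented for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer; a placeholder for a real analyzer."""
    return text.lower().split()


def expand_passage(passage: str,
                   generate_queries: Callable[[str], List[str]]) -> str:
    """Stage 1: append generated queries to the passage before indexing."""
    return " ".join([passage] + generate_queries(passage))


def overlap_score(query: str, document: str) -> int:
    """Toy lexical score: occurrences of distinct query terms in the document.

    Assumption: stands in for the traditional first-stage retriever.
    """
    doc_counts = Counter(tokenize(document))
    return sum(doc_counts[term] for term in set(tokenize(query)))


def retrieve(query: str, index: List[str], k: int) -> List[int]:
    """Return ids of the top-k expanded passages with a non-zero score."""
    scored = [(overlap_score(query, doc), pid) for pid, doc in enumerate(index)]
    scored = [(score, pid) for score, pid in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:k]]


def rerank(query: str, passages: List[str], candidate_ids: List[int],
           score_pair: Callable[[str, str], float]) -> List[int]:
    """Stage 2: reorder candidates with a query-passage relevance scorer."""
    return sorted(candidate_ids,
                  key=lambda pid: -score_pair(query, passages[pid]))


def run_pipeline(query: str, passages: List[str],
                 generate_queries: Callable[[str], List[str]],
                 score_pair: Callable[[str, str], float],
                 k: int = 2) -> List[int]:
    """Expand and index passages, retrieve k candidates, then re-rank them."""
    index = [expand_passage(p, generate_queries) for p in passages]
    candidates = retrieve(query, index, k)
    return rerank(query, passages, candidates, score_pair)


# Hypothetical stand-ins for the two pretrained components.
PASSAGES = [
    "The Eiffel Tower is located in Paris.",
    "Python is a programming language.",
    "Paris is the capital of France.",
]
TOY_QUERIES: Dict[str, List[str]] = {
    PASSAGES[0]: ["where is the eiffel tower"],
    PASSAGES[1]: ["what is python"],
    PASSAGES[2]: ["what is the capital of france"],
}


def toy_generate_queries(passage: str) -> List[str]:
    """Lookup table in place of a trained query generation decoder."""
    return TOY_QUERIES.get(passage, [])


def toy_score_pair(query: str, passage: str) -> float:
    """Jaccard token overlap in place of a trained transformer encoder."""
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q | p) if q | p else 0.0


ranking = run_pipeline("where is the eiffel tower", PASSAGES,
                       toy_generate_queries, toy_score_pair, k=2)
```

In this sketch both stand-ins are independent functions; in the framework described above, both roles are filled by the same jointly trained pretrained model.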
