A Unified Pretraining Framework for Passage Ranking and Expansion
Keywords: Web Search & Information Retrieval
Abstract
Pretrained language models have recently advanced a wide range of natural language processing tasks, and their application to information retrieval (IR) tasks has also achieved impressive results. Typical methods either apply a pretrained model directly to improve the re-ranking stage, or use it to conduct passage expansion and term weighting for first-stage retrieval. We observe that the passage ranking and passage expansion tasks are inherently related and can benefit from each other. Therefore, in this paper, we propose a general pretraining framework to enhance both tasks with Unified Encoder-Decoder networks (UED). The overall ranking framework consists of two cascaded parts: (1) passage expansion with a pretraining-based query generation method; (2) re-ranking of passage candidates from a traditional retrieval method with a pretrained transformer encoder. Both parts are based on the same pretrained UED model, in which we jointly train the passage ranking and query generation tasks to further improve the full ranking pipeline. Extensive experiments on two large-scale passage retrieval datasets demonstrate state-of-the-art results for the proposed framework in both first-stage retrieval and final re-ranking. In addition, we have successfully deployed the framework in our online production system, where it stably serves industrial applications at a request volume of up to 100 QPS, responding in less than 300 ms.
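The two-part cascade described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every function name here is hypothetical, a simple term-overlap count stands in for the traditional retrieval method, and the query generator and pair scorer are toy stubs in place of the UED decoder and encoder.

```python
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def expand_passage(passage: str, generate_queries: Callable[[str], List[str]]) -> str:
    """Part 1: append generated queries to the passage text before indexing."""
    return " ".join([passage] + generate_queries(passage))


def overlap_score(query: str, doc: str) -> int:
    """Toy lexical score (an assumption; stands in for a traditional retriever)."""
    doc_tf = Counter(tokenize(doc))
    return sum(doc_tf[term] for term in set(tokenize(query)))


def retrieve(query: str, indexed_docs: List[str], k: int) -> List[int]:
    """First-stage retrieval: indices of the top-k documents with a non-zero score."""
    scored = [(i, overlap_score(query, d)) for i, d in enumerate(indexed_docs)]
    scored = [(i, s) for i, s in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [i for i, _ in scored[:k]]


def rerank(query: str, passages: List[str], candidates: List[int],
           score_pair: Callable[[str, str], float]) -> List[int]:
    """Part 2: reorder candidates by a (query, passage) relevance score."""
    return sorted(candidates, key=lambda i: score_pair(query, passages[i]), reverse=True)


def toy_generate_queries(passage: str) -> List[str]:
    """Hypothetical stub for a learned query generator."""
    return ["what does the cardiac muscle do"] if "heart" in passage.lower() else []


def toy_score_pair(query: str, passage: str) -> float:
    """Hypothetical stub for a learned re-ranker: Jaccard overlap of token sets."""
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q | p) if q | p else 0.0


if __name__ == "__main__":
    passages = ["The heart pumps blood through the body.",
                "Python is a programming language."]
    expanded = [expand_passage(p, toy_generate_queries) for p in passages]
    candidates = retrieve("cardiac function", expanded, k=2)
    print(rerank("cardiac function", passages, candidates, toy_score_pair))
```

In this toy setup the query "cardiac function" shares no terms with the original first passage, so it can only be retrieved once the generated query has been appended. That is the vocabulary-mismatch problem passage expansion is meant to address.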
How to Cite
Yan, M., Li, C., Bi, B., Wang, W., & Huang, S. (2021). A Unified Pretraining Framework for Passage Ranking and Expansion. Proceedings of the AAAI Conference on Artificial Intelligence, 35(5), 4555-4563. https://doi.org/10.1609/aaai.v35i5.16584
AAAI Technical Track on Data Mining and Knowledge Management